NFV evolution, VNF vs CNF, OpenStack vs Kubernetes, MANO architecture, deployment models, vendor landscape — why the platform matters as much as the NF software
1. What Is COTS Virtualisation in 5GC — The Simple Version
COTS (Commercial Off-The-Shelf) virtualisation means running 5G Core NFs on standard x86 or ARM servers using software virtualisation — instead of purpose-built telecom hardware with proprietary ASICs. The motivation is straightforward: faster feature velocity, lower hardware cost, multi-vendor flexibility, and cloud-native operations. The reality is more nuanced: 5GC NFs running on COTS hardware have specific configuration requirements (CPU pinning, hugepages, NUMA topology) that are very different from enterprise application workloads.
Every major 5GC vendor — Ericsson, Nokia, Huawei, Samsung — now ships NFs as container-native software designed to run on Kubernetes. But the operators who have had the smoothest deployments are the ones who understood that “runs on Kubernetes” does not mean “configure it like a web app.”
| Standards Reference |
| ETSI GS NFV-INF 001 — NFV Infrastructure Requirements |
| ETSI GS NFV-IFA 014 — Network Functions Virtualisation Management and Orchestration |
| GSMA NG.126 — Cloud Infrastructure Reference Model for 5G Core |
| 3GPP TS 23.501 Section 7.2 — Network Function Services |
2. Architecture — VNF vs CNF and the Platform Stack
VNF vs CNF — The Shift to Cloud-Native
| Dimension | VNF (VM-based) | CNF (Container-based) |
| Deployment unit | Full VM — OS + NF software | Container — NF software only, shared kernel |
| Boot time | Minutes (full OS boot) | Seconds (container start) |
| Memory overhead | 2–4 GB per VM for OS | ~50–200 MB per container |
| Scaling | Clone full VM — slow, heavyweight | Pod scale-out in seconds |
| Lifecycle management | VNFM (ETSI NFV IFA) | Kubernetes Operator + Helm charts |
| NIC access | SR-IOV via PCI passthrough | SR-IOV via CNI plugins (Multus + SRIOV-CNI) |
| State management | State on VM local disk | State in PersistentVolumes or external database |
| GCC operator trend | Legacy — still operating existing VNF deployments | CNF-first from 2022 onwards for all new 5GC buildouts |
Table 1 — VNF vs CNF. Industry direction is CNF-first for all new 5GC. VNF continues for operators with existing OpenStack investment and active NF contracts.
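To make the memory-overhead row of Table 1 concrete, here is a back-of-the-envelope calculation in Python. The per-instance ranges come from the table; the instance count of 20 is an assumption chosen only for illustration, not a figure from any deployment.

```python
# Per-instance overhead ranges taken from Table 1, in GB.
VM_OS_OVERHEAD_GB = (2.0, 4.0)        # guest OS carried by every VM
CONTAINER_OVERHEAD_GB = (0.05, 0.2)   # roughly 50-200 MB per container


def overhead_range_gb(per_instance, instances):
    """Return (low, high) total overhead in GB for a number of NF instances."""
    low, high = per_instance
    return (low * instances, high * instances)


# Assumption: 20 NF instances, an illustrative number only.
INSTANCES = 20

vm_total = overhead_range_gb(VM_OS_OVERHEAD_GB, INSTANCES)
cnf_total = overhead_range_gb(CONTAINER_OVERHEAD_GB, INSTANCES)

print(f"VNF: {vm_total[0]:.0f}-{vm_total[1]:.0f} GB of guest-OS overhead")
print(f"CNF: {cnf_total[0]:.0f}-{cnf_total[1]:.0f} GB of container overhead")
```

Under these assumptions the VM packaging spends 40 to 80 GB on guest operating systems alone, while the container packaging spends about 1 to 4 GB.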
The Full Platform Stack
| Layer | VM-based (VNF) | Container-based (CNF) | Notes |
| NF Software | VNF on VM | CNF on Pod | Same NF logic, different packaging |
| Orchestration | VNFM (per-vendor) | Kubernetes Operator + Helm | K8s Operator handles NF-specific lifecycle |
| Infrastructure Mgmt | VIM (OpenStack) | K8s + CNI | OpenStack still used as IaaS under some K8s deployments |
| MANO | NFVO + VNFM + VIM (ETSI NFV) | K8s + Helm + ArgoCD/Flux | NFVO role largely taken over by K8s-native GitOps tooling (ArgoCD/Flux) |
| Compute | COTS x86 servers | COTS x86 or ARM servers | Same hardware — virtualisation layer differs |
| Networking | SR-IOV + OVS-DPDK | Multus + SRIOV-CNI + OVS-DPDK | Both paths use SR-IOV for UPF data plane |
| Storage | Ceph RBD or NFS for VMs | Ceph RBD or local NVMe StorageClass | UDR and CHF: NVMe for low-latency DB I/O |
Table 2 — Platform stack comparison. The hardware is the same. The management and lifecycle layer is fundamentally different.
3. How It Works — The Kubernetes 5GC Platform
In a CNF-based 5GC deployment on Kubernetes, here is how the pieces fit together:
The K8s cluster is structured with dedicated node types. Master nodes run the K8s control plane (etcd, API server, scheduler), typically 3 nodes for HA. Worker nodes are specialised: signalling-plane workers host AMF, SMF, PCF, UDM, AUSF and NRF pods (standard compute, 25 GbE management NIC). UPF workers are tainted separately and equipped with 100 GbE SR-IOV NICs, pre-allocated hugepages, and pinned CPUs.
NF deployment uses Helm charts. The operator installs the vendor-provided Helm chart with a values.yaml override file specifying: PLMN IDs, TAI configurations, DNN definitions, NRF endpoint, N2/N3 interface IP addresses, resource limits, replica counts. The chart deploys all Kubernetes objects: Deployment or StatefulSet, Services (ClusterIP for SBI, LoadBalancer or NodePort for N2/N3), ConfigMaps, Secrets (TLS certificates), NetworkAttachmentDefinitions (Multus secondary NICs for UPF).
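The override mechanism described above can be sketched in Python. The snippet approximates how Helm layers an operator's values over the chart defaults: nested maps are merged key by key, and any other value is replaced. Every key name and value below is hypothetical; real charts use vendor-specific keys.

```python
import copy


def merge_values(defaults, overrides):
    """Recursively merge override values over chart defaults.

    Nested mappings are merged key by key; any other value (scalars,
    lists) is replaced outright by the override.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults shipped by the vendor.
chart_defaults = {
    "replicaCount": 1,
    "plmn": {"mcc": "001", "mnc": "01"},
    "nrf": {"endpoint": "http://nrf.example.internal:8080"},
    "resources": {"limits": {"cpu": "2", "memory": "4Gi"}},
}

# Hypothetical operator overrides (the role played by values.yaml).
operator_values = {
    "replicaCount": 3,
    "plmn": {"mnc": "02"},
    "resources": {"limits": {"memory": "8Gi"}},
}

effective = merge_values(chart_defaults, operator_values)
print(effective)
```

Only the keys the operator overrides change; the NRF endpoint, the MCC and the CPU limit keep their chart defaults, which is why a small override file is enough to describe a site-specific deployment.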
K8s Operators manage NF-specific lifecycle events that vanilla K8s cannot handle: graceful SMF pod termination (drain active PDU sessions before killing the pod), UPF rolling upgrade (redirect GTP-U sessions to standby UPF, upgrade primary, redirect back), AMF session context preservation across restarts (write context to PersistentVolume before pod terminates).
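The UPF rolling-upgrade sequence above (redirect, upgrade, redirect back) can be illustrated with a toy Python model. This is not any vendor's Operator logic: the class, the function and the session identifiers are hypothetical, and a real Operator would drive these steps through the NF's own management interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class UpfInstance:
    """Toy stand-in for a UPF pod: a name, a version and a set of session ids."""
    name: str
    version: str
    sessions: set = field(default_factory=set)


def rolling_upgrade(primary, standby, new_version):
    """Upgrade `primary` while its sessions are parked on `standby`."""
    steps = []
    moved = set(primary.sessions)

    # Step 1: redirect the primary's sessions to the standby.
    standby.sessions |= moved
    primary.sessions.clear()
    steps.append(f"redirected {len(moved)} sessions to {standby.name}")

    # Step 2: upgrade only once the primary carries no sessions.
    if primary.sessions:
        raise RuntimeError("primary still has active sessions")
    primary.version = new_version
    steps.append(f"upgraded {primary.name} to {new_version}")

    # Step 3: redirect the same sessions back to the upgraded primary.
    standby.sessions -= moved
    primary.sessions |= moved
    steps.append(f"redirected {len(moved)} sessions back to {primary.name}")
    return steps


upf_a = UpfInstance("upf-a", "1.0", {"s1", "s2"})
upf_b = UpfInstance("upf-b", "1.0", {"s9"})
for step in rolling_upgrade(upf_a, upf_b, "1.1"):
    print(step)
```

The point of the sketch is the ordering: the upgrade step refuses to run while sessions are still attached, which is the behaviour a plain Kubernetes rolling update does not provide on its own.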
4. Key Parameters and Technical Terms
| Term | Definition | Why It Matters for 5GC |
| CPU Pinning | Binding vCPUs to specific physical CPUs. Prevents OS scheduler from migrating threads. | UPF and AMF worker threads must be pinned. Without pinning: cache misses and latency jitter. Configure via K8s CPU Manager policy=static. |
| Hugepages | 2 MB or 1 GB memory pages (vs default 4 KB). Pre-allocated at boot. | DPDK (used by UPF packet processing) requires hugepages. Without them: 40–60% UPF throughput loss at line rate. |
| NUMA Topology | Non-Uniform Memory Access. Multi-socket servers have memory banks per socket. Cross-socket access adds 30–80 ns latency. | Set topologyManagerPolicy=single-numa-node in kubelet. UPF and AMF pods must have all CPUs and memory from the same NUMA node. |
| SR-IOV | Single Root I/O Virtualisation. One physical NIC presents as multiple Virtual Functions. | UPF requires SR-IOV for N3 and N6 line-rate forwarding. Without SR-IOV: all traffic through kernel network stack — cannot sustain 100 Gbps. |
| DPDK | Data Plane Development Kit. User-space packet processing library that bypasses kernel network stack. | Used by UPF for line-rate packet forwarding. Requires hugepages and CPU pinning to function efficiently. |
| Multus | Kubernetes CNI meta-plugin. Allows pods to have multiple network interfaces. | UPF pod needs: primary CNI interface for management + N4, plus SR-IOV VFs for N3 and N6. Multus attaches the SR-IOV VFs. |
| Guaranteed QoS (K8s) | Pod QoS class where requests = limits for all containers. These pods are last to be evicted. | Set for ALL 5GC NF pods. SMF or UPF evicted under memory pressure = mass session drop. |
| PodDisruptionBudget | K8s policy specifying minimum available replicas during voluntary disruptions (upgrades, node drain). | Set PDB for each NF: minAvailable=N-1. Prevents all AMF pods from being drained simultaneously during cluster upgrade. |
| Helm Chart | Kubernetes application package: templates + default values. Operators override via values.yaml. | Vendor delivers NF as Helm chart. Operator customises via values.yaml. Version-controlled deployment. |
| StatefulSet vs Deployment | StatefulSet gives pods stable hostnames and persistent storage. Deployment does not. | SMF and UDM: StatefulSet (stable pod names needed for session state and DB clustering). AMF: Deployment acceptable if state in external store. |
Table 3 — Platform key terms. CPU pinning, hugepages, and NUMA topology are the three configuration items that most commonly degrade UPF performance in initial deployments.
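As a minimal sketch of how the Table 3 settings fit together, the Python below builds a kubelet configuration fragment and a UPF container resources block as plain dictionaries. The reserved CPU range, the CPU count of 16 and the helper name `upf_resources` are assumptions for illustration; the 64Gi and 32Gi figures echo the field notes in this post. Hugepages still have to be pre-allocated on the node at boot before a pod can request them.

```python
# Kubelet settings named in Table 3, as a KubeletConfiguration fragment.
kubelet_config = {
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "kind": "KubeletConfiguration",
    "cpuManagerPolicy": "static",
    "topologyManagerPolicy": "single-numa-node",
    # Assumption: CPUs 0-1 are set aside for the OS and kubelet. The static
    # CPU Manager policy requires some non-zero CPU reservation.
    "reservedSystemCPUs": "0-1",
}


def upf_resources(cpus, memory_gi, hugepages_1g_gi):
    """Build a container resources block with requests equal to limits.

    Exclusive (pinned) CPUs are only granted to Guaranteed pods that ask
    for a whole number of CPUs, so fractional values are rejected.
    """
    if not isinstance(cpus, int) or cpus < 1:
        raise ValueError("pinned CPUs require a whole-number CPU request")
    amounts = {
        "cpu": str(cpus),
        "memory": f"{memory_gi}Gi",
        "hugepages-1Gi": f"{hugepages_1g_gi}Gi",
    }
    return {"requests": dict(amounts), "limits": dict(amounts)}


# Assumption: 16 pinned CPUs; memory and hugepage sizes echo the field notes.
resources = upf_resources(cpus=16, memory_gi=64, hugepages_1g_gi=32)
print(resources)
```

Generating requests and limits from one dictionary makes it impossible for the two to drift apart, which is what keeps the pod in the Guaranteed QoS class.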
5. Common Issues in the Field
UPF Throughput 50% Below Expected — Hugepages Not Configured
Hugepages are a non-obvious requirement that is easy to miss in initial deployments. DPDK allocates large packet buffers and expects them to be in hugepage memory for TLB (Translation Lookaside Buffer) efficiency. Without hugepages, DPDK falls back to standard 4 KB pages and TLB misses under load dominate CPU time. The UPF CPU looks busy, but actual packet forwarding throughput is 40–60% below the server’s capability.
| Field Note: UPF Capped at 45 Gbps — Hugepages Missing from K8s Node Config |
| New SA deployment. UPF server spec: dual 100 GbE NICs, 64 vCPUs. Expected throughput: ~90 Gbps. |
| Production load test: UPF capped at 45 Gbps. CPU utilisation appeared normal (70%). |
| Investigation: hugepages-1Gi not configured in K8s node spec. DPDK using 4 KB standard pages. |
| Fix: add hugepages-1Gi: "32Gi" to K8s node spec; restart UPF pods. |
| Throughput jumped to 91 Gbps on same hardware with no other change. |
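A quick calculation shows why page size matters so much for TLB pressure. The Python below counts how many pages are needed to map a 32 GiB region (the hugepage pool size from the field note) at each page size. Treating the whole pool as one mapped buffer is a simplifying assumption.

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30


def pages_needed(buffer_bytes, page_bytes):
    """Number of pages required to map a buffer (ceiling division)."""
    return -(-buffer_bytes // page_bytes)


# Assumption: the full 32 GiB pool is mapped as packet-buffer memory.
buffer_bytes = 32 * GIB

for label, page_bytes in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    print(f"{label} pages: {pages_needed(buffer_bytes, page_bytes):,}")

# 4 KiB -> 8,388,608 pages
# 2 MiB -> 16,384 pages
# 1 GiB -> 32 pages
```

With 1 GiB pages the whole region is covered by 32 translations, compared with more than eight million at 4 KiB, which is the gap behind the TLB-miss overhead described above.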
UPF Pod Evicted During Memory Pressure — Wrong QoS Class
If UPF pods are deployed as Burstable QoS (requests < limits), the K8s kubelet can evict them during node memory pressure events. An evicted UPF pod drops all active GTP-U sessions immediately. The replacement pod starts fresh with no session state. Every UE whose session was on that UPF loses connectivity until their device re-establishes the PDU session.
| Field Note: 40,000 Sessions Dropped — UPF Pod Evicted During Memory Pressure |
| Operator ran UPF with memory requests=16Gi, limits=64Gi (Burstable QoS). |
| During a memory pressure event on the node, kubelet selected the UPF pod for eviction. |
| 40,000 active PDU sessions dropped simultaneously. Session re-establishment took 3-5 minutes. |
| Fix: set UPF memory requests=limits=64Gi (Guaranteed QoS). Also: set a PodDisruptionBudget with maxUnavailable=0 for UPF (never voluntarily evict). Set nodeSelector to UPF-dedicated worker nodes so no other pods compete. |
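The QoS class that Kubernetes assigns can be checked before deployment. The following Python is a simplified sketch of the classification rules for CPU and memory: it compares quantities as strings rather than parsing units, and the CPU value of 16 is an assumption, since the field note only gives memory figures.

```python
def qos_class(containers):
    """Classify a pod from its containers' cpu/memory requests and limits.

    `containers` is a list of {"requests": {...}, "limits": {...}} dicts.
    Simplification: quantities are compared as strings, so "64Gi" and
    "65536Mi" would be treated as different values.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A missing request defaults to the limit when a limit is set.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            req, lim = requests.get(resource), limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            if lim is None or req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The pod from the field note (CPU value assumed).
before = [{"requests": {"cpu": "16", "memory": "16Gi"},
           "limits": {"cpu": "16", "memory": "64Gi"}}]
# The same pod after the fix.
after = [{"requests": {"cpu": "16", "memory": "64Gi"},
          "limits": {"cpu": "16", "memory": "64Gi"}}]

print(qos_class(before))  # Burstable
print(qos_class(after))   # Guaranteed
```

A check like this could run in a CI pipeline against rendered manifests so that a Burstable 5GC pod is caught in review rather than in production.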
6. Troubleshooting
| Symptom | Root Cause | Check | Fix |
| UPF throughput well below spec | Hugepages not configured; DPDK using 4KB pages | UPF pod spec: hugepages-1Gi resource request; node: hugetlbfs mount | Configure hugepages-1Gi on K8s node spec; set resource request in UPF pod |
| UPF/SMF pod evicted during peak hours | Pod QoS class is Burstable — K8s evicts under memory pressure | K8s events: kubectl get events --field-selector reason=Evicted | Set requests=limits for all 5GC pods (Guaranteed QoS); dedicate worker nodes |
| NF latency spikes during busy hour | NUMA cross-socket memory access — vCPUs split across NUMA nodes | numactl --hardware on worker node; K8s topology manager policy | Set topologyManagerPolicy=single-numa-node; set CPU Manager policy=static |
| UPF N3 interface cannot sustain 10Gbps+ | SR-IOV not configured — packets going through kernel network stack | UPF pod: check NetworkAttachmentDefinition for SR-IOV VF; ethtool on N3 interface | Configure Multus + SRIOV-CNI; verify SR-IOV VF is attached to UPF pod |
| NF pod restart loses all sessions | SMF/UPF state not persisted — ephemeral pod storage | K8s pod spec: check volume mounts for session state persistence | Use StatefulSet with PersistentVolume for SMF session state; or external session DB |
Table 4 — Platform troubleshooting. Most performance issues are K8s configuration errors, not NF software bugs.
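For the NUMA row in Table 4, a small Python helper can confirm whether the CPUs assigned to a pod sit on a single NUMA node. It parses Linux CPU-list strings of the kind found in /sys/devices/system/node/node*/cpulist. The two-socket topology and the pod CPU sets below are hypothetical examples.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes_used(pod_cpu_list, node_cpu_lists):
    """Return the sorted NUMA node ids that hold at least one pod CPU."""
    pod_cpus = parse_cpu_list(pod_cpu_list)
    return sorted(
        node for node, cpu_list in node_cpu_lists.items()
        if pod_cpus & parse_cpu_list(cpu_list)
    )


# Hypothetical dual-socket worker with 64 logical CPUs.
topology = {0: "0-15,32-47", 1: "16-31,48-63"}

print(numa_nodes_used("2-9", topology))    # [0]    -> NUMA-local
print(numa_nodes_used("12-19", topology))  # [0, 1] -> split across sockets
```

A result with more than one node id means the pod's threads can incur cross-socket memory access, which matches the latency-spike symptom in the table.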
7. Design Recommendations
Separate UPF worker nodes from signalling-plane worker nodes. UPF requires: hugepages pre-allocated, DPDK-enabled NICs, CPU pinned, NUMA-local. Sharing a node with AMF/SMF pods introduces resource contention that is extremely difficult to debug. Dedicated UPF nodes with nodeSelector and Taints/Tolerations prevent accidental co-scheduling.
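The nodeSelector plus taint/toleration combination can be sketched as follows. The Python models a simplified version of the scheduler's checks to show why both halves are needed; the label key, the taint key and value, and the helper names are hypothetical, and only NoSchedule taints are considered.

```python
# Hypothetical label and taint applied to UPF-dedicated worker nodes.
UPF_NODE_LABELS = {"node-role.example.com/upf": "true"}
UPF_NODE_TAINTS = [{"key": "dedicated", "value": "upf", "effect": "NoSchedule"}]

# Scheduling fragment of the UPF pod spec.
upf_pod = {
    "nodeSelector": {"node-role.example.com/upf": "true"},
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "upf", "effect": "NoSchedule"}],
}
# A signalling pod with no selector and no tolerations.
amf_pod = {}


def tolerates(toleration, taint):
    """Simplified taint matching for the Equal and Exists operators."""
    if toleration.get("effect") not in (None, "", taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return toleration.get("key") in (None, "", taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))


def can_schedule(pod, node_labels, node_taints):
    """True if the node matches the pod's nodeSelector and all taints are tolerated."""
    selector_ok = all(node_labels.get(k) == v
                      for k, v in pod.get("nodeSelector", {}).items())
    taints_ok = all(any(tolerates(t, taint) for t in pod.get("tolerations", []))
                    for taint in node_taints)
    return selector_ok and taints_ok


print(can_schedule(upf_pod, UPF_NODE_LABELS, UPF_NODE_TAINTS))  # True
print(can_schedule(amf_pod, UPF_NODE_LABELS, UPF_NODE_TAINTS))  # False
print(can_schedule(upf_pod, {}, []))                            # False
```

The taint keeps signalling pods off the UPF nodes, while the nodeSelector keeps the UPF pod off generic workers. Using only one of the two leaves one direction of accidental co-scheduling open.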
Set K8s resource requests = limits for every 5GC NF pod from day one. Guaranteed QoS class prevents eviction under memory pressure. Size limits based on vendor specifications plus 20% headroom. Accepting Burstable QoS to save memory today guarantees an outage under load tomorrow.
Version-control all Helm values.yaml files in Git. Every NF configuration change should go through a Git pull request review before applying to production. This is the single most effective change management practice for K8s-based 5GC deployments — it creates an audit trail and a rollback point for every configuration change.
8. Summary — Key Takeaways
| Topic | Key Takeaway |
| VNF vs CNF | Industry has moved to CNF-first. Same NF software, container packaging. K8s Operator replaces VNFM for lifecycle management. |
| Hugepages | Mandatory for UPF DPDK packet processing. Configure hugepages-1Gi on K8s node spec AND in UPF pod resource request. Missing = 40-60% throughput loss. |
| CPU pinning + NUMA | Set CPU Manager policy=static and topologyManagerPolicy=single-numa-node. UPF and AMF worker threads must not cross NUMA nodes. |
| SR-IOV | Required for UPF N3/N6 line-rate. Configured via Multus + SRIOV-CNI. Without SR-IOV: packet throughput limited by kernel network stack. |
| Guaranteed QoS | requests=limits for ALL 5GC pods. Burstable UPF pod can be evicted = mass session drop. Non-negotiable. |
| Dedicated UPF nodes | Use nodeSelector + Taints to prevent non-UPF pods on UPF worker nodes. Resource contention between UPF and signalling NFs is hard to debug. |
| GitOps for config | Version-control all Helm values.yaml. Every change has a PR, review, and rollback point. |
Table 5 — Post 06 summary. COTS virtualisation works reliably when the platform is configured correctly. Most production failures are platform config issues.
Next: Post 07 — 5GC Hardware & Infrastructure
