COTS servers, x86 vs ARM, NIC offload, SmartNIC/DPU, DC design, vendor HW stacks, NFVI, sizing methodology
1. What Is 5GC Hardware — The Simple Version
5GC NFs run on standard COTS (Commercial Off-The-Shelf) servers: the same x86 platforms used in enterprise data centres, from Dell, HPE, Lenovo, or Supermicro. The difference is in how those servers are configured and which peripherals they carry. An AMF pod can run on any modern Xeon server. A UPF pod forwarding 100 Gbps of GTP-U traffic cannot: it needs specific NIC hardware, hugepages, CPU pinning, and NUMA-aware scheduling.
The other difference from an enterprise deployment is the data centre itself. A 5GC DC must support high power density (up to 20 kW per rack for UPF-heavy deployments), carrier-grade redundancy (2N power, dual-corded servers, a geo-redundant DC pair), and a network fabric that delivers < 0.5 ms east-west latency between NFs. If the hardware is wrong, no amount of software tuning recovers the performance.
| Standards References |
| ETSI GS NFV-INF 001: NFV Infrastructure requirements |
| GSMA NG.126: Cloud Infrastructure Reference Model for 5G Core |
| 3GPP TS 28.500: Management concept, architecture and requirements for mobile networks that include virtualized network functions |
2. Architecture — The Hardware Stack
Typical 5GC Server Specifications
| Component | Signalling-Plane NFs (AMF, SMF, PCF) | User-Plane (UPF) | Notes |
| CPU | 2× Intel Xeon Scalable 4th Gen or AMD EPYC — 32–48 cores each | Same — but CPU pinning mandatory for DPDK workers | Disable C-states in BIOS for UPF servers. C-state transitions add ms-level jitter. |
| RAM | 256–512 GB DDR5 ECC | 256–512 GB DDR5 ECC + hugepages pre-allocated | SMF needs large RAM for session state. UPF needs hugepages for DPDK. |
| NIC | 2× 25 GbE — management + SBI | 2× 100 GbE SR-IOV (Mellanox ConnectX-7 or Intel E810) | UPF NIC choice directly determines max throughput. |
| Storage | NVMe SSD RAID-1 for OS; shared NVMe for UDR/CHF logs | OS NVMe only — UPF is stateless | UDR needs low-latency NVMe for subscriber DB I/O. |
| Form factor | 2U rack-mount, hot-swap drives, dual-corded PSUs | Same, with no spinning disk | Carrier-grade: hot-swap everything, and no single cable failure may take a server down. |
Table 1 — 5GC server specifications by NF role. The hardware difference between signalling and user-plane NFs is primarily NIC type and hugepages.
x86 vs ARM
| Dimension | x86 (Intel/AMD) | ARM (Huawei Kunpeng, Ampere Altra) |
| Ecosystem maturity | Dominant — all NF vendors ship x86-native builds | Growing — Ericsson, Nokia, Huawei have ARM roadmaps; check version support matrix |
| Power efficiency | ~200–300W per socket at full load | 30–50% lower power for equivalent throughput |
| DPDK support | Mature, extensively tested | Supported but smaller ecosystem — fewer NIC driver options |
| GCC operator use | Universal — all SA deployments currently x86 | On roadmap — ARM UPF deployments expected 2025+ |
| Decision | Default choice, lowest integration risk | Choose only if power/cooling is a primary constraint and the vendor supports ARM for the NF version being deployed |
Table 2 — x86 vs ARM for 5GC. ARM offers power efficiency but requires explicit vendor version support validation before commitment.
NIC Selection — The UPF Bottleneck
| NIC Type | Max Throughput | Key Feature | Best For |
| Standard 10/25 GbE NIC | 10–25 Gbps | No offload — software only | Signalling-plane NFs only (AMF, SMF) |
| 100 GbE SR-IOV (Intel E810 / Mellanox ConnectX-6) | 100 Gbps | SR-IOV VFs, RSS, hardware filters | UPF N3/N6 — required for any serious user-plane throughput |
| SmartNIC / DPU (Nvidia BlueField-3) | 100–400 Gbps | ARM co-processor, GTP-U offload, inline IPsec | UPF with GTP-U hardware offload — frees host CPU for session management |
| Intel IPU (E2000) | 100 Gbps | Programmable pipeline, P4-based offload | Advanced UPF offload — less deployed than BlueField currently |
Table 3 — NIC options for 5GC. SR-IOV 100 GbE is the minimum for UPF production deployments. SmartNIC/DPU enables GTP-U hardware offload and frees CPU for control plane work.
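To see why the NIC and its offloads dominate UPF performance, it helps to work out the per-packet CPU budget at line rate. The sketch below is a back-of-envelope calculation, not a vendor sizing tool. The 16 worker cores and the 2.5 GHz clock are assumed values chosen for illustration; the 20 bytes of per-frame wire overhead is standard Ethernet framing.

```python
ETHERNET_WIRE_OVERHEAD_BYTES = 20  # preamble (7) + start delimiter (1) + inter-frame gap (12)


def packets_per_second(link_gbps: float, frame_bytes: int) -> float:
    """Packet rate at line rate for Ethernet frames of frame_bytes (FCS included)."""
    bits_on_wire = (frame_bytes + ETHERNET_WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def cycles_per_packet(link_gbps: float, frame_bytes: int,
                      worker_cores: int, core_ghz: float) -> float:
    """CPU cycles available per packet if the workers share the load evenly."""
    total_cycles_per_second = worker_cores * core_ghz * 1e9
    return total_cycles_per_second / packets_per_second(link_gbps, frame_bytes)


# Hypothetical host: 16 DPDK worker cores at 2.5 GHz serving one 100 GbE port.
for size in (64, 512, 1518):
    mpps = packets_per_second(100, size) / 1e6
    budget = cycles_per_packet(100, size, worker_cores=16, core_ghz=2.5)
    print(f"{size:>5} B frames: {mpps:7.2f} Mpps, {budget:7.0f} cycles/packet")
```

At 64-byte frames a 100 GbE port carries roughly 148.8 million packets per second, which leaves each of the assumed workers only a few hundred cycles per packet for GTP-U decapsulation, session lookup, and forwarding. That is the budget a SmartNIC/DPU offload relieves.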
3. Data Centre Design for 5GC
A 5GC data centre is not an enterprise DC. The combination of high-throughput UPF servers, geo-redundancy requirements, and carrier-grade availability targets drives specific design decisions:
| Design Parameter | 5GC Requirement | Reason |
| Rack power density | 8–20 kW per rack | UPF servers with dual 100 GbE NICs + NVMe draw 1–2 kW each. 10–12 per rack = 15–20 kW. |
| Cooling | Direct liquid cooling (DLC) or rear-door heat exchangers for UPF racks | Air cooling insufficient above 15 kW/rack. Inlet temperature must stay < 25°C to prevent CPU throttling. |
| DC fabric | Spine-leaf, 100 GbE ToR switches, ECMP | < 0.5 ms AMF-to-SMF-to-UPF east-west latency. Oversubscription kills NF-to-NF SBI performance. |
| Power redundancy | 2N UPS + generator; dual-corded servers to separate PDU chains | Single PDU failure must not affect any NF. 5-nines availability requires 2N at every power layer. |
| DC pairing | Two DCs within 50–100 km | Beyond 100 km of fibre, inter-DC RTT exceeds 1 ms (about 5 µs per km each way). Replication latency for stateful NFs directly affects AMF/SMF failover speed. |
| Network separation | Separate VLANs/VRFs for SBI, N3/N6 user-plane, OAM management | SBI and user-plane QoS requirements are different. Management traffic must not compete with N4 PFCP. |
Table 4 — 5GC data centre design parameters. The DC fabric east-west latency and power density are the two most commonly underestimated requirements.
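Two of the figures in Table 4 are simple arithmetic and are worth checking for a specific site. The sketch below assumes light travels about 200 km per millisecond in optical fibre (roughly 5 µs per km) and counts propagation delay only, with no switching or transponder delay. The function names are illustrative, and the 15 kW threshold is the air-cooling limit quoted in the table.

```python
FIBRE_KM_PER_MS = 200.0      # assumed: about 5 µs per km one way in optical fibre
AIR_COOLING_LIMIT_KW = 15.0  # air-cooling threshold quoted in Table 4


def fibre_rtt_ms(route_km: float) -> float:
    """Round-trip propagation delay over a fibre route (equipment delay excluded)."""
    return 2 * route_km / FIBRE_KM_PER_MS


def rack_power_kw(servers: int, kw_per_server: float) -> float:
    """Total rack draw for identical servers."""
    return servers * kw_per_server


def needs_liquid_cooling(rack_kw: float) -> bool:
    """True when the rack exceeds the air-cooling threshold."""
    return rack_kw > AIR_COOLING_LIMIT_KW


for km in (50, 100, 150):
    print(f"{km} km route: {fibre_rtt_ms(km):.2f} ms RTT")

rack = rack_power_kw(servers=12, kw_per_server=1.5)
print(f"12 servers at 1.5 kW: {rack:.1f} kW, liquid cooling: {needs_liquid_cooling(rack)}")
```

Use the actual fibre route length rather than the straight-line distance between sites, since routes are usually longer than the map suggests.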
4. Key Parameters and Technical Terms
| Term | Definition | 5GC Significance |
| NFVI | Network Functions Virtualisation Infrastructure. The compute, storage, and network hardware that NFs run on. | Abstracted by VIM (OpenStack) or K8s. NF sees vCPU, vMemory, vNIC — not physical hardware. |
| C-states | CPU power management states. Deeper C-states save more power but add latency to return to active. | Must be disabled in BIOS for UPF servers. C-state transitions add ms-level wakeup latency — unacceptable for DPDK packet processing. |
| NUMA Node | Non-Uniform Memory Access node — one CPU socket and its directly attached memory banks. | UPF and AMF pods must have all vCPUs and memory from the same NUMA node. Cross-NUMA access: +30–80 ns per memory operation. |
| SR-IOV VF | Virtual Function created by SR-IOV NIC. Multiple VFs share one physical NIC but have independent TX/RX queues. | UPF pod gets one or more VFs for N3 and N6 interfaces. Bypasses kernel network stack for near-line-rate forwarding. |
| RSS (Receive Side Scaling) | Hardware NIC feature that distributes incoming packets across multiple CPU cores using a hash of packet headers. | Distributes GTP-U flows across UPF DPDK worker threads by TEID hash, which balances load. TEID hashing needs a NIC and driver that can parse GTP-U; default RSS sees only the outer IP/UDP headers. Without RSS, one core handles all N3 traffic. |
| GTP-U Offload | SmartNIC/DPU handles GTP-U encap/decap in hardware, not host CPU. | BlueField-3 can offload GTP-U processing, freeing host CPU for PFCP session management. Enables higher session density on same server. |
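| Inline IPsec | IPsec encryption/decryption performed in NIC hardware pipeline. | N3 IPsec at full 100 Gbps line rate without CPU overhead. Requires a NIC with inline crypto offload (crypto-enabled Mellanox ConnectX models, for example). Intel QAT is a lookaside accelerator rather than a NIC feature: it offloads the cipher work, but packets still pass through the host. |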
Table 5 — Hardware key terms. C-state disablement and NUMA pinning are the two BIOS/OS configurations that most commonly degrade UPF performance in production.
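The RSS row is easier to picture with a toy model. The sketch below spreads a set of TEIDs across worker cores by hashing each TEID and taking the result modulo the worker count. It is only an illustration: `zlib.crc32` is a stand-in chosen for determinism, whereas real NICs typically use a Toeplitz hash plus an indirection table, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def worker_for_teid(teid: int, n_workers: int) -> int:
    """Map a 32-bit TEID to a worker index (CRC32 stands in for the NIC's hash)."""
    digest = zlib.crc32(teid.to_bytes(4, "big"))
    return digest % n_workers


def spread(teids, n_workers: int) -> list:
    """Count how many flows land on each worker."""
    counts = Counter(worker_for_teid(t, n_workers) for t in teids)
    return [counts.get(w, 0) for w in range(n_workers)]


teids = range(1, 10_001)          # 10,000 hypothetical tunnels
print("8 workers:", spread(teids, 8))
print("1 worker :", spread(teids, 1))
```

With a single queue every flow lands on one core, which is the "without RSS" case in the table. With eight queues the flows are divided among the workers, and because each TEID always maps to the same worker, packets within a tunnel stay in order.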
5. Sizing Methodology
Hardware sizing starts from subscriber and session projections, not from server specs. Work backward from traffic to hardware:
| NF | Primary Sizing Driver | Rule of Thumb per 1M Subscribers | Key Caveat |
| AMF | Registration/paging event rate | 16–32 vCPU, 64–128 GB RAM | Higher in dense urban — frequent idle/active transitions |
| SMF | Concurrent PDU sessions (stateful) | 32–64 vCPU, 256–512 GB RAM | Memory-dominant. 10M concurrent sessions = up to 80 GB RAM. |
| UPF | Throughput (Gbps) and concurrent sessions | 32–64 vCPU + 2× 100GbE SR-IOV + hugepages | 1 vCPU per 3–5 Gbps with DPDK. SmartNIC GTP offload changes this significantly. |
| PCF | Policy decisions per second | 8–16 vCPU, 32–64 GB RAM | Scales with SMF — typically 1:1 vCPU ratio |
| UDM/UDR | Subscriber DB IOPS | 8–16 vCPU, 64–128 GB RAM + NVMe SSD | NVMe mandatory. Spinning disk latency causes auth failures at scale. |
| NRF | Discovery requests per second | 8–16 vCPU, 32–64 GB RAM | Often undersized. Add 50% headroom for registration storm scenarios. |
Table 6 — 5GC NF sizing rules of thumb. Always validate with vendor sizing tool using actual subscriber and traffic projections. These are starting points, not final specs.
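The rules of thumb in Table 6 can be turned into a first-pass calculator. The sketch below uses only ratios stated in the table: 3–5 Gbps per DPDK vCPU for the UPF, up to 80 GB of RAM per 10M concurrent SMF sessions, and 50% headroom for the NRF. The function names and the example inputs are illustrative, and the result is a starting point for the vendor sizing tool, not a replacement for it.

```python
import math


def upf_dpdk_vcpus(peak_gbps: float, gbps_per_vcpu: float = 3.0) -> int:
    """DPDK worker vCPUs for a target throughput (3.0 is the conservative end of 3-5)."""
    return math.ceil(peak_gbps / gbps_per_vcpu)


def smf_session_ram_gb(concurrent_sessions: int,
                       gb_per_10m_sessions: float = 80.0) -> float:
    """Upper-bound session-state RAM, scaled linearly from 80 GB per 10M sessions."""
    return concurrent_sessions / 10_000_000 * gb_per_10m_sessions


def with_headroom(vcpus: int, headroom: float = 0.5) -> int:
    """Apply storm headroom, e.g. 50% for the NRF."""
    return math.ceil(vcpus * (1 + headroom))


# Hypothetical projection: 100 Gbps peak, 2.5M concurrent PDU sessions.
print("UPF vCPUs (conservative):", upf_dpdk_vcpus(100, 3.0))
print("UPF vCPUs (optimistic):  ", upf_dpdk_vcpus(100, 5.0))
print("SMF session RAM (GB):    ", smf_session_ram_gb(2_500_000))
print("NRF vCPUs with headroom: ", with_headroom(16))
```

The linear scaling of SMF memory is an assumption. Real session-state size depends on the vendor implementation and on how many QoS flows each PDU session carries.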
6. Common Issues in the Field
| Field Note: NUMA Misconfiguration — AMF Latency 4× Expected |
| SA deployment: AMF N11 response latency P95 was 8 ms against 2 ms target. |
| CPU and memory utilisation appeared normal. No obvious bottleneck. |
| Investigation: AMF vCPUs were split across both NUMA nodes of a dual-socket server. |
| Memory allocated on NUMA node 0; some worker threads running on NUMA node 1. |
| Cross-NUMA memory access adding ~60 ns per session lookup. At high request rate: cumulative latency spike. |
| Fix: in the kubelet configuration, set topologyManagerPolicy=single-numa-node and cpuManagerPolicy=static; restart the kubelet, then restart the AMF pods. |
| N11 latency P95 dropped to 1.8 ms. No hardware change. |
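A split like the one in this field note can be detected by comparing the CPUs a process is pinned to against each NUMA node's CPU list. The sketch below is a minimal illustration that works on Linux cpulist strings, the format found in `/sys/devices/system/node/node*/cpulist` and in the `Cpus_allowed_list` field of `/proc/<pid>/status`. The dual-socket layout in the example is hypothetical.

```python
def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes_used(pinned: set, node_cpus: dict) -> set:
    """Return the NUMA nodes that own at least one of the pinned CPUs."""
    return {node for node, cpus in node_cpus.items() if pinned & cpus}


# Hypothetical dual-socket host with hyperthreading enabled.
node_cpus = {
    0: parse_cpulist("0-23,48-71"),
    1: parse_cpulist("24-47,72-95"),
}

for label, cpulist in (("split pod", "20-27"), ("aligned pod", "0-7")):
    nodes = numa_nodes_used(parse_cpulist(cpulist), node_cpus)
    status = "OK" if len(nodes) == 1 else "CROSS-NUMA"
    print(f"{label}: nodes {sorted(nodes)} -> {status}")
```

A pod that reports more than one node is a candidate for the single-numa-node fix described above.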
| Field Note: C-States Not Disabled — UPF Packet Processing Jitter |
| UPF throughput at low traffic (< 1 Gbps): excellent, < 0.1 ms forwarding latency. |
| Under burst traffic: latency spikes to 8–12 ms for first packets in each burst. |
| Root cause: CPU C-states enabled. CPU entered C3 state during quiet periods. |
| Return from C3 to C0 (active): ~5–10 ms wakeup latency. First burst packets delayed. |
| Fix: BIOS: set C-states = disabled. OS: add intel_idle.max_cstate=0 to kernel cmdline. |
| Burst latency spikes eliminated. Consistent < 0.5 ms forwarding latency. |
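Whether the fix actually took effect can be verified from the OS, because Linux exposes each idle state under `/sys/devices/system/cpu/cpu*/cpuidle/`. The sketch below lists idle states that are still enabled and whose exit latency exceeds a chosen limit. The `name`, `latency` (microseconds), and `disable` files are standard cpuidle sysfs attributes; the function name and the 1 µs default limit are assumptions made for illustration.

```python
from pathlib import Path


def enabled_idle_states(cpu_root="/sys/devices/system/cpu", max_latency_us=1):
    """Return (cpu, state name, exit latency in µs) for enabled states above the limit."""
    findings = []
    for state in sorted(Path(cpu_root).glob("cpu[0-9]*/cpuidle/state[0-9]*")):
        name = (state / "name").read_text().strip()
        latency_us = int((state / "latency").read_text())
        disabled = (state / "disable").read_text().strip() == "1"
        if not disabled and latency_us > max_latency_us:
            # state.parent is the cpuidle directory; its parent is cpuN
            findings.append((state.parent.parent.name, name, latency_us))
    return findings


# On a UPF node, an empty list means no deep idle state is left enabled:
#   for cpu, name, latency in enabled_idle_states():
#       print(f"{cpu}: {name} still enabled ({latency} µs exit latency)")
```

If the `cpuidle` directories are absent, the function returns an empty list. That can mean idle states are fully disabled, or that the driver is not loaded, so the BIOS setting should still be confirmed.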
7. Troubleshooting
| Symptom | Root Cause | Check | Fix |
| UPF throughput well below server spec | Hugepages not configured; DPDK on 4KB pages | Node hugepages allocation; UPF pod hugepages resource request | Configure hugepages-1Gi on node; add resource request to UPF pod spec |
| AMF/SMF latency 2–5× expected at low utilisation | Cross-NUMA memory access | K8s node topology manager policy; numactl --hardware | Set single-numa-node policy; restart pods with CPU Manager pinning |
| UPF burst latency spikes (otherwise OK) | CPU C-states enabled: wakeup latency hits the first packets of each burst | BIOS C-state config; /sys/devices/system/cpu/cpu*/cpuidle/state*/disable | Disable C-states in BIOS; add intel_idle.max_cstate=0 (plus processor.max_cstate=1 for the ACPI fallback idle driver) to the kernel cmdline |
| UPF N3 cannot sustain > 10 Gbps despite 100GbE NIC | SR-IOV VF not attached — kernel network stack bottleneck | UPF pod NetworkAttachmentDefinition; verify VF assigned to pod interface | Configure Multus SR-IOV CNI plugin; verify VF shows in UPF pod with ip link |
| NF pod repeatedly crashes during load | Memory limit too low: the container is OOMKilled | kubectl get events; kubectl describe pod shows OOMKilled as the last termination reason | Increase memory limit; set requests=limits for Guaranteed QoS |
Table 7 — Hardware platform troubleshooting. Most issues are BIOS/OS configuration errors discovered after deployment.
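The first row of Table 7 can be checked on the node before any UPF pod is scheduled. The sketch below parses the hugepage counters from `/proc/meminfo` text and reports whether enough pages are free for a given requirement. The function names, the sample text, and the 32 GiB requirement are illustrative. Note that `/proc/meminfo` reports only the default hugepage size, so 1 GiB pages configured as a non-default pool appear under `/sys/kernel/mm/hugepages/` instead.

```python
GIB_IN_KB = 1024 * 1024


def parse_hugepages(meminfo_text: str) -> dict:
    """Extract the HugePages_* counters and Hugepagesize (kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields[key] = int(rest.split()[0])
    return fields


def hugepage_problems(fields: dict, needed_gib: float) -> list:
    """Return human-readable problems; an empty list means the pool looks sufficient."""
    if fields.get("HugePages_Total", 0) == 0:
        return ["no hugepages allocated in the default pool"]
    free_gib = fields.get("HugePages_Free", 0) * fields.get("Hugepagesize", 0) / GIB_IN_KB
    if free_gib < needed_gib:
        return [f"only {free_gib:.0f} GiB of hugepages free, {needed_gib} GiB needed"]
    return []


# Sample text in /proc/meminfo format; on a node use open("/proc/meminfo").read().
SAMPLE = """\
HugePages_Total:      64
HugePages_Free:       48
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
"""
print(hugepage_problems(parse_hugepages(SAMPLE), needed_gib=32))
print(hugepage_problems(parse_hugepages(SAMPLE), needed_gib=64))
```

A node that passes this check still needs the matching hugepages request in the UPF pod spec, as the Fix column states.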
8. Summary — Key Takeaways
| Topic | Key Takeaway |
| COTS hardware | Standard x86 servers work for all 5GC NFs. Configuration matters more than brand. The same server, misconfigured, can deliver as little as 50% of its rated performance. |
| UPF NIC | SR-IOV 100 GbE (Intel E810 or Mellanox ConnectX) is the minimum for production UPF. SmartNIC/DPU (BlueField-3) enables hardware GTP-U offload for higher session density. |
| Hugepages | Configure hugepages-1Gi on the K8s node AND in the UPF pod resource request. Not optional for a DPDK UPF: without hugepages, expect 40–60% throughput loss. |
| NUMA + CPU pinning | Single-numa-node topology manager + static CPU manager policy. Cross-NUMA access is a silent latency multiplier. |
| C-states | Disable in BIOS for UPF servers. C-state wakeup latency causes burst latency spikes even when average utilisation is low. |
| DC design | Spine-leaf with < 0.5 ms east-west latency. 2N power. Geo-redundant DC pair within 50–100 km. Dedicated VLANs for SBI, user-plane, OAM. |
| Sizing | Work backward from subscriber/session projections. SMF is memory-dominant. UPF is throughput-dominant. NRF is request-rate-dominant — add 50% headroom for storms. |
Table 8 — Post 07 summary. Hardware is not the limiting factor. Configuration of hugepages, NUMA, C-states, and SR-IOV is.
Next: Post 08 — Cloud-Native 5GC on Kubernetes
