5GC KPIs & Performance Management

3GPP TS 28.552 PM counters, registration and session KPIs, UPF throughput, N4 latency, Huawei/Ericsson/Nokia counter mapping, alert thresholds

1. What Are 5GC KPIs — The Simple Version

5GC KPIs are the performance counters that tell you whether the core is healthy before your subscribers notice it is not. 3GPP TS 28.552 defines the standard PM (Performance Measurement) counter framework for 5GC. Every NF should implement these counters. Every vendor names them differently. This article maps the critical KPIs to the 3GPP standard names, explains what they measure, defines alert thresholds, and provides vendor counter name mappings.

The most common KPI mistake: operators monitor NF CPU and memory (platform KPIs) but not service KPIs (registration success rate, PDU session setup latency, N4 timeout rate). A UPF can be at 30% CPU and still be dropping PDU sessions because its N4 processing thread pool is exhausted. Platform metrics do not substitute for service metrics.

3GPP Reference
3GPP TS 28.552 — Management and orchestration: 5G performance measurements
3GPP TS 28.550 — Management and orchestration; Performance assurance
3GPP TS 28.554 — 5G end-to-end Key Performance Indicators (KPI)

2. Registration KPIs

KPI Name | 3GPP Counter (TS 28.552) | Formula | Alert Threshold | Failure Signature
Registration Success Rate | VS.AMF.Reg.InitReg.Att, VS.AMF.Reg.InitReg.Succ | (Succ / Att) × 100 | < 99.5% = investigate; < 98% = critical | < 98%: check AUSF auth failures, UDM N8 response time, NRF discovery latency
Auth Success Rate | VS.AUSF.Auth.Att, VS.AUSF.Auth.Succ | (Succ / Att) × 100 | < 99.8% = investigate | < 99%: SQN desync (check SYNCH_FAILURE counter), AUSF overload, UDM ARPF latency
Registration Latency P95 | VS.AMF.Reg.Latency (histogram) | P95 of (RegistrationAccept_time - RegRequest_time) | P95 > 500 ms = investigate; > 1 s = critical | > 1 s: AUSF/UDM response latency; NRF discovery not cached; PCF AM policy latency
Periodic Reg Update Failure | VS.AMF.Reg.PeriodicReg.Fail | Count of T3512-triggered registrations that fail | > 0.5% of periodic regs = investigate | UEs are failing periodic re-registration: AMF capacity or radio coverage issue
Implicit Deregistration Rate | VS.AMF.Dereg.Implicit | Count per hour | Spike = investigate (baseline from busy-hour profile) | Sudden spike: AMF restart without context preservation, or mass device power-off (e.g. a shift change at a factory)

Table 1 — Registration KPIs with 3GPP counter names, formulas, and alert thresholds.
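The success-rate rows in Table 1 reduce to one ratio and a two-level threshold check. A minimal sketch, assuming the Succ and Att values for one collection interval have already been read from the PM export; the KPI keys (`registration_sr`, `auth_sr`) and helper names are hypothetical, and the threshold values are copied from Table 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SrThreshold:
    """Success-rate alert levels in percent (values taken from Table 1)."""
    investigate: float                 # below this: open an investigation
    critical: Optional[float] = None   # below this: treat as critical


# Hypothetical KPI keys; thresholds from Table 1.
THRESHOLDS = {
    "registration_sr": SrThreshold(investigate=99.5, critical=98.0),
    "auth_sr": SrThreshold(investigate=99.8),
}


def success_rate(succ: int, att: int) -> Optional[float]:
    """(Succ / Att) x 100 for one interval; None when there were no attempts."""
    if att == 0:
        return None
    return succ / att * 100.0


def classify(kpi: str, succ: int, att: int) -> str:
    """Return 'ok', 'investigate', 'critical' or 'no_data' for one interval."""
    rate = success_rate(succ, att)
    if rate is None:
        return "no_data"
    level = THRESHOLDS[kpi]
    if level.critical is not None and rate < level.critical:
        return "critical"
    if rate < level.investigate:
        return "investigate"
    return "ok"


print(classify("registration_sr", succ=99_420, att=100_000))  # investigate
print(classify("auth_sr", succ=99_850, att=100_000))          # ok
```

Returning `no_data` for an interval with zero attempts avoids a division by zero in quiet periods. Very low attempt counts also make the ratio noisy, so a minimum-attempts guard may be worth adding before alerting.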

3. PDU Session KPIs

KPI Name | 3GPP Counter | Formula / Value | Alert Threshold | Root Cause if Breached
PDU Session Estab Success Rate | VS.SMF.PDUSess.Estab.Att, VS.SMF.PDUSess.Estab.Succ | (Succ / Att) × 100 | < 99.0% = investigate; < 97% = critical | Check: N4 timeout rate, PCF N7 rejection, UDM N10 latency, IP pool exhaustion
PDU Session Setup Latency P95 | VS.SMF.PDUSess.Estab.Latency | P95 of (PFCP Session Establishment Response received - CreateSMContext request received) | P95 > 300 ms = investigate | N4 timeout (UPF overloaded), high PCF latency, NRF cache miss
Active PDU Sessions (concurrent) | VS.UPF.PDUSess.Active | Current count (gauge) | > 85% of maximum configured capacity | Near capacity: scale the UPF or add a UPF instance
Abnormal PDU Session Release Rate | VS.SMF.PDUSess.Rel.Abnormal | (Abnormal_rel / Total_rel) × 100 | > 0.5% = investigate | GTP-U path failure, UPF crash, PFCP Association loss, SMF timeout
N4 Association Failure Rate | VS.SMF.N4Assoc.Fail | Count per hour | > 0 = investigate (should always be 0) | SMF cannot reach UPF on N4. Check IP connectivity, PFCP UDP port 8805, and IPsec if used
PFCP Session Modification Timeout | VS.SMF.N4.Sess.Mod.Timeout | Count per hour (express as % of all session modifications for the threshold) | > 0.01% of modifications = investigate | UPF N4 thread pool saturated; handover path-switch updates failing; asymmetric connectivity

Table 2 — PDU Session KPIs. The PFCP Session Modification Timeout counter is the single most important diagnostic for asymmetric connectivity failures.
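Both latency KPIs (Registration Latency P95 in Table 1 and PDU Session Setup Latency P95 in Table 2) are read from histogram counters, i.e. a count of procedures per latency bin. A minimal sketch of estimating P95 from per-bin counts by linear interpolation inside the bin where the 95th percentile falls, similar in spirit to Prometheus `histogram_quantile`. The bin edges and the helper name are assumptions; each vendor defines its own bins.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, bins: Sequence[Tuple[float, int]]) -> float:
    """Estimate the q-quantile (0..1) from per-bin counts.

    bins: (upper_bound_ms, count) pairs sorted by upper bound. The first bin
    starts at 0 ms; the last may use math.inf as an open-ended overflow bin.
    Observations are assumed to be spread evenly inside each bin.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = sum(count for _, count in bins)
    if total == 0:
        return math.nan
    rank = q * total
    lower, cumulative = 0.0, 0
    for upper, count in bins:
        if count > 0 and cumulative + count >= rank:
            if math.isinf(upper):
                return lower  # overflow bin: only its lower edge is known
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        lower = upper
    return lower


# Hypothetical 15-minute sample of PDU session setup latency bins (ms, count).
pdu_setup = [(100, 600), (200, 300), (300, 60), (500, 30), (1000, 10)]
p95 = histogram_quantile(0.95, pdu_setup)
print(f"P95 = {p95:.0f} ms, breach: {p95 > 300}")  # P95 = 283 ms, breach: False
```

When several AMF or SMF instances report, sum their bin counts first and then compute P95; averaging per-instance P95 values does not give the P95 of the combined traffic.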

4. UPF User Plane KPIs

KPI | 3GPP Counter | Alert Threshold | Action
UPF DL Throughput Mean | VS.UPF.Tput.DL.Mean | > 80% of N6 link capacity = plan upgrade | Provision additional N6 capacity or add a UPF instance
N3 Packet Drop Rate | VS.UPF.N3.PktDrop / VS.UPF.N3.PktRx | > 0.01% sustained | NIC SR-IOV queue overflow or UPF DPDK worker CPU saturation
UPF CPU (DPDK workers) | Platform metric (use the data plane's busy-cycle utilisation; DPDK poll-mode workers typically show close to 100% OS CPU even when idle) | > 70% sustained on DPDK workers | Scale UPF pods horizontally; verify hugepages and CPU pinning
DDN Processing Latency | VS.UPF.DDN.Latency | P95 > 100 ms | N4 congestion; prioritise DDN messages on N4
Buffer Utilisation | VS.UPF.Buffer.Utilisation | > 80% | Downlink buffering for idle UEs exceeds capacity; review the NOCP configuration for IoT DNNs

Table 3 — UPF user plane KPIs. N3 packet drop and DPDK CPU utilisation are the early warning signs before full UPF capacity saturation.
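The "sustained" qualifier on the N3 drop threshold matters: a single microburst should not page anyone. A minimal sketch, assuming per-interval (dropped, received) packet counts have already been extracted for VS.UPF.N3.PktDrop and VS.UPF.N3.PktRx. Treating three consecutive 15-minute intervals as "sustained" is an assumption, not a 3GPP rule; the helper names are hypothetical. The N6 check applies the 80% planning threshold from Table 3.

```python
from typing import Iterable, List, Tuple

N3_DROP_THRESHOLD_PCT = 0.01  # Table 3: N3 packet drop > 0.01% sustained
SUSTAINED_INTERVALS = 3       # assumption: 3 consecutive 15-minute intervals
N6_PLANNING_RATIO = 0.80      # Table 3: DL mean > 80% of N6 capacity


def drop_rate_pct(dropped: int, received: int) -> float:
    """VS.UPF.N3.PktDrop / VS.UPF.N3.PktRx as a percentage for one interval."""
    return 0.0 if received == 0 else dropped / received * 100.0


def sustained_drop_alerts(samples: Iterable[Tuple[int, int]]) -> List[bool]:
    """Flag each interval where the drop rate has exceeded the threshold for
    SUSTAINED_INTERVALS consecutive intervals ending at that interval."""
    flags, run = [], 0
    for dropped, received in samples:
        run = run + 1 if drop_rate_pct(dropped, received) > N3_DROP_THRESHOLD_PCT else 0
        flags.append(run >= SUSTAINED_INTERVALS)
    return flags


def n6_upgrade_needed(dl_mean_gbps: float, n6_capacity_gbps: float) -> bool:
    """True when mean DL throughput exceeds the N6 planning threshold."""
    return dl_mean_gbps > N6_PLANNING_RATIO * n6_capacity_gbps


# Hypothetical per-interval counts: (dropped, received)
intervals = [(500, 10_000_000), (1_500, 10_000_000), (2_000, 10_000_000),
             (1_200, 10_000_000), (300, 10_000_000)]
print(sustained_drop_alerts(intervals))  # [False, False, False, True, False]
print(n6_upgrade_needed(34.0, 40.0))     # True (34 Gbps > 32 Gbps)
```

If the counters come from a cumulative exporter rather than per-interval PM files, difference consecutive readings first so each sample covers exactly one interval.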

5. Vendor Counter Mapping

KPI | 3GPP TS 28.552 Name | Huawei iMaster NCE | Ericsson ENM / ERIC-CELL | Nokia NetAct
Reg Success Rate | VS.AMF.Reg.InitReg.Succ/Att | AMF.RegSuccRatio | pmAmfRegInitSucc / pmAmfRegInitAtt | NF_AMF_REG_SR
Auth Success Rate | VS.AUSF.Auth.Succ/Att | AUSF.AuthSuccRatio | pmAusfAuthSucc / pmAusfAuthAtt | NF_AUSF_AUTH_SR
PDU Estab SR | VS.SMF.PDUSess.Estab.Succ/Att | SMF.PduSessEstSuccRatio | pmSmfPduEstSucc / pmSmfPduEstAtt | NF_SMF_SESS_SR
N4 Session Estab | VS.SMF.N4.Sess.Estab.Succ/Att | SMF.N4SessEstSuccRatio | pmSmfN4SessEstSucc | NF_SMF_N4_SR
UPF DL Throughput | VS.UPF.Tput.DL.Mean | UPF.DLThroughputMean | pmUPFDlMeanThroughput | NF_UPF_DL_TPUT
PFCP Mod Timeout | VS.SMF.N4.Sess.Mod.Timeout | SMF.N4SessModTimeoutCnt | pmSmfN4SessModTimeout | NF_SMF_N4_MOD_TO

Table 4 — Vendor KPI counter mapping (indicative; verify exact counter names and collection granularity in each vendor's PM guide). Counters are typically collected at a 15-minute granularity period; confirm the supported periods per counter.
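The mapping in Table 4 is easiest to maintain as data rather than as per-vendor dashboard queries. A minimal sketch, using the indicative counter names from the table (verify them against each vendor's PM guide). It assumes the vendor ratio counters are already expressed in percent, which should also be confirmed during acceptance testing; the dictionary and function names are hypothetical.

```python
import math
from typing import Dict, Mapping, Tuple, Union

# Either a single ratio counter or a (success, attempt) counter pair.
CounterSpec = Union[str, Tuple[str, str]]

# Indicative names from Table 4; not verified against vendor documentation.
COUNTER_MAP: Dict[str, Dict[str, CounterSpec]] = {
    "reg_sr": {
        "huawei": "AMF.RegSuccRatio",
        "ericsson": ("pmAmfRegInitSucc", "pmAmfRegInitAtt"),
        "nokia": "NF_AMF_REG_SR",
    },
    "pdu_estab_sr": {
        "huawei": "SMF.PduSessEstSuccRatio",
        "ericsson": ("pmSmfPduEstSucc", "pmSmfPduEstAtt"),
        "nokia": "NF_SMF_SESS_SR",
    },
}


def normalised_sr(kpi: str, vendor: str, sample: Mapping[str, float]) -> float:
    """Return the KPI in percent from one vendor PM sample (counter -> value)."""
    spec = COUNTER_MAP[kpi][vendor]
    if isinstance(spec, tuple):
        succ_name, att_name = spec
        att = sample[att_name]
        return math.nan if att == 0 else sample[succ_name] / att * 100.0
    return float(sample[spec])  # assumption: ratio counter already in percent


print(normalised_sr("reg_sr", "ericsson",
                    {"pmAmfRegInitSucc": 9_950, "pmAmfRegInitAtt": 10_000}))
print(normalised_sr("reg_sr", "huawei", {"AMF.RegSuccRatio": 99.7}))
```

Keying dashboards and alerts on the 3GPP-aligned KPI name means a vendor change only touches the mapping table.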

6. Grafana Dashboard Layout for 5GC

A production 5GC Grafana dashboard should have five row groups:

Row 1 — Registration Health: AMF Registration SR (%), Auth SR (%), Registration Latency P95, Implicit Deregistration Rate (per hour). Alert: Reg SR < 99.5% fires an investigation alert; Reg SR < 98% pages on-call immediately.

Row 2 — Session Health: PDU Session Estab SR (%), Active PDU Sessions (gauge + capacity line), Abnormal Release Rate (%), PFCP Modification Timeout Rate. Alert: PFCP Mod Timeout Rate > 0.01% fires investigation alert.

Row 3 — User Plane: UPF DL/UL Throughput in Gbps (time series), N3 Packet Drop Rate (%), UPF DPDK CPU Utilisation (%). Alert: sustained N3 drop rate > 0.01% = investigate immediately.

Row 4 — NRF/SBI Health: NRF Discovery Request Rate, NRF Cache Hit Rate, SBI 4xx/5xx Error Rate by NF pair. Alert: SBI 503 rate from NRF > 0.1% = NRF overload investigation.

Row 5 — Per-Slice KPIs: Registration SR per S-NSSAI, PDU Session SR per S-NSSAI, Throughput per S-NSSAI. Alert: enterprise slice SR < 99.0% = SLA breach investigation.
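Row 5 exists because aggregate KPIs hide slice-level failures. A minimal sketch of the arithmetic, using hypothetical slice labels and counts: a small enterprise slice failing badly barely moves the aggregate registration SR. The 99.0% per-slice SLA threshold is taken from the Row 5 alert; the data layout and function names are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical per-S-NSSAI registration counts for one interval: (succ, att)
Counts = Dict[str, Tuple[int, int]]


def aggregate_sr(counts: Counts) -> float:
    """Registration SR across all slices, weighted by attempts."""
    succ = sum(s for s, _ in counts.values())
    att = sum(a for _, a in counts.values())
    return succ / att * 100.0


def slice_breaches(counts: Counts, sla_pct: Dict[str, float]) -> Dict[str, float]:
    """Return {slice: SR%} for every slice with an SLA whose SR is below it."""
    breaches = {}
    for snssai, (succ, att) in counts.items():
        if snssai in sla_pct and att > 0:
            rate = succ / att * 100.0
            if rate < sla_pct[snssai]:
                breaches[snssai] = rate
    return breaches


counts = {
    "embb-default": (998_000, 1_000_000),   # hypothetical consumer slice
    "enterprise-a": (4_850, 5_000),         # hypothetical enterprise slice
}
print(f"aggregate SR = {aggregate_sr(counts):.2f}%")   # aggregate SR = 99.79%
print(slice_breaches(counts, {"enterprise-a": 99.0}))
```

Here the enterprise slice carries about 0.5% of attempts, so its 97% SR shifts the aggregate by less than 0.02 percentage points and no aggregate alert fires, while the per-slice check flags the SLA breach.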

7. Summary — Key Takeaways

Topic | Key Takeaway
Service vs platform KPIs | Monitor service KPIs (Reg SR, PDU SR, PFCP timeouts) alongside platform KPIs (CPU, memory). Low CPU proves nothing if PFCP modification timeouts are rising; service KPIs show what subscribers actually experience.
Registration SR | Target: ≥ 99.5%. Below 99.5%: investigate. Below 98% (critical): check AUSF, UDM and NRF, in that order. Auth SR below 99.8% together with rising SQN SYNCH_FAILURE counts suggests a USIM batch issue.
PFCP Mod Timeout | Critical counter. Even a 0.01% timeout rate means some users are experiencing asymmetric connectivity. Check UPF N4 thread pool utilisation.
Vendor counter names | 3GPP names are the standard; vendor names differ. Build the counter mapping table during acceptance testing, before a production incident demands it at 2 am.
Per-slice KPIs | Essential for enterprise SLA management. A degraded enterprise slice may account for < 1% of total registrations: invisible in aggregate KPIs, but still a contract breach.

Table 5 — Post 13 summary. KPIs are early warning systems. Build them before go-live, not after the first outage.

Next: Post 14 — 5GC Troubleshooting — Real Failures, Wireshark & Grafana
