3GPP TS 28.552 PM counters, registration and session KPIs, UPF throughput, N4 latency, Huawei/Ericsson/Nokia counter mapping, alert thresholds
1. What Are 5GC KPIs — The Simple Version
5GC KPIs are the performance counters that tell you whether the core is healthy before your subscribers notice it is not. 3GPP TS 28.552 defines the standard PM (Performance Measurement) counter framework for 5GC. Every NF should implement these counters. Every vendor names them differently. This article maps the critical KPIs to the 3GPP standard names, explains what they measure, defines alert thresholds, and provides vendor counter name mappings. Note that in 3GPP measurement naming the VS. prefix marks vendor-specific measurements, so treat the VS.-prefixed names in the tables as indicative and confirm the exact standardized measurement names against TS 28.552.
The most common KPI mistake: operators monitor NF CPU and memory (platform KPIs) but not service KPIs (registration success rate, PDU session setup latency, N4 timeout rate). A UPF can be at 30% CPU and still be dropping PDU sessions because its N4 processing thread pool is exhausted. Platform metrics do not substitute for service metrics.
| 3GPP Reference |
| 3GPP TS 28.552: Management and orchestration; 5G performance measurements |
| 3GPP TS 28.550: Management and orchestration; Performance assurance |
| 3GPP TS 28.554: Management and orchestration; 5G end-to-end Key Performance Indicators (KPI) |
2. Registration KPIs
| KPI Name | 3GPP Counter (TS 28.552) | Formula | Alert Threshold | Failure Signature |
| Registration Success Rate | VS.AMF.Reg.InitReg.Att, VS.AMF.Reg.InitReg.Succ | (Succ/Att) × 100 | < 99.5% = investigate; < 98% = critical | < 98%: check AUSF auth failures, UDM N8 response time, NRF discovery latency |
| Auth Success Rate | VS.AUSF.Auth.Att, VS.AUSF.Auth.Succ | (Succ/Att) × 100 | < 99.8% = investigate | < 99%: SQN desync (check SYNCH_FAILURE counter), AUSF overload, UDM ARPF latency |
| Registration Latency P95 | VS.AMF.Reg.Latency (histogram) | P95 of (RegistrationAccept_time – RegRequest_time) | P95 > 500ms = investigate; > 1s = critical | > 1s: AUSF/UDM response latency; NRF discovery not cached; PCF AM policy latency |
| Periodic Reg Update Failure | VS.AMF.Reg.PeriodicReg.Fail | Count of T3512-triggered registrations failing | > 0.5% of periodic regs = investigate | Indicates UEs cannot re-register in coverage — AMF capacity or coverage issue |
| Implicit Deregistration Rate | VS.AMF.Dereg.Implicit | Count per hour | Spike = investigate (baseline from BH profile) | Sudden spike: AMF restart without context preservation; or mass device power-off (shift change at factory) |
Table 1 — Registration KPIs with 3GPP counter names, formulas, and alert thresholds.
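The success-rate formula and the two-level thresholds in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not a vendor implementation: the function names and the sample counter values are hypothetical, and it assumes the counters are per-interval totals rather than cumulative values.

```python
from typing import Optional


def success_rate(succ: int, att: int) -> Optional[float]:
    """Return (succ / att) * 100, or None when there were no attempts."""
    if att <= 0:
        return None
    return succ / att * 100.0


def classify_reg_sr(rate: Optional[float]) -> str:
    """Map a registration success rate to the Table 1 alert levels."""
    if rate is None:
        return "no-data"
    if rate < 98.0:
        return "critical"
    if rate < 99.5:
        return "investigate"
    return "ok"


# Hypothetical 15-minute sample of the attempt and success counters.
sample = {"VS.AMF.Reg.InitReg.Att": 20000, "VS.AMF.Reg.InitReg.Succ": 19850}
rate = success_rate(sample["VS.AMF.Reg.InitReg.Succ"],
                    sample["VS.AMF.Reg.InitReg.Att"])
print(f"Reg SR = {rate:.2f}% -> {classify_reg_sr(rate)}")
```

Returning `None` for an interval with zero attempts keeps a quiet night-time period from being reported as a 0% success rate.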
3. PDU Session KPIs
| KPI Name | 3GPP Counter | Formula / Value | Alert Threshold | Root Cause if Breached |
| PDU Session Estab Success Rate | VS.SMF.PDUSess.Estab.Att, .Succ | (Succ/Att) × 100 | < 99.0% = investigate; < 97% = critical | Check: N4 timeout rate, PCF N7 rejection, UDM N10 latency, IP pool exhaustion |
| PDU Session Setup Latency P95 | VS.SMF.PDUSess.Estab.Latency | P95 of (PFCP Session Estab confirmed – CreateSMContext received) | P95 > 300ms = investigate | N4 timeout (UPF overloaded), PCF high latency, NRF cache miss |
| Active PDU Sessions (concurrent) | VS.UPF.PDUSess.Active | Current count (gauge) | > 85% of max configured capacity | Near capacity: scale UPF or add UPF instance |
| Abnormal PDU Session Release Rate | VS.SMF.PDUSess.Rel.Abnormal | (Abnormal_rel / Total_rel) × 100 | > 0.5% = investigate | GTP-U path failure, UPF crash, PFCP Association loss, SMF timeout |
| N4 Association Failure Rate | VS.SMF.N4Assoc.Fail | Count per hour | > 0 = investigate (should always be 0) | SMF cannot reach UPF on N4. Check IP connectivity, PFCP port 8805, IPsec if used |
| PFCP Session Modification Timeout | VS.SMF.N4.Sess.Mod.Timeout | Count per hour | > 0.01% of modifications = investigate | UPF N4 thread pool saturated. Handover path switch updates failing. Asymmetric connectivity. |
Table 2 — PDU Session KPIs. The PFCP Session Modification Timeout counter is the single most important diagnostic for asymmetric connectivity failures.
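The P95 latency KPIs are usually derived from histogram counters rather than from raw per-procedure timestamps. The sketch below estimates a percentile from cumulative buckets using linear interpolation within the bucket that holds the target rank. The bucket layout (upper bound in milliseconds, cumulative count) and the sample values are assumptions for illustration; real bucket boundaries depend on the vendor's counter definition.

```python
from typing import List, Tuple


def percentile_from_histogram(buckets: List[Tuple[float, int]], q: float) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: (upper_bound_ms, cumulative_count) pairs sorted by upper bound.
    Interpolates linearly inside the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram is empty")
    target = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= target:
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            fraction = (target - prev_count) / in_bucket
            return prev_bound + fraction * (bound - prev_bound)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]


# Hypothetical PDU session setup latency histogram for one interval.
setup_latency = [(50, 400), (100, 800), (200, 940), (300, 980), (500, 1000)]
p95 = percentile_from_histogram(setup_latency, 0.95)
print(f"P95 = {p95:.0f} ms, investigate = {p95 > 300}")
```

The estimate can only be as precise as the bucket width around the threshold, so it helps to have a bucket boundary at or near 300 ms (and 500 ms for registration latency).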
4. UPF User Plane KPIs
| KPI | 3GPP Counter | Alert Threshold | Action |
| UPF DL Throughput Mean | VS.UPF.Tput.DL.Mean | > 80% of N6 link capacity = plan upgrade | Provision additional N6 capacity or add UPF instance |
| N3 Packet Drop Rate | VS.UPF.N3.PktDrop / VS.UPF.N3.PktRx | > 0.01% sustained | NIC SR-IOV queue overflow or UPF DPDK worker CPU saturated |
| UPF CPU (DPDK workers) | Platform metric | > 70% sustained on DPDK workers | Scale UPF pods horizontally; or add hugepages/CPU pinning verification |
| DDN Processing Latency | VS.UPF.DDN.Latency | P95 > 100ms | N4 congestion; prioritise DDN messages on N4 |
| Buffer Utilisation | VS.UPF.Buffer.Utilisation | > 80% | Downlink buffering for idle UEs exceeding capacity; review IoT DNN NOCP config |
Table 3 — UPF user plane KPIs. N3 packet drop and DPDK CPU utilisation are the early warning signs before full UPF capacity saturation.
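The N3 drop threshold in Table 3 is qualified as "sustained", which means a single noisy interval should not raise the alert. One possible way to express that is to require several consecutive breaching intervals. The sketch below assumes per-interval (dropped, received) counter pairs and treats three consecutive breaches as sustained; that window length is an assumption to tune, not a value from the table.

```python
from typing import List, Tuple

DROP_THRESHOLD_PCT = 0.01  # from Table 3
SUSTAINED_INTERVALS = 3    # assumption: three consecutive samples count as sustained


def drop_rate_pct(dropped: int, received: int) -> float:
    """N3 packet drop rate in percent; 0.0 when nothing was received."""
    return 0.0 if received == 0 else dropped / received * 100.0


def is_sustained_breach(samples: List[Tuple[int, int]]) -> bool:
    """samples: (dropped, received) per collection interval, oldest first."""
    streak = 0
    for dropped, received in samples:
        if drop_rate_pct(dropped, received) > DROP_THRESHOLD_PCT:
            streak += 1
            if streak >= SUSTAINED_INTERVALS:
                return True
        else:
            streak = 0  # any healthy interval resets the streak
    return False


# Hypothetical samples: one healthy interval followed by three breaching ones.
history = [(50, 1_000_000), (150, 1_000_000), (200, 1_000_000), (180, 1_000_000)]
print(is_sustained_breach(history))
```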
5. Vendor Counter Mapping
| KPI | 3GPP TS 28.552 Name | Huawei iMaster NCE | Ericsson ENM / ERIC-CELL | Nokia NetAct |
| Reg Success Rate | VS.AMF.Reg.InitReg.Succ/Att | AMF.RegSuccRatio | pmAmfRegInitSucc / pmAmfRegInitAtt | NF_AMF_REG_SR |
| Auth Success Rate | VS.AUSF.Auth.Succ/Att | AUSF.AuthSuccRatio | pmAusfAuthSucc / pmAusfAuthAtt | NF_AUSF_AUTH_SR |
| PDU Estab SR | VS.SMF.PDUSess.Estab.Succ/Att | SMF.PduSessEstSuccRatio | pmSmfPduEstSucc / pmSmfPduEstAtt | NF_SMF_SESS_SR |
| N4 Session Estab | VS.SMF.N4.Sess.Estab.Succ/Att | SMF.N4SessEstSuccRatio | pmSmfN4SessEstSucc | NF_SMF_N4_SR |
| UPF DL Throughput | VS.UPF.Tput.DL.Mean | UPF.DLThroughputMean | pmUPFDlMeanThroughput | NF_UPF_DL_TPUT |
| PFCP Mod Timeout | VS.SMF.N4.Sess.Mod.Timeout | SMF.N4SessModTimeoutCnt | pmSmfN4SessModTimeout | NF_SMF_N4_MOD_TO |
Table 4 — Vendor KPI counter mapping (indicative — verify with vendor PM guide for exact counter names and collection granularity). All counters available at 15-minute collection interval.
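A mapping like Table 4 is easier to maintain when it lives in one lookup structure that dashboards and alert rules share. The sketch below copies three rows from the table into a dictionary; the KPI keys, vendor keys, and the `vendor_counter` helper are hypothetical names, and the counter strings remain as indicative as the table itself.

```python
from typing import Dict

# Rows copied from Table 4; verify each name against the vendor PM guide.
COUNTER_MAP: Dict[str, Dict[str, str]] = {
    "reg_success_rate": {
        "huawei": "AMF.RegSuccRatio",
        "ericsson": "pmAmfRegInitSucc / pmAmfRegInitAtt",
        "nokia": "NF_AMF_REG_SR",
    },
    "pdu_estab_success_rate": {
        "huawei": "SMF.PduSessEstSuccRatio",
        "ericsson": "pmSmfPduEstSucc / pmSmfPduEstAtt",
        "nokia": "NF_SMF_SESS_SR",
    },
    "pfcp_mod_timeout": {
        "huawei": "SMF.N4SessModTimeoutCnt",
        "ericsson": "pmSmfN4SessModTimeout",
        "nokia": "NF_SMF_N4_MOD_TO",
    },
}


def vendor_counter(kpi: str, vendor: str) -> str:
    """Look up the vendor counter expression for a KPI, failing loudly if unmapped."""
    try:
        return COUNTER_MAP[kpi][vendor.lower()]
    except KeyError:
        raise KeyError(f"no mapping for kpi={kpi!r}, vendor={vendor!r}") from None


print(vendor_counter("pfcp_mod_timeout", "Nokia"))
```

Failing loudly on an unmapped KPI is deliberate: a missing mapping found during acceptance testing is far cheaper than a silently empty dashboard panel.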
6. Grafana Dashboard Layout for 5GC
A production 5GC Grafana dashboard should have five row groups:
Row 1 — Registration Health: AMF Registration SR (%), Auth SR (%), Registration Latency P95, Implicit Deregistration Rate (per hour). Alert: Reg SR < 99.5% fires page-worthy alert immediately.
Row 2 — Session Health: PDU Session Estab SR (%), Active PDU Sessions (gauge + capacity line), Abnormal Release Rate (%), PFCP Modification Timeout Rate. Alert: PFCP Mod Timeout Rate > 0.01% fires investigation alert.
Row 3 — User Plane: UPF DL/UL Throughput Gbps (time series), N3 Packet Drop Rate (%), UPF DPDK CPU Utilisation (%). Alert: N3 drop > 0.01% = investigate immediately.
Row 4 — NRF/SBI Health: NRF Discovery Request Rate, NRF Cache Hit Rate, SBI 4xx/5xx Error Rate by NF pair. Alert: SBI 503 rate from NRF > 0.1% = NRF overload investigation.
Row 5 — Per-Slice KPIs: Registration SR per S-NSSAI, PDU Session SR per S-NSSAI, Throughput per S-NSSAI. Alert: enterprise slice SR < 99.0% = SLA breach investigation.
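The reason Row 5 needs its own alerts is that a small slice can fail badly without moving the aggregate. The worked example below uses made-up attempt and success counts and hypothetical S-NSSAI labels: the enterprise slice sits at 97.0% while the aggregate stays at about 99.67%, above the 99.5% threshold used in Row 1.

```python
from typing import Dict, List, Tuple

SLICE_SR_THRESHOLD = 99.0      # Row 5 enterprise slice alert
AGGREGATE_SR_THRESHOLD = 99.5  # Row 1 aggregate alert

# Hypothetical (attempts, successes) per S-NSSAI for one interval.
per_slice: Dict[str, Tuple[int, int]] = {
    "1-000001": (990_000, 987_030),  # consumer eMBB slice, 99.7%
    "1-0000A1": (10_000, 9_700),     # enterprise slice, 97.0%
}


def sr_pct(att: int, succ: int) -> float:
    """Success rate in percent; callers must ensure att > 0."""
    return succ / att * 100.0


def aggregate_sr(slices: Dict[str, Tuple[int, int]]) -> float:
    """Success rate over all slices combined."""
    total_att = sum(att for att, _ in slices.values())
    total_succ = sum(succ for _, succ in slices.values())
    return sr_pct(total_att, total_succ)


def breached_slices(slices: Dict[str, Tuple[int, int]]) -> List[str]:
    """S-NSSAIs whose own success rate is below the slice threshold."""
    return sorted(
        name for name, (att, succ) in slices.items()
        if att > 0 and sr_pct(att, succ) < SLICE_SR_THRESHOLD
    )


print(f"aggregate = {aggregate_sr(per_slice):.3f}%")
print(f"aggregate alert = {aggregate_sr(per_slice) < AGGREGATE_SR_THRESHOLD}")
print(f"breached slices = {breached_slices(per_slice)}")
```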
7. Summary — Key Takeaways
| Topic | Key Takeaway |
| Service vs platform KPIs | Monitor service KPIs (Reg SR, PDU SR, PFCP timeouts) alongside platform KPIs (CPU, memory). A healthy CPU reading does not prove the service is healthy: a UPF at 30% CPU can still be dropping PDU sessions. Platform metrics do not substitute for service metrics. |
| Registration SR | Target: ≥ 99.5%. Below 98% (critical): check AUSF, UDM, NRF in that order. Auth SR below 99.8% together with rising SQN SYNCH_FAILURE counts suggests a USIM batch issue. |
| PFCP Mod Timeout | Critical counter. Even a 0.1% rate (ten times the 0.01% investigation threshold) means users are experiencing asymmetric connectivity. Check UPF N4 thread pool utilisation. |
| Vendor counter names | 3GPP names are the standard. Vendor names differ. Build the counter mapping table during acceptance testing, before a production incident requires it at 2am. |
| Per-slice KPIs | Essential for enterprise SLA management. A degraded enterprise slice may have < 1% impact on total registrations — invisible in aggregate KPIs but a contract breach. |
Table 5 — Post 13 summary. KPIs are early warning systems. Build them before go-live, not after the first outage.
Next: Post 14 — 5GC Troubleshooting — Real Failures, Wireshark & Grafana
