5G Transport KPIs: What Really Matters and Why

Latency, jitter, packet loss, throughput — how to measure them, correlate them with RAN degradation, and fix them

1. Why Transport KPIs Matter More Than Most Teams Realise

Transport KPIs are the bridge between the physical network and RAN performance. When a gNB shows poor PDCP throughput, elevated RLC retransmission rates, or an increased handover failure ratio, the RAN team checks RF parameters first. Often they find nothing wrong, because the problem is in the transport network. Without proper transport KPI monitoring, the investigation stalls and the blame game begins.

The four fundamental transport KPIs are latency, jitter (packet delay variation), packet loss, and throughput/utilisation. Each one has a direct causal relationship with specific RAN KPIs. Understanding these relationships is what separates a transport engineer who can diagnose cross-domain problems from one who just monitors link utilisation.

2. The Four Core KPIs — What They Mean in Practice

Latency — One-Way and Round-Trip

Latency in the transport context is the time a packet takes to traverse the network from source to destination. For 5G interfaces, the relevant latency measurements are:

| Interface | Latency Metric | Threshold | RAN Impact if Exceeded |
|---|---|---|---|
| Fronthaul (RU→DU) | One-way latency | < 100 µs (Option 7-2x) | HARQ timing failure, increased HARQ retransmissions |
| Midhaul (DU→CU) | One-way latency | < 1 ms | PDCP SDU reordering, RLC buffer growth |
| N3 (CU-UP→UPF) | Round-trip latency | < 10 ms (eMBB), < 1 ms (URLLC) | High application RTT, poor user throughput via slow-start |
| N2 (gNB→AMF) | Round-trip latency | < 100 ms | Slow RRC setup, paging delays, handover latency |
| X2/Xn (inter-gNB) | Round-trip latency | < 10 ms (X2 for PDCP split) | NSA: poor dual connectivity performance, PDCP reordering |

Jitter — Packet Delay Variation

Jitter is the variation in packet latency over time. A link with average latency of 2ms but jitter of 5ms is worse for real-time services than a link with average latency of 4ms and jitter of 0.1ms. Jitter destroys VoNR quality even when average latency is acceptable, because the voice codec cannot compensate for unpredictable arrival times.

For PTP timing distribution, jitter on PTP packets (measured as PDV — Packet Delay Variation) directly translates to timing error at the RU. A PDV spike during a burst of data traffic on a shared link can push the RU’s phase error past the 1.5 µs threshold, causing TDD interference.
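The comparison between the two links above can be sketched in a few lines of Python. The delay samples and the `pdv` helper are illustrative assumptions, not measured data, and PDV is taken here as maximum minus minimum delay, which is only one of several definitions in use.

```python
from statistics import mean

def pdv(delays_ms):
    """Packet delay variation as max minus min delay (one simple definition)."""
    return max(delays_ms) - min(delays_ms)

# Hypothetical one-way delay samples in milliseconds (not measured data).
link_a = [1.0, 1.0, 1.0, 1.0, 6.0]      # low average, one large spike
link_b = [3.95, 4.0, 4.05, 4.0, 4.0]    # higher average, very stable

for name, samples in (("A", link_a), ("B", link_b)):
    print(f"link {name}: mean={mean(samples):.2f} ms, PDV={pdv(samples):.2f} ms")
```

Link A has the lower mean but the far larger delay variation, which is the case the voice codec's jitter buffer struggles with.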

Packet Loss — The Silent Throughput Killer

Packet loss in transport is categorised as:

  • Random loss: typically caused by bit errors on physical links. In fibre networks, the bit error rate should be below 10^-12. Rates above 10^-9 indicate a fibre or connector problem.
  • Congestion loss: caused by queue overflow at router interfaces. This is the dominant loss mechanism in live networks. Even 1% congestion loss on the N3 path cuts TCP throughput disproportionately (on the order of 30% or more, depending on RTT and the congestion control algorithm), because every loss event triggers congestion avoidance.
  • Policer/shaper loss: intentional drops by traffic policers enforcing rate limits. These must be distinguished from congestion loss, because policer drops on RAN traffic indicate QoS misconfiguration.
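To get a feel for how strongly loss limits TCP, the widely cited Mathis approximation bounds the steady-state throughput of a loss-based (Reno-style) flow at roughly (MSS / RTT) × (1.22 / √p). The sketch below applies it under assumed values (1460-byte MSS, 10 ms RTT); the function name is hypothetical, and CUBIC or BBR flows will behave differently.

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Approximate upper bound on steady-state TCP throughput (Mathis approximation)."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

# Assumed values for illustration only: 1460-byte MSS, 10 ms RTT.
for loss in (0.0001, 0.001, 0.01):
    rate = mathis_throughput_bps(1460, 0.010, loss)
    print(f"loss={loss:.2%}: ~{rate / 1e6:.1f} Mbit/s per flow")
```

Because the bound scales with 1/√p, a hundredfold increase in loss reduces the per-flow ceiling by a factor of ten.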

Throughput and Utilisation

Link utilisation is the most visible KPI but also the most misleading. A link at 70% average utilisation can be causing significant packet loss at the microsecond level during bursts — which never shows up in 5-minute polling intervals. Real transport monitoring requires sub-minute granularity and burst detection.

| Utilisation Level | Risk Assessment | Recommended Action |
|---|---|---|
| < 50% | Safe: adequate headroom for bursts | Monitor quarterly, no action needed |
| 50-70% | Watch zone: burst headroom shrinking | Increase monitoring frequency, plan upgrade within 12 months |
| 70-85% | Congestion risk during peak events | Priority upgrade planning, deploy traffic shaping, investigate offload options |
| > 85% | Active congestion likely, VoNR at risk | Immediate upgrade or traffic rerouting required |
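The way averaging hides bursts can be shown with a small Python sketch. The per-second utilisation samples below are invented for illustration: a link running at 70% with a single 3-second burst at line rate.

```python
# Hypothetical per-second utilisation (%) over a 5-minute window:
# steady 70%, ending with one 3-second burst at line rate.
samples = [70.0] * 297 + [100.0] * 3

five_min_avg = sum(samples) / len(samples)
peak = max(samples)

# Re-aggregate into 10-second buckets, as sub-minute telemetry might report.
buckets = [samples[i:i + 10] for i in range(0, len(samples), 10)]
worst_bucket = max(sum(b) / len(b) for b in buckets)

print(f"5-minute average: {five_min_avg:.1f}%")
print(f"worst 10-second bucket: {worst_bucket:.1f}%")
print(f"1-second peak: {peak:.1f}%")
```

The 5-minute average barely moves, and even the 10-second view dilutes the burst, which is why utilisation should be read together with queue drop counters rather than on its own.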

3. Tools Used in Real Operations

TWAMP — Two-Way Active Measurement Protocol

TWAMP (RFC 5357) is the standard tool for measuring latency, jitter, and packet loss on live transport paths. It works by sending test packets between two TWAMP-enabled endpoints (typically PE routers or DU/CU management interfaces) and measuring timestamps at both ends. Unlike ICMP ping, TWAMP uses UDP with configurable DSCP markings — meaning you can measure QoS-class-specific performance on the actual forwarding path.

  • Deploy TWAMP sessions on all major transport paths: fronthaul aggregation, midhaul hub-to-hub, N3 path to UPF
  • Run separate TWAMP sessions for each DSCP class — EF (VoNR), AF41 (video), CS0 (best effort) — to detect class-specific issues
  • Set measurement interval to 10 seconds for real-time monitoring — 5-minute polling misses burst events
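As a rough illustration of what a TWAMP session computes, each test packet yields four timestamps: sender transmit (T1), reflector receive (T2), reflector transmit (T3), and sender receive (T4). The Python sketch below derives round-trip time and loss from them; the `TwampSample` class and the sample values are hypothetical and do not represent any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class TwampSample:
    t1: float  # sender transmit time (s)
    t2: float  # reflector receive time (s)
    t3: float  # reflector transmit time (s)
    t4: float  # sender receive time (s)

def round_trip_s(s):
    """Round-trip time with the reflector's processing time removed."""
    return (s.t4 - s.t1) - (s.t3 - s.t2)

def forward_one_way_s(s):
    """Forward one-way delay; only meaningful if both clocks are synchronised."""
    return s.t2 - s.t1

def loss_ratio(sent, received):
    """Fraction of test packets that never came back."""
    return (sent - received) / sent if sent else 0.0

# Hypothetical sample: 2.1 ms out, 0.2 ms in the reflector, 2.2 ms back.
sample = TwampSample(t1=0.0, t2=0.0021, t3=0.0023, t4=0.0045)
print(f"RTT: {round_trip_s(sample) * 1e3:.2f} ms")
print(f"loss: {loss_ratio(1000, 998):.2%}")
```

Round-trip time needs no clock synchronisation between the endpoints, whereas the one-way figures are only as good as the time sync on both ends.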

Streaming Telemetry — The Modern Approach

SNMP polling every 5 minutes is not sufficient for 5G transport. Streaming telemetry (gRPC/gNMI) pushes interface counters, queue statistics, and routing state to a TSDB (Time Series Database — typically InfluxDB, Prometheus, or VictoriaMetrics) at 10-60 second intervals. This gives you:

  • Sub-minute link utilisation — detect micro-bursts invisible to SNMP polling
  • Per-queue drop counters — see which traffic class is being dropped, not just total drops
  • BGP session state and route count changes — detect control plane events that precede data plane problems
  • PTP clock state and time error — integrate timing monitoring into the same dashboard as traffic KPIs
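Per-queue drop counters arrive as monotonically increasing totals, so the useful number is the ratio of the deltas between two snapshots. A minimal Python sketch, assuming hypothetical counter names (`transmitted`, `dropped`) and queue names that are not taken from any specific vendor model, and ignoring counter wrap:

```python
def drop_ratio(prev, curr):
    """Drop ratio for one queue between two counter snapshots."""
    dropped = curr["dropped"] - prev["dropped"]
    sent = curr["transmitted"] - prev["transmitted"]
    offered = dropped + sent
    return dropped / offered if offered > 0 else 0.0

# Hypothetical packet counters from two consecutive telemetry updates.
prev = {
    "priority": {"transmitted": 1_000_000, "dropped": 0},
    "best_effort": {"transmitted": 5_000_000, "dropped": 1_000},
}
curr = {
    "priority": {"transmitted": 1_099_900, "dropped": 100},
    "best_effort": {"transmitted": 5_950_000, "dropped": 51_000},
}

ratios = {queue: drop_ratio(prev[queue], curr[queue]) for queue in curr}
for queue, ratio in ratios.items():
    print(f"{queue}: {ratio:.2%} dropped")
```

Keeping the ratio per queue is what lets a dashboard show that the priority queue is losing traffic even when total drops look small.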

4. Practical Example — High Latency Causing Poor User Throughput

An operator in Muscat reports that 5G download speeds at a CBD site are 30-50% below the expected throughput despite strong signal and low interference. Transport investigation reveals:

| KPI | Measured Value | Expected | Root Cause |
|---|---|---|---|
| N3 RTT (CU-UP to UPF) | 28ms | < 8ms | UPF is at remote DC; N3 traverses 3 extra hops |
| N3 Jitter | 8ms peak PDV | < 1ms | N3 path shares queue with bulk internet traffic, no QoS |
| TCP throughput calculation | 28ms RTT, 0.1% loss → max ~4Mbps per flow | Expected 100Mbps+ | TCP slow-start never opens window: RTT-limited |

Fix: Deploy regional UPF at Muscat hub + enable QoS on N3 path. Result: N3 RTT dropped to 4ms, throughput improved 4x.

Key insight: TCP throughput is fundamentally limited by RTT via the bandwidth-delay product. A TCP flow on a 28ms RTT path with a typical window size of 65KB can achieve at most ~18Mbps — even on a 10GE link with zero congestion. High N3 RTT is a throughput ceiling, not a congestion problem. Fix the RTT, not the link capacity.
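The window-limited ceiling is simply window size divided by RTT. The Python sketch below works it through for the 28 ms path and for a 4 ms path, assuming a 65,535-byte window (the maximum without TCP window scaling); the function name is hypothetical, and stacks that negotiate window scaling can exceed this ceiling.

```python
def window_limited_bps(window_bytes, rtt_s):
    """Maximum throughput of one flow that can send one window per round trip."""
    return window_bytes * 8 / rtt_s

WINDOW = 65_535  # bytes; assumed maximum TCP window without window scaling

before = window_limited_bps(WINDOW, 0.028)  # 28 ms N3 RTT
after = window_limited_bps(WINDOW, 0.004)   # 4 ms N3 RTT

print(f"28 ms RTT: ~{before / 1e6:.1f} Mbit/s per flow")
print(f" 4 ms RTT: ~{after / 1e6:.1f} Mbit/s per flow")
```

Link capacity does not appear anywhere in the calculation, which is the point: for a fixed window, only a shorter RTT raises the ceiling.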

5. KPI Correlation With RAN Issues

| RAN KPI Degradation | Transport KPI to Check | Likely Cause |
|---|---|---|
| VoNR call drop / MOS degradation | N3 jitter > 5ms, priority queue drops | QoS misconfiguration: VoNR not in priority queue |
| High PDCP retransmission rate | Midhaul or N3 packet loss > 0.01% | Congestion on transport link, or WRED dropping GTP-U |
| RRC setup failure spike | N2 latency > 100ms or packet loss | Transport congestion or routing failure on N2 path |
| Handover failure increase | X2/Xn latency > 10ms or intermittent loss | X2 routing via congested or indirect path |
| TDD interference / PDCCH errors | PTP PDV spike, TE > 1µs | Timing: PTP packets queued with data traffic |
| Poor throughput despite good RF | N3 RTT > 15ms | UPF too far from RAN: throughput RTT-limited |
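The correlation table lends itself to a simple rule lookup. The Python sketch below encodes the numeric thresholds from the table and returns the RAN symptoms a given set of transport measurements would be expected to produce; the metric keys and the function name are hypothetical, and a real system would also consider queue drops and intermittent loss.

```python
# Thresholds taken from the correlation table; the key names are hypothetical.
THRESHOLDS = {
    "n3_jitter_ms": (5.0, "VoNR call drop / MOS degradation"),
    "n3_loss_pct": (0.01, "High PDCP retransmission rate"),
    "n2_rtt_ms": (100.0, "RRC setup failure spike"),
    "xn_rtt_ms": (10.0, "Handover failure increase"),
    "ptp_time_error_us": (1.0, "TDD interference / PDCCH errors"),
    "n3_rtt_ms": (15.0, "Poor throughput despite good RF"),
}

def suspect_ran_symptoms(measured):
    """Return RAN symptoms whose transport threshold is exceeded."""
    return [
        symptom
        for key, (limit, symptom) in THRESHOLDS.items()
        if measured.get(key, 0.0) > limit
    ]

# Values resembling the Muscat example: 28 ms RTT, 8 ms jitter, 0.1% loss.
measured = {"n3_rtt_ms": 28.0, "n3_jitter_ms": 8.0, "n3_loss_pct": 0.1, "n2_rtt_ms": 20.0}
for symptom in suspect_ran_symptoms(measured):
    print(symptom)
```

Metrics missing from the input are treated as healthy here, which is a simplification; in practice a missing measurement is itself worth flagging.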

6. Common Monitoring Mistakes

  • Using 5-minute SNMP polling for transport monitoring: this interval misses micro-bursts that last milliseconds but cause packet loss visible at the application layer. Deploy streaming telemetry at 10-30s granularity or finer.
  • Monitoring only average latency: average latency hides tail latency. A link with an average of 2ms and a 99th percentile of 20ms can cause VoNR degradation at peak load. Always monitor P95 and P99 latency, not just the average.
  • Not correlating transport and RAN KPIs in the same dashboard: RAN and transport teams typically monitor separate systems. Cross-domain correlation requires integrating the transport TSDB with RAN performance management data in a unified view.
  • Ignoring queue drop counters: many operators monitor link utilisation but not per-queue drops. A link at 60% utilisation with 5% drops in the best-effort queue may be acceptable. A link at 60% utilisation with 0.1% drops in the priority queue is a critical problem.
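The gap between average and tail latency is easy to demonstrate. The Python sketch below uses the nearest-rank percentile method on invented samples (97 fast packets and 3 slow ones); the `percentile` helper is an illustrative assumption, and monitoring tools may use a different interpolation method.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds: 97 fast packets, 3 slow ones.
latencies_ms = [1.5] * 97 + [20.0] * 3

average = sum(latencies_ms) / len(latencies_ms)
print(f"average: {average:.2f} ms")
print(f"P95: {percentile(latencies_ms, 95):.1f} ms")
print(f"P99: {percentile(latencies_ms, 99):.1f} ms")
```

The average stays close to 2 ms and even P95 looks clean, while P99 exposes the 20 ms tail that real-time traffic actually experiences.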

7. Design Recommendations

  • Deploy TWAMP between all major transport endpoints — fronthaul aggregation, midhaul hubs, N3 path. Make TWAMP a network standard in commissioning acceptance criteria.
  • Integrate transport telemetry with RAN PM data — build a unified dashboard (Grafana is the standard open-source choice) that shows transport KPIs and RAN KPIs side by side. Correlation analysis becomes visual, not forensic.
  • Set latency SLAs per interface in transport contracts — do not accept vague ‘best effort with QoS’ wording. Specify TWAMP-measured latency, jitter, and loss thresholds per interface type, with penalties for sustained violations.
  • Monitor P99 latency, not average — configure TWAMP and telemetry to export percentile latency metrics. P99 latency spikes are what cause user-visible service degradation.

8. Summary — Practical Takeaways

Transport KPIs are not abstract numbers — they have direct, measurable causal relationships with RAN performance. High N3 RTT limits TCP throughput. Jitter degrades VoNR. Packet loss in the wrong traffic class causes retransmissions. Timing PDV causes TDD interference. Every transport team needs to monitor these KPIs proactively, at the right granularity, and correlate them with RAN KPIs to close the loop on cross-domain troubleshooting.

| Takeaway | Action |
|---|---|
| 5-minute polling is too slow | Deploy streaming telemetry at 10-30s intervals on all core transport links |
| Monitor P99 latency, not average | Configure TWAMP to export percentile metrics, because average hides tail latency problems |
| Correlate transport and RAN KPIs | Build unified dashboard: transport jitter and RAN VoNR drops in the same view |
| TCP throughput is RTT-limited | Measure N3 RTT and fix UPF placement; do not upgrade link capacity to fix an RTT problem |
| Per-queue drops are the signal | Monitor priority queue drops separately; any drops there are a critical event |
