Transport Design & Capacity Planning: How Experts Actually Make Decisions

Link dimensioning, traffic forecasting, redundancy planning, and cost vs performance trade-offs from a real consultant’s perspective

1. What Transport Design Actually Involves

Transport design is not an activity that happens after the RAN team has decided where to put base stations. It is a parallel engineering process that must start from day one of network planning, because transport constraints directly determine what RAN configurations are feasible, where UPFs can be placed, and what SLAs can be offered to enterprise customers.

The three pillars of transport design are dimensioning (how big do the links need to be), topology (what physical paths connect which nodes), and redundancy (what happens when things fail). Get any one wrong and you either overspend by deploying capacity you do not need, or underspend and face an upgrade crisis 18 months into operations.

2. Link Dimensioning — The Engineering Process

Step 1: Traffic Forecasting Per Segment

Every link dimensioning exercise starts with a traffic forecast. For 5G transport, the forecasting inputs are:

Input | Source | How It Drives Dimensioning
Number of gNB sites per hub | RAN rollout plan | Each site contributes fronthaul and backhaul traffic
Peak throughput per gNB | RAN link budget, spectrum allocation | Massive MIMO 64T64R on 100 MHz TDD can generate 3-5 Gbps downlink per sector
Number of sectors per site | Site design | Typical: 3 sectors × peak throughput = site peak
Traffic model (oversubscription) | Operator traffic data or ITU-T models | Simultaneous utilisation rate, typically 5-15% of sites at peak simultaneously
Growth rate forecast | Business plan | Year-over-year traffic growth, typically 30-50% in 5G networks
Overhead factors | GTP-U, IP/MPLS, eCPRI headers | Add 10-20% for encapsulation overhead above application throughput

The oversubscription factor is the most critical — and most debated — input. In a mature LTE network, an operator might know that 8% of sites are simultaneously at peak utilisation. In a new 5G network with unknown traffic patterns, you have to estimate. Being too conservative wastes capex. Being too aggressive means congestion in month 6 of operations.

Step 2: Per-Segment Traffic Calculation

Once you have per-site throughput and oversubscription factors, calculate the traffic at each aggregation point:

  • Site fronthaul: RU to DU capacity = peak eCPRI throughput per sector × number of sectors. For Option 7-2x with 100 MHz NR this is approximately 25 Gbps per sector, so three sectors need 75 Gbps. This drives fronthaul link sizing: typically 25GE per sector or 100GE aggregated per site.
  • Aggregation hub midhaul: sum of all site backhaul throughputs feeding that hub × oversubscription factor. Example: 20 sites at 3 Gbps peak each with 10% oversubscription = 20 × 3 × 0.10 = 6 Gbps. That is 60% of a 10GE uplink from the hub to the core, so 10GE is adequate at launch. Traffic growth of 50% takes the load to 9 Gbps (90% of the link), so plan for 100GE ahead of that point.
  • Core/N3 links: sum of all hub traffic plus growth headroom. With 10 hubs at 6 Gbps aggregate each, the core carries 10 × 6 = 60 Gbps. Two 100GE links with ECMP provide 200 Gbps of capacity plus redundancy, since 60 Gbps still fits on one link if the other fails.
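
The three calculations above can be sketched in a few lines of Python. This is a minimal illustration that reuses the example figures from the bullets. The function names are hypothetical, and the 10% factor is the oversubscription assumption from Step 1, not a measured value.

```python
def fronthaul_gbps(per_sector_gbps: float, sectors: int) -> float:
    """RU-to-DU capacity: peak eCPRI throughput per sector x number of sectors."""
    return per_sector_gbps * sectors


def hub_aggregate_gbps(sites: int, site_peak_gbps: float, oversubscription: float) -> float:
    """Hub load: sum of site peaks scaled by the share of sites at peak together."""
    return sites * site_peak_gbps * oversubscription


def core_load_gbps(hub_loads_gbps: list[float]) -> float:
    """Core/N3 load: sum of all hub aggregates (growth headroom is added separately)."""
    return sum(hub_loads_gbps)


# Example figures taken from the bullets above.
site_fronthaul = fronthaul_gbps(per_sector_gbps=25, sectors=3)
hub_load = hub_aggregate_gbps(sites=20, site_peak_gbps=3, oversubscription=0.10)
core_load = core_load_gbps([hub_load] * 10)

print(f"Fronthaul per site: {site_fronthaul:.0f} Gbps")
print(f"Hub aggregate:      {hub_load:.0f} Gbps")
print(f"Core load:          {core_load:.0f} Gbps")
```

Keeping each segment as its own function makes it easy to rerun the numbers when the oversubscription estimate changes, which is the input most likely to be revised.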

Step 3: Applying the 70% Rule

Never dimension transport links to 100% utilisation. The standard practice is to dimension to 70% average utilisation at peak busy hour, providing 30% headroom for:

  • Traffic bursts above the busy-hour average — burstiness factor is typically 1.5-2x the average
  • Rerouted traffic during failures — if RSVP-TE FRR or SR TI-LFA reroutes traffic from a failed link, the surviving links absorb the additional load
  • Growth before the next upgrade cycle — links upgraded today must handle traffic growth for 12-24 months
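
As a rough sketch of how the 70% rule turns a busy-hour load into a port size, the snippet below picks the smallest link rate that keeps the load at or below the target. The rate list contains only the speeds mentioned in this article, and the function name is illustrative, not taken from any vendor tool.

```python
# Link rates mentioned in this article; extend the tuple for other port speeds.
LINK_RATES_GBPS = (10, 25, 100, 400)


def smallest_link_gbps(busy_hour_load_gbps, target_utilisation=0.70, rates=LINK_RATES_GBPS):
    """Return the smallest rate that keeps the load within the target, or None if none fits."""
    for rate in sorted(rates):
        if busy_hour_load_gbps <= rate * target_utilisation:
            return rate
    return None


for load in (6, 9, 25):
    rate = smallest_link_gbps(load)
    print(f"{load} Gbps busy-hour load -> {rate}GE ({load / rate:.0%} utilised)")
```

The check is deliberately strict: a 9 Gbps load does not fit a 10GE port under this rule, because the ceiling for 10GE is 7 Gbps.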

3. Deciding When to Upgrade — 10G to 100G

The decision to upgrade links from 10GE to 100GE is driven by four factors:

Factor | Threshold That Triggers Upgrade Planning | Lead Time Consideration
Sustained utilisation > 70% | At 70% average during busy hour | 6-12 months for fibre augmentation, 2-4 months for router port upgrade
Burst loss detected | Any burst loss in priority queue | Immediate: priority traffic loss is a live SLA breach
Forecast horizon < 12 months | Projected to hit 85% within 12 months | Start planning now: procurement and installation take time
New service launch | Enterprise slice, MEC deployment adding significant traffic | Pre-provision capacity before service launch; never retrofit under load

Field Note: The most common mistake in transport capacity planning is reacting to congestion rather than anticipating it. By the time a link hits 85% utilisation and packet loss begins, the upgrade procurement process has not even started. In the GCC, fibre augmentation on a national backbone link can take 3-6 months. Plan upgrades when links hit 60%, not when they hit 80%.
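
One way to make the "anticipate, do not react" point concrete is to estimate how many months of runway a link has before it reaches a threshold. The sketch below assumes smooth compound growth, which is a simplification (real traffic grows in steps with launches and events), and all names are hypothetical.

```python
import math


def months_until_threshold(current_util, threshold_util, annual_growth):
    """Months until utilisation reaches the threshold under steady compound growth."""
    if current_util >= threshold_util:
        return 0.0
    if annual_growth <= 0:
        return math.inf
    return 12 * math.log(threshold_util / current_util) / math.log(1 + annual_growth)


def must_start_now(current_util, threshold_util, annual_growth, lead_time_months):
    """True when the remaining runway is no longer than the upgrade lead time."""
    return months_until_threshold(current_util, threshold_util, annual_growth) <= lead_time_months


# Assumed inputs: 40% annual growth, 85% as the point where loss begins.
runway_at_60 = months_until_threshold(0.60, 0.85, 0.40)
runway_at_80 = months_until_threshold(0.80, 0.85, 0.40)
print(f"Runway from 60%: {runway_at_60:.1f} months")
print(f"Runway from 80%: {runway_at_80:.1f} months")
```

Under these assumed inputs a link at 60% has roughly a year of runway, while a link at 80% has about two months, which is shorter than any of the lead times quoted above.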

4. Practical Example — Oman National Backbone Capacity Planning

An operator is planning 5G SA rollout across Oman: 300 sites in Year 1 (Muscat), 150 sites in Year 2 (Muscat + Salalah), 100 sites in Year 3 (interior + Dhofar). The transport design process:

Planning Step | Year 1 Calculation | Year 3 Projection
Sites per Muscat hub | 30 sites per hub (10 hubs) | 50 sites per hub (6 hubs after consolidation)
Peak throughput per site | 3 Gbps (3 sectors × 1 Gbps avg NR throughput) | 5 Gbps (capacity growth + more UEs)
Hub aggregate (10% oversubscription) | 30 × 3 × 0.10 = 9 Gbps per hub | 50 × 5 × 0.10 = 25 Gbps per hub
Hub uplink sizing | 25GE (9 Gbps would be 90% of a 10GE, above the 70% ceiling of 7 Gbps; 9/25 = 36%) | 100GE (25 Gbps = 25% of 100GE, leaving headroom for growth)
Core ring load | 10 hubs × 9 Gbps = 90 Gbps | 6 hubs × 25 Gbps = 150 Gbps
Core ring sizing | 2 × 100GE ECMP (200 Gbps capacity, 90 Gbps load = 45%) | 2 × 100GE at its limit (150/200 = 75%, above the 70% target; plan 400GE for Year 4)
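
Re-running the table's arithmetic in code makes it easy to compare every load against the 70% ceiling. This is a sketch built on the inputs stated in the table; the dictionary layout and constant names are illustrative assumptions.

```python
OVERSUBSCRIPTION = 0.10        # share of sites at peak together, as in the table
TARGET_UTILISATION = 0.70      # the 70% rule from Step 3
CORE_CAPACITY_GBPS = 2 * 100   # 2 x 100GE ECMP

plan = {
    "Year 1": {"sites_per_hub": 30, "site_peak_gbps": 3, "hubs": 10},
    "Year 3": {"sites_per_hub": 50, "site_peak_gbps": 5, "hubs": 6},
}

results = {}
for year, p in plan.items():
    hub_load = p["sites_per_hub"] * p["site_peak_gbps"] * OVERSUBSCRIPTION
    core_load = hub_load * p["hubs"]
    results[year] = {
        "hub_load_gbps": hub_load,
        "hub_util_10ge": hub_load / 10,
        "hub_util_100ge": hub_load / 100,
        "core_load_gbps": core_load,
        "core_util": core_load / CORE_CAPACITY_GBPS,
    }

for year, r in results.items():
    verdict = "within" if r["core_util"] <= TARGET_UTILISATION else "above"
    print(f"{year}: hub {r['hub_load_gbps']:.0f} Gbps, "
          f"core {r['core_load_gbps']:.0f} Gbps "
          f"({r['core_util']:.0%}, {verdict} the 70% target)")
```

With these inputs the Year 1 hub load of 9 Gbps is 90% of a single 10GE port, and the Year 3 core load is 75% of the 2 × 100GE bundle. Both sit above the 70% target.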

5. Redundancy Planning — The Framework

Levels of Redundancy

Transport redundancy must be designed at three levels:

  • Link redundancy — every link has a backup path. Achieved via MPLS FRR / SR TI-LFA providing sub-50ms automatic reroute on any single link failure. Minimum requirement for all RAN-facing transport.
  • Node redundancy — every aggregation hub has two diverse exit paths, connected to different PE routers. If a PE router fails, traffic reroutes via the alternate PE. Requires careful design of PE router placement and inter-PE routing.
  • Path diversity — control plane paths (N2 to AMF) and user plane paths (N3 to UPF) take physically diverse routes through the transport network. A single infrastructure failure cannot simultaneously kill both control and data plane for a site.

Redundancy Type | Mechanism | Recovery Time | Cost Impact
Link protection (RSVP FRR / TI-LFA) | Pre-computed bypass/backup path, automatic | < 50 ms | Low: software config only
Dual-homed PE | Site connected to two separate PE routers | < 1 s (BFD + IGP) | Medium: dual uplinks per site
Dual-path N2/N3 | AMF and UPF reachable via two diverse WAN paths | Seamless (SCTP multi-homing for N2) | Medium: dual VRF routing design
Geographic redundancy | Core nodes at diverse physical locations | Minutes (requires operator action) | High: duplicate DC infrastructure
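
Fast reroute only helps if the surviving links have room for the rerouted traffic. The sketch below checks that for an ECMP bundle of equal-speed links, using the core ring loads from the Oman example. The function name and the `max_utilisation` parameter are illustrative assumptions, and real failure behaviour also depends on topology and load-balancing hashing.

```python
def survives_single_link_failure(load_gbps, link_gbps, link_count, max_utilisation=1.0):
    """True if the load still fits on the bundle after losing one member link."""
    surviving_gbps = link_gbps * (link_count - 1)
    return surviving_gbps > 0 and load_gbps <= surviving_gbps * max_utilisation


year1_ok = survives_single_link_failure(load_gbps=90, link_gbps=100, link_count=2)
year3_ok = survives_single_link_failure(load_gbps=150, link_gbps=100, link_count=2)

print(f"Year 1 core (90 Gbps on 2 x 100GE) survives one failure: {year1_ok}")
print(f"Year 3 core (150 Gbps on 2 x 100GE) survives one failure: {year3_ok}")
```

Running the same check with `link_count=3` shows the effect of adding a third member to the bundle instead of moving to a higher port speed.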

6. Cost vs Performance Trade-offs — The Consultant’s Lens

Every transport design involves trade-offs between cost and performance. Here are the real decisions operators face:

  • Fronthaul fibre vs wireless — dedicated dark fibre provides lowest latency and highest capacity for fronthaul. Microwave or XHAUL alternatives are lower cost but add latency and capacity constraints. For TDD 5G with tight fronthaul requirements, the cost premium for fibre is justified. For rural sites where fibre is not viable, budget carefully for latency overhead.
  • Centralised vs distributed CU deployment — centralising CUs reduces hardware costs but increases midhaul traffic and latency. Distributed CUs (per site or per cluster) cost more in hardware but reduce transport complexity and latency. The right answer depends on the latency budget for the specific use case mix.
  • RSVP-TE vs SR-MPLS operational cost — RSVP-TE has higher per-router configuration and management overhead. SR-MPLS has lower ongoing operational cost but requires a PCE investment for dynamic path computation. For networks with > 100 sites, SR-MPLS TCO is typically lower over a 5-year horizon.
  • 10GE vs 25GE at the access layer — 25GE transceivers cost marginally more than 10GE but provide 2.5x the capacity for the same fibre. For new sites, always deploy 25GE or 100GE-capable hardware. Deploying 10GE today on 5G sites creates a guaranteed upgrade cycle within 18-24 months.

7. Common Planning Mistakes

  • Planning for average throughput, not peak — transport must handle peak busy hour throughput, not average. Using average traffic in dimensioning calculations creates links that are fine most of the time and congested when it matters most.
  • Not accounting for eCPRI overhead in fronthaul — eCPRI Option 7-2x generates significantly more traffic than the air interface throughput suggests. Operators who dimension fronthaul based on RF throughput alone find links saturated at 40% RF utilisation.
  • Underestimating growth — 5G traffic growth in the first 2-3 years of deployment consistently exceeds forecasts. Build 40-50% growth into Year 1 designs. Links that are at 50% utilisation at launch will be at 75% within 18 months.
  • Forgetting OAM traffic in dimensioning — NMS, telemetry, TWAMP, CDN synchronisation, and timing traffic all consume transport capacity. While individually small, aggregated OAM traffic on trunk links can be 5-10% of capacity. Include it in calculations.
  • Single vendor assumptions — dimensioning based on a specific vendor’s peak throughput specifications often does not match field performance under real traffic conditions. Use conservative field measurements (80% of spec) for dimensioning, not theoretical maximums.
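
Several of these mistakes are about overheads that stack on top of each other. The following sketch shows one possible way to combine them, assuming OAM traffic is modelled as a fixed share of link capacity and encapsulation as a percentage on top of payload. The formula and the default values (15% encapsulation, 5% OAM, 80% vendor derate) are illustrative choices within the ranges quoted in this article, not a standard method.

```python
def required_capacity_gbps(payload_gbps, encapsulation_overhead=0.15,
                           oam_share=0.05, target_utilisation=0.70):
    """Capacity C that keeps payload, encapsulation and OAM within the target.

    Solves wire_load + oam_share * C <= target_utilisation * C for C.
    """
    if oam_share >= target_utilisation:
        raise ValueError("OAM share must be smaller than the target utilisation")
    wire_load = payload_gbps * (1 + encapsulation_overhead)
    return wire_load / (target_utilisation - oam_share)


def field_throughput_gbps(vendor_spec_gbps, derate=0.80):
    """Conservative field throughput: a fraction of the vendor's peak specification."""
    return vendor_spec_gbps * derate


naive = 5.0 / 0.70
with_overheads = required_capacity_gbps(5.0)
print(f"Payload-only sizing: {naive:.1f} Gbps")
print(f"With overheads:      {with_overheads:.1f} Gbps")
```

For a 5 Gbps busy-hour payload, the overhead-aware figure comes out noticeably higher than the payload-only figure, which is the gap that tends to surface as unexpected congestion.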

8. Troubleshooting Capacity Problems

  • Identify the congested segment before upgrading — use interface utilisation telemetry to pinpoint exactly which link or queue is congested. Do not assume the obvious bottleneck — sometimes core rings are fine and the congestion is on a single hub-to-site aggregation link.
  • Differentiate congestion from misconfiguration — a link at 40% utilisation with packet loss is a QoS problem (wrong traffic in wrong queue), not a capacity problem. A link at 85% utilisation with packet loss is a capacity problem. Treat them differently.
  • Use traffic engineering to redistribute load before upgrading — if one path is at 80% and a parallel path is at 30%, use SR Policy or RSVP-TE to redistribute traffic before committing to a capacity upgrade. This buys time and may defer capex by 6-12 months.
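
The triage logic above can be written down as a small decision helper. This is a hedged sketch: the article only gives the 40% and 85% examples, so the 70% dividing line used here is an assumption borrowed from the dimensioning rule, and the function names are hypothetical.

```python
def triage(utilisation, packet_loss_seen, capacity_threshold=0.70):
    """Classify a link as a capacity problem or a QoS/configuration problem."""
    if not packet_loss_seen:
        return "no loss: keep monitoring"
    if utilisation >= capacity_threshold:
        return "capacity problem"
    return "QoS or configuration problem"


def balanced_utilisation(loads_gbps, capacities_gbps):
    """Utilisation of each path if load were spread in proportion to capacity."""
    return sum(loads_gbps) / sum(capacities_gbps)


print(triage(0.40, packet_loss_seen=True))
print(triage(0.85, packet_loss_seen=True))

# Two parallel 100 Gbps paths at 80% and 30%, as in the last bullet.
after_te = balanced_utilisation([80, 30], [100, 100])
print(f"Utilisation after rebalancing: {after_te:.0%}")
```

Rebalancing the 80%/30% pair brings both paths to 55%, back under the 70% target without any new capacity.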

9. Summary — Practical Takeaways

Transport design and capacity planning is a continuous process, not a one-time exercise. The best transport engineers are the ones who build in feedback loops — measuring real traffic against forecasts, identifying congestion before it becomes a problem, and triggering upgrade planning at 60% utilisation, not 80%.

The decisions that matter most are made before deployment: UPF placement, fronthaul technology choice, redundancy architecture, and oversubscription ratios. Getting these right means the network handles growth gracefully. Getting them wrong means a rushed upgrade cycle with service-affecting work under load.

Takeaway | Action
Plan at peak, not average | Use busy-hour throughput with oversubscription factor; never average utilisation
Trigger upgrade planning at 60% utilisation | 6-12 month procurement lead times mean 80% is already too late
Always deploy 25GE or 100GE at new sites | Never deploy 10GE-only on new 5G sites: guaranteed upgrade cycle within 24 months
Account for eCPRI overhead in fronthaul | eCPRI Option 7-2x traffic is 10-15x higher than air interface throughput; dimension accordingly
Build redundancy at link, node, and path level | Single-level redundancy is not sufficient; design all three levels from day one

Muhammad Tahir Riaz  |  trmtelcocloudai.com  |  Telecom Transport Series — Article 10 of 10 — COMPLETE SERIES
