Link dimensioning, traffic forecasting, redundancy planning, and cost vs performance trade-offs from a real consultant’s perspective
1. What Transport Design Actually Involves
Transport design is not an activity that happens after the RAN team has decided where to put base stations. It is a parallel engineering process that must start from day one of network planning, because transport constraints directly determine what RAN configurations are feasible, where UPFs can be placed, and what SLAs can be offered to enterprise customers.
The three pillars of transport design are dimensioning (how big do the links need to be), topology (what physical paths connect which nodes), and redundancy (what happens when things fail). Get any one wrong and you either overspend by deploying capacity you do not need, or underspend and face an upgrade crisis 18 months into operations.
2. Link Dimensioning — The Engineering Process
Step 1: Traffic Forecasting Per Segment
Every link dimensioning exercise starts with a traffic forecast. For 5G transport, the forecasting inputs are:
| Input | Source | How It Drives Dimensioning |
| Number of gNB sites per hub | RAN rollout plan | Each site contributes fronthaul and backhaul traffic |
| Peak throughput per gNB | RAN link budget, spectrum allocation | Massive MIMO 64T64R on 100MHz TDD can generate 3-5 Gbps downlink per sector |
| Number of sectors per site | Site design | Typical: 3 sectors × peak throughput = site peak |
| Traffic model (oversubscription) | Operator traffic data or ITU-T models | Simultaneous utilisation rate — typically 5-15% of sites at peak simultaneously |
| Growth rate forecast | Business plan | Year-over-year traffic growth — typically 30-50% in 5G networks |
| Overhead factors | GTP-U, IP/MPLS, eCPRI headers | Add 10-20% for encapsulation overhead above application throughput |
The oversubscription factor is the most critical — and most debated — input. In a mature LTE network, an operator might know that 8% of sites are simultaneously at peak utilisation. In a new 5G network with unknown traffic patterns, you have to estimate. Being too conservative wastes capex. Being too aggressive means congestion in month 6 of operations.
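As a rough illustration of how the inputs in the table combine, the sketch below derives a per-site peak from per-sector throughput, sector count, and encapsulation overhead, then compounds annual growth. The function names and the chosen values (1 Gbps per sector, 15% overhead, 40% growth) are illustrative assumptions picked from within the ranges quoted above, not measured data.

```python
def site_peak_gbps(per_sector_gbps: float, sectors: int = 3,
                   overhead: float = 0.15) -> float:
    """Peak transport load for one site, including encapsulation overhead."""
    return per_sector_gbps * sectors * (1 + overhead)


def forecast_gbps(base_gbps: float, annual_growth: float, years: int) -> float:
    """Compound a base-year load forward at a fixed annual growth rate."""
    return base_gbps * (1 + annual_growth) ** years


# Assumed inputs: 1 Gbps per sector, 3 sectors, 15% overhead, 40% growth.
year0 = site_peak_gbps(1.0)
for year in range(4):
    print(f"Year {year}: {forecast_gbps(year0, 0.40, year):.2f} Gbps per site")
```

Swapping in the operator's own per-sector figures and growth rate is the whole point; the structure matters more than the numbers used here.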
Step 2: Per-Segment Traffic Calculation
Once you have per-site throughput and oversubscription factors, calculate the traffic at each aggregation point:
- Site fronthaul: RU to DU capacity = peak eCPRI throughput per sector × number of sectors. For Option 7-2x with 100MHz NR: approximately 25Gbps per sector. Three sectors = 75Gbps. This drives fronthaul link sizing — typically 25GE per sector or 100GE aggregated per site.
- Aggregation hub midhaul: sum of all site backhaul throughputs feeding that hub × oversubscription factor. Example: 20 sites at 3Gbps peak each with a 10% oversubscription factor = 20 × 3 × 0.10 = 6Gbps. That is 60% of a 10GE uplink from the hub to the core, so 10GE is adequate at launch. Plan for 100GE before traffic growth reaches 50%, at which point the load is 9Gbps, or 90% of the link.
- Core/N3 links: sum of all hub traffic plus growth headroom. Example: a core carrying traffic from 10 hubs at 6Gbps aggregate each sees 10 × 6 = 60Gbps. Two 100GE links with ECMP provide 200Gbps of capacity plus redundancy: if one link fails, the survivor carries the full 60Gbps at 60% utilisation.
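The three calculations above can be sketched in a few lines. This is a minimal sketch that reuses the example figures from the list; the function names are hypothetical, and the core sum assumes no further statistical gain above the hubs.

```python
def fronthaul_gbps(per_sector_gbps: float, sectors: int) -> float:
    """RU-to-DU load: no statistical gain is assumed on fronthaul."""
    return per_sector_gbps * sectors


def hub_aggregate_gbps(sites: int, site_peak_gbps: float,
                       simultaneity: float) -> float:
    """Hub uplink load: sites x per-site peak x the fraction of sites
    assumed to be at peak simultaneously."""
    return sites * site_peak_gbps * simultaneity


def core_load_gbps(hub_loads_gbps: list[float]) -> float:
    """Core/N3 load as the plain sum of hub aggregates."""
    return sum(hub_loads_gbps)


fronthaul = fronthaul_gbps(per_sector_gbps=25.0, sectors=3)
hub = hub_aggregate_gbps(sites=20, site_peak_gbps=3.0, simultaneity=0.10)
core = core_load_gbps([hub] * 10)
print(f"Site fronthaul: {fronthaul:.0f} Gbps")
print(f"Hub uplink:     {hub:.0f} Gbps")
print(f"Core links:     {core:.0f} Gbps")
```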
Step 3: Applying the 70% Rule
Never dimension transport links to 100% utilisation. The standard practice is to dimension to 70% average utilisation at peak busy hour, providing 30% headroom for:
- Traffic bursts above the busy-hour average — burstiness factor is typically 1.5-2x the average
- Rerouted traffic during failures — if RSVP-TE FRR or SR TI-LFA reroutes traffic from a failed link, the surviving links absorb the additional load
- Growth before the next upgrade cycle — links upgraded today must handle traffic growth for 12-24 months
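One way to apply the rule is to divide the busy-hour load by the target utilisation and pick the smallest standard rate that covers the result. The sketch below does that for the 6Gbps hub example from Step 2. The list of rates is an assumption limited to the Ethernet speeds mentioned in this article, and the function names are hypothetical.

```python
STANDARD_RATES_GBPS = [10, 25, 100, 400]   # assumed menu of link rates


def required_capacity_gbps(busy_hour_load_gbps: float,
                           target_utilisation: float = 0.70) -> float:
    """Capacity needed so the busy-hour load sits at the target utilisation."""
    return busy_hour_load_gbps / target_utilisation


def smallest_adequate_rate(busy_hour_load_gbps: float,
                           target_utilisation: float = 0.70) -> int:
    """Smallest listed rate that keeps the load at or below the target."""
    needed = required_capacity_gbps(busy_hour_load_gbps, target_utilisation)
    for rate in STANDARD_RATES_GBPS:
        if rate >= needed:
            return rate
    raise ValueError(f"{busy_hour_load_gbps} Gbps exceeds the largest listed rate")


print(smallest_adequate_rate(6.0))   # 6 / 0.7 = 8.57 Gbps needed -> 10
```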
3. Deciding When to Upgrade — 10G to 100G
The decision to upgrade links from 10GE to 100GE is driven by four factors:
| Factor | Threshold That Triggers Upgrade Planning | Lead Time Consideration |
| Sustained utilisation > 70% | At 70% average during busy hour | 6-12 months for fibre augmentation, 2-4 months for router port upgrade |
| Burst loss detected | Any burst loss in priority queue | Immediate — priority traffic loss is a live SLA breach |
| Forecast horizon < 12 months | Projected to hit 85% within 12 months | Start planning now — procurement and installation take time |
| New service launch | Enterprise slice, MEC deployment adding significant traffic | Pre-provision capacity before service launch — never retrofit under load |
Field Note: The most common mistake in transport capacity planning is reacting to congestion rather than anticipating it. By the time a link hits 85% utilisation and packet loss begins, the upgrade procurement process has not even started. In the GCC, fibre augmentation on a national backbone link can take 3-6 months. Plan upgrades when links hit 60%, not when they hit 80%.
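The timing argument can be made concrete with a compound-growth estimate of how long a link has before it reaches a threshold, compared against the upgrade lead time. This is a sketch under simplifying assumptions: growth is smooth and compound, and the 40% growth rate and 12-month lead time are illustrative values taken from the ranges quoted in this article. The function names are hypothetical.

```python
import math


def months_until(current_util: float, threshold: float,
                 annual_growth: float) -> float:
    """Months until utilisation reaches the threshold under compound growth."""
    if current_util >= threshold:
        return 0.0
    if annual_growth <= 0:
        return math.inf   # flat or shrinking traffic never reaches it
    return 12 * math.log(threshold / current_util) / math.log(1 + annual_growth)


def should_start_planning(current_util: float, annual_growth: float,
                          lead_time_months: float,
                          threshold: float = 0.85) -> bool:
    """True when the threshold would be reached before an upgrade ordered
    today could be in service."""
    return months_until(current_util, threshold, annual_growth) <= lead_time_months


for util in (0.50, 0.60, 0.70):
    months = months_until(util, 0.85, annual_growth=0.40)
    print(f"{util:.0%} today: 85% in {months:.1f} months, "
          f"start now: {should_start_planning(util, 0.40, 12)}")
```

Under these assumptions a link at 60% has roughly a year before it reaches 85%, which is about the length of a slow procurement cycle.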
4. Practical Example — Oman National Backbone Capacity Planning
An operator is planning a 5G SA rollout across Oman: 300 sites in Year 1 (Muscat), 150 sites in Year 2 (Muscat + Salalah), 100 sites in Year 3 (interior + Dhofar). The transport design process runs as follows:
| Planning Step | Year 1 Calculation | Year 3 Projection |
| Sites per Muscat hub | 30 sites per hub (10 hubs) | 50 sites per hub (6 hubs after consolidation) |
| Peak throughput per site | 3 Gbps (3 sectors × 1 Gbps avg NR throughput) | 5 Gbps (capacity growth + more UEs) |
| Hub aggregate (10% oversubscription) | 30 × 3 × 0.10 = 9 Gbps per hub | 50 × 5 × 0.10 = 25 Gbps per hub |
| Hub uplink sizing | 25GE or 2 × 10GE (9 Gbps is 90% of a single 10GE, above the 70% ceiling; it is 36% of 25GE) | 100GE (25 Gbps = 25% of 100GE, leaving headroom for growth) |
| Core ring capacity (10 hubs) | 10 × 9 Gbps = 90 Gbps | 6 hubs × 25 Gbps = 150 Gbps |
| Core ring sizing | 2 × 100GE ECMP (200 Gbps capacity, 90 Gbps load = 45%) | 2 × 100GE exceeds the 70% ceiling (150/200 = 75%), so the 400GE upgrade planned for Year 4 needs to be in procurement during Year 3 |
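The hub and core ring arithmetic in the table can be reproduced with a short script, which also makes it easy to rerun the plan with different inputs. This is a sketch only: the dictionary layout and function name are hypothetical, and the inputs are the table's own figures.

```python
CEILING = 0.70                 # the 70% rule from Step 3
RING_CAPACITY_GBPS = 2 * 100   # 2 x 100GE ECMP

PLAN = {
    "Year 1": {"hubs": 10, "sites_per_hub": 30, "site_peak_gbps": 3.0},
    "Year 3": {"hubs": 6, "sites_per_hub": 50, "site_peak_gbps": 5.0},
}


def summarise(hubs: int, sites_per_hub: int, site_peak_gbps: float,
              simultaneity: float = 0.10) -> dict:
    """Hub aggregate, core ring load, and ring utilisation for one year."""
    hub_gbps = sites_per_hub * site_peak_gbps * simultaneity
    ring_gbps = hubs * hub_gbps
    ring_util = ring_gbps / RING_CAPACITY_GBPS
    return {"hub_gbps": hub_gbps, "ring_gbps": ring_gbps,
            "ring_util": ring_util, "over_ceiling": ring_util > CEILING}


for year, inputs in PLAN.items():
    s = summarise(**inputs)
    print(f"{year}: hub {s['hub_gbps']:.0f} Gbps, ring {s['ring_gbps']:.0f} Gbps, "
          f"ring utilisation {s['ring_util']:.0%}, over 70%: {s['over_ceiling']}")
```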
5. Redundancy Planning — The Framework
Levels of Redundancy
Transport redundancy must be designed at three levels:
- Link redundancy — every link has a backup path. Achieved via MPLS FRR / SR TI-LFA providing sub-50ms automatic reroute on any single link failure. Minimum requirement for all RAN-facing transport.
- Node redundancy — every aggregation hub has two diverse exit paths, connected to different PE routers. If a PE router fails, traffic reroutes via the alternate PE. Requires careful design of PE router placement and inter-PE routing.
- Path diversity — control plane paths (N2 to AMF) and user plane paths (N3 to UPF) take physically diverse routes through the transport network. A single infrastructure failure cannot simultaneously kill both control and data plane for a site.
| Redundancy Type | Mechanism | Recovery Time | Cost Impact |
| Link protection (RSVP FRR / TI-LFA) | Pre-computed bypass/backup path — automatic | < 50ms | Low — software config only |
| Dual-homed PE | Site connected to two separate PE routers | < 1s (BFD + IGP) | Medium — dual uplinks per site |
| Dual-path N2/N3 | AMF and UPF reachable via two diverse WAN paths | Seamless (SCTP multi-homing for N2) | Medium — dual VRF routing design |
| Geographic redundancy | Core nodes at diverse physical locations | Minutes (requires operator action) | High — duplicate DC infrastructure |
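Link protection only helps if the surviving links have room for the rerouted traffic, so a redundancy design is worth checking against a single-link failure. The sketch below assumes ECMP spreads load evenly across the surviving links, which is an idealisation, and uses the 60Gbps core example from Step 2. The function name is hypothetical.

```python
def utilisation(load_gbps: float, link_gbps: float, links: int,
                failed: int = 0) -> float:
    """Utilisation of an ECMP bundle with `failed` member links down,
    assuming load spreads evenly over the surviving links."""
    surviving = links - failed
    if surviving <= 0:
        raise ValueError("no surviving links to carry the load")
    return load_gbps / (surviving * link_gbps)


load = 60.0   # Gbps, the core example from Step 2
print(f"All links up:    {utilisation(load, 100, 2):.0%}")
print(f"One link failed: {utilisation(load, 100, 2, failed=1):.0%}")
```

A result above 100% in the failure case means the protection path exists on paper but will drop traffic in practice.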
6. Cost vs Performance Trade-offs — The Consultant’s Lens
Every transport design involves trade-offs between cost and performance. Here are the real decisions operators face:
- Fronthaul fibre vs wireless — dedicated dark fibre provides lowest latency and highest capacity for fronthaul. Microwave or XHAUL alternatives are lower cost but add latency and capacity constraints. For TDD 5G with tight fronthaul requirements, the cost premium for fibre is justified. For rural sites where fibre is not viable, budget carefully for latency overhead.
- Centralised vs distributed CU deployment — centralising CUs reduces hardware costs but increases midhaul traffic and latency. Distributed CUs (per site or per cluster) cost more in hardware but reduce transport complexity and latency. The right answer depends on the latency budget for the specific use case mix.
- RSVP-TE vs SR-MPLS operational cost — RSVP-TE has higher per-router configuration and management overhead. SR-MPLS has lower ongoing operational cost but requires a PCE investment for dynamic path computation. For networks with > 100 sites, SR-MPLS TCO is typically lower over a 5-year horizon.
- 10GE vs 25GE at the access layer — 25GE transceivers cost marginally more than 10GE but provide 2.5x the capacity for the same fibre. For new sites, always deploy 25GE or 100GE-capable hardware. Deploying 10GE today on 5G sites creates a guaranteed upgrade cycle within 18-24 months.
7. Common Planning Mistakes
- Planning for average throughput, not peak — transport must handle peak busy hour throughput, not average. Using average traffic in dimensioning calculations creates links that are fine most of the time and congested when it matters most.
- Not accounting for eCPRI overhead in fronthaul — eCPRI Option 7-2x generates significantly more traffic than the air interface throughput suggests. Operators who dimension fronthaul based on RF throughput alone find links saturated at 40% RF utilisation.
- Underestimating growth — 5G traffic growth in the first 2-3 years of deployment consistently exceeds forecasts. Build 40-50% growth into Year 1 designs. Links that are at 50% utilisation at launch will be at 75% within 18 months.
- Forgetting OAM traffic in dimensioning — NMS, telemetry, TWAMP, CDN synchronisation, and timing traffic all consume transport capacity. While individually small, aggregated OAM traffic on trunk links can be 5-10% of capacity. Include it in calculations.
- Single vendor assumptions — dimensioning based on a specific vendor’s peak throughput specifications often does not match field performance under real traffic conditions. Use conservative field measurements (80% of spec) for dimensioning, not theoretical maximums.
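Two of these mistakes, forgetting OAM traffic and underestimating growth, are easy to check numerically. The sketch below adds an OAM allowance to the user load and projects utilisation 18 months out. The 45Gbps user load on a 100GE link and the 5% OAM share are illustrative assumptions, and the function names are hypothetical.

```python
def load_with_oam_gbps(user_peak_gbps: float, link_gbps: float,
                       oam_share: float = 0.05) -> float:
    """Busy-hour user load plus OAM traffic, with OAM expressed as a
    share of link capacity."""
    return user_peak_gbps + link_gbps * oam_share


def utilisation_after(months: float, launch_util: float,
                      annual_growth: float) -> float:
    """Utilisation after `months` of compound growth from launch."""
    return launch_util * (1 + annual_growth) ** (months / 12)


# Assumed example: 45 Gbps of user traffic on a 100GE link, 5% OAM share.
launch = load_with_oam_gbps(user_peak_gbps=45.0, link_gbps=100.0) / 100.0
for growth in (0.30, 0.40, 0.50):
    print(f"{growth:.0%} growth: {launch:.0%} at launch, "
          f"{utilisation_after(18, launch, growth):.0%} after 18 months")
```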
8. Troubleshooting Capacity Problems
- Identify the congested segment before upgrading — use interface utilisation telemetry to pinpoint exactly which link or queue is congested. Do not assume the obvious bottleneck — sometimes core rings are fine and the congestion is on a single hub-to-site aggregation link.
- Differentiate congestion from misconfiguration — a link at 40% utilisation with packet loss is a QoS problem (wrong traffic in wrong queue), not a capacity problem. A link at 85% utilisation with packet loss is a capacity problem. Treat them differently.
- Use traffic engineering to redistribute load before upgrading — if one path is at 80% and a parallel path is at 30%, use SR Policy or RSVP-TE to redistribute traffic before committing to a capacity upgrade. This buys time and may defer capex by 6-12 months.
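The triage logic above can be written down as a simple rule, together with a quick estimate of what rebalancing would achieve. This is a sketch, not a monitoring tool: the 70% boundary between a QoS problem and a capacity problem is an assumption (the text only gives 40% and 85% as examples), the two paths are assumed to be 100GE each, and the function names are hypothetical.

```python
def diagnose(utilisation: float, packet_loss: bool,
             capacity_threshold: float = 0.70) -> str:
    """Classify a link from its busy-hour utilisation and loss status."""
    if not packet_loss:
        return "no action"
    if utilisation >= capacity_threshold:
        return "capacity problem"
    return "QoS problem"


def balanced_utilisation(loads_gbps: list[float],
                         capacities_gbps: list[float]) -> float:
    """Utilisation on every path if the total load were spread in
    proportion to path capacity."""
    return sum(loads_gbps) / sum(capacities_gbps)


print(diagnose(0.40, packet_loss=True))    # QoS problem
print(diagnose(0.85, packet_loss=True))    # capacity problem
print(f"{balanced_utilisation([80.0, 30.0], [100.0, 100.0]):.0%}")   # 55%
```

If the balanced figure already sits near the ceiling, traffic engineering will buy little time and the upgrade should proceed.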
9. Summary — Practical Takeaways
Transport design and capacity planning is a continuous process, not a one-time exercise. The best transport engineers are the ones who build in feedback loops — measuring real traffic against forecasts, identifying congestion before it becomes a problem, and triggering upgrade planning at 60% utilisation, not 80%.
The decisions that matter most are made before deployment: UPF placement, fronthaul technology choice, redundancy architecture, and oversubscription ratios. Getting these right means the network handles growth gracefully. Getting them wrong means a rushed upgrade cycle with service-affecting work under load.
| Takeaway | Action |
| Plan at peak, not average | Use busy-hour throughput with oversubscription factor — never average utilisation |
| Trigger upgrade planning at 60% utilisation | 6-12 month procurement lead times mean 80% is already too late |
| Always deploy 25GE or 100GE at new sites | Never deploy 10GE-only on new 5G sites — guaranteed upgrade cycle within 24 months |
| Account for eCPRI overhead in fronthaul | eCPRI Option 7-2x traffic is 10-15x higher than air interface throughput — dimension accordingly |
| Build redundancy at link, node, and path level | Single-level redundancy is not sufficient — design all three levels from day one |
Muhammad Tahir Riaz | trmtelcocloudai.com | Telecom Transport Series — Article 10 of 10 — COMPLETE SERIES
