How many OSPF routers can be deployed in one area ?

This question fits in the same category as “How long is a piece of string ?”. The answer is of course, “it depends”.

Back in 1998 whilst working for a router vendor, I was helping design a backbone network for a European ISP. Our web site was recommending a maximum of 40 – 60 routers per OSPF area, but the customer was designing for 180 in a single backbone area. The network was deployed and went live fine. I was curious as to how it would perform over the long term, so 6 months later I contacted the customers operations team for an update. (If you want to verify a network design, then there are few better ways than asking the Operations team. If there have been problems at 2am, you will definitely get some candid feedback). The customers operations team said the network was stable, and were happy with its performance. So the “official” 40 – 60 routers per area recommendation looked pretty conservative.

Today, if you google the title of this blog, the top hit says an area should have no more than 50 routers. That is from a design guide written in 2011. So over the 13 years from 1998 to 2011 the recommendation had surprisingly not increased. Now it’s almost 2020. 22 years later, what can we expect ?

Networking in 2020

Being a Link State protocol OSPF will consume as much CPU and Bandwidth as it needs until the network is converged. After that, whilst the network is stable, a good implementation will sit idle except for sending periodic hello’s to its neighbours. So (WAN) link stability will have a large impact on how well OSPF scales. We can probably assume that WAN links today are more stable than WAN links in 1998 [citation needed]. Anyway lets be conservative and assume they are no worse.

The critical resources OSPF needs are CPU to run the SPF algorithm, and CPU / Bandwidth to flood the LSA’s. Back in 1998 the Route Processors in the customers network used a 667 MHz Power PC CPU and the WAN links were either 2.5G (STM-16) or 10G (STM-64).

Over the 22 years from 1998-2020, Moore’s law states the CPU transistor count doubles every 2 years. Due to the increased transistor count the performance is expected to double around every 18 months. So the performance should have doubled around 14 times. In terms of CPU we should be able to support 180 x 2^14 = ~~3 million routers per area !

The lowest bandwidth link in the customer network in 1998 was 2.5G. In 2020 we can expect ISP backbones to be provisioned at 40G and 100G. So this is a much more modest 40/2.5 = 16X speedup. In terms of bandwidth for flooding LSA’s we should be able to support 180 x 16 = ~~3000 routers per area.

So in 2020, in theory you can have up to 3000 routers in a single area, in practice maybe not so much. Complex systems scale in surprising ways that are hard to model. But in 2020 OSPF should easily be able to scale to the 100’s of routers per area.

What if ?

  • What if we created a modern implementation of OSPF, developed with 21st Century Hardware in mind ?
  • What if the SPF Algorithm, the Link State Database and the Flood Queues used cache friendly data structures, so the CPU spends less time idling waiting for data ?
  • What if all incoming signalling was processed before running SPF and servicing the LSA flood queues ?
  • What if the OSPF route table remembered all the paths it has computed, rather than just the best ?
  • What if you could transfer 10,000 routes from OSPF to the RIB by moving an 8 byte pointer, rather than copying a gadzillion bytes between processes using IPC ?
  • What if the RIB only signalled the forwarding plane when a route had really changed, rather than when some route meta data had changed that the forwarding plane has no interest in ?
  • What if I stopped asking all these annoying “what if” questions and got to the point ?

In 1Q 2020 Flock Networks are shipping Routing Suite v20.0. If you would like to:

  • Be part of the early field trials.
  • Deploy over 1000 routers in a single OSPF area and not have the expected result being your vendor saying “You did what ?”
  • Save power by running a routing suite that mostly sleeps once the network is converged.
  • Gain observability into your existing OSPF network using a Read-Only JSON API.

Then please email eft@flocknetworks.com.

In the mean time here’s wishing everyone a Happy Solstice and a converged and stable 2020. If you are short of gift ideas I can highly recommend the book what if ?

Nick