Data center routing is evolving, and some might even say, “finally!” The industry has worked for too long with traditional routing protocols inherited from telecom and enterprise network environments. Now is the time for data centers to create routing protocols that efficiently address their particular needs, especially for hyper-reliability, scale and automation.

In recognition of this, the IETF has acknowledged that the routing needs of a data center differ from those of the WAN, and that it is time to introduce a new routing protocol for data centers. The IETF’s Link State Vector Routing (LSVR) working group, of which I am one of the co-chairs, has been set up to address this.

Hybrid routing

There are a number of trends driving this need for a new data-center-specific protocol. Of course, it all starts with the explosive growth in cloud services, which is increasing the size of datasets, with video, big data and internet-connected sensors being a few of the culprits. Web services that traverse these datasets are increasingly able to access more data for improved results and can share large amounts of data between co-resident applications.

All of this puts higher demands on the network to support the processing pipelines of these massive datasets. The aggregate server traffic in a typical data center is currently doubling roughly every 12–15 months.

To address this growth in bandwidth, data centers have adopted multi-layered, hyperscale switching topologies rather than relying on complex, highly engineered WAN switches. This more cost-efficient and modular approach has also increased traffic flow complexity. A variety of topologies have resulted, such as Clos, multi-tier leaf-spine, hypercube and toroidal, each of which places different demands on how best to distribute flows through the data center network and adds to the need for better routing protocols.

First, and most important, existing routing protocols do not have good support for equal-cost multipath forwarding. This is a particular weakness of Border Gateway Protocol (BGP), which does not use network topology information to optimize bandwidth, latency and other network resources. Such support has traditionally been available in a number of Interior Gateway Protocols (IGPs), such as OSPF and IS-IS, through technologies such as equal-cost multipath (ECMP), unequal-cost multipath (UCMP), IP Fast Reroute (Loop-Free Alternates) and Shared Risk Link Groups (SRLGs). However, using these technologies to improve traffic engineering and underlay flows is non-trivial in a hyperscale data center environment.
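To make the idea of equal-cost multipath forwarding concrete, here is a minimal Python sketch of per-flow hashing across equal-cost next hops. It is an illustration only, not how any particular switch or routing stack implements ECMP: the `Flow` tuple, the `pick_next_hop` helper, the spine names and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Map a flow onto one of several equal-cost next hops.

    Hashing the 5-tuple keeps every packet of a flow on the same path
    (avoiding reordering) while different flows spread across the paths.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Illustrative leaf switch with four equal-cost uplinks (names are made up).
spines = ["spine1", "spine2", "spine3", "spine4"]
flow = Flow("10.0.1.5", "10.0.9.7", 6, 49152, 443)
chosen = pick_next_hop(flow, spines)
print(f"{flow} -> {chosen}")
```

The same flow always maps to the same uplink, while flows that differ in any field of the 5-tuple are spread across the available uplinks.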

Second, cloud data centers have no public access to high-quality, open source IGP routing stacks that can operate at scale. In addition, because of the way IGPs behave, massively scalable data centers (MSDCs) must expend substantial effort modifying hardware switch stacks to tunnel control-protocol packets running in-line between hardware line cards and protocol processes.

Third, MSDCs are also concerned about the protocol overhead of running broadcast-based routing protocols across fabrics on the scale of hundreds or even thousands of switching elements. Scaling techniques such as OSPF Areas are hard to configure in these large data centers.

Fourth, network manageability is a key concern, and maintaining hundreds of independent switch stacks and configurations is a daunting prospect.

In response to these requirements, the LSVR working group (WG) has been tasked with developing and documenting a new IPv4/IPv6 hybrid routing protocol that combines link-state and path-vector routing mechanisms. The LSVR WG will reuse BGP’s existing IPv4/IPv6 transport, packet formats and error handling to distribute link-state vector routing information.

The LSVR protocol will use link-state vector information to calculate the routing table, similar to link-state IGP technologies such as OSPF or IS-IS. LSVR will effectively deliver capabilities traditionally credited only to link-state routing protocols, but now at hyperscale for data centers. SDN is not forgotten, either, and is built into LSVR from the ground up. Finally, LSVR configuration can use simplified IPv6 link-local peering models, which simplify both operational configuration and security.
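As an illustration of how a link-state protocol turns topology information into a routing table, the sketch below runs a generic shortest-path-first (Dijkstra) computation over a small leaf-spine graph and records every equal-cost first hop. It is not taken from the LSVR working group's documents; the `spf_with_ecmp` function, the topology and the link costs are assumptions chosen for the example.

```python
import heapq
from typing import Dict, Set, Tuple

# Adjacency map: node -> {neighbor: link cost}. Costs must be positive.
Graph = Dict[str, Dict[str, int]]


def spf_with_ecmp(graph: Graph, source: str) -> Tuple[Dict[str, int], Dict[str, Set[str]]]:
    """Dijkstra's algorithm that keeps all equal-cost first hops per destination."""
    dist: Dict[str, int] = {source: 0}
    next_hops: Dict[str, Set[str]] = {source: set()}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            # Directly attached neighbors are their own first hop;
            # everything further away inherits the first hops of `node`.
            hops = {neighbor} if node == source else next_hops[node]
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                next_hops[neighbor] = set(hops)
                heapq.heappush(heap, (new_cost, neighbor))
            elif new_cost == dist[neighbor]:
                next_hops[neighbor] |= hops  # another equal-cost path
    return dist, next_hops


# Hypothetical two-leaf, two-spine fabric with unit link costs.
fabric: Graph = {
    "leaf1": {"spine1": 1, "spine2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
}
dist, next_hops = spf_with_ecmp(fabric, "leaf1")
print(dist["leaf2"], sorted(next_hops["leaf2"]))  # 2 ['spine1', 'spine2']
```

Because every switch holds the full topology, leaf1 can see that both spines offer a path of equal cost to leaf2 and can install both as next hops.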

Data centers are vastly different from the WAN networks that existed 20 years ago. The era of SDN-based control and the shift towards hyperscale IoT, cloud-based and real-time experience services are driving the need for a hybrid routing protocol in data centers. Hybrid routing protocols like BGP-based LSVR will allow more automation, more dynamic behavior and more programmability at hyper-cloud data center scale. Finally, the industry is witnessing innovation within the data center underlay, which has remained untouched for the last 20 years.

Gunter Van De Velde is regional EMEA product line manager for Nokia’s IP & SDN technologies