Sometimes a term gets so entrenched that we take its meaning for granted. That seems to have happened with “virtual network”, despite the fact that just what the term means, and how one might be created, have changed radically over the years. In the last year, I asked almost a hundred enterprise and service provider network planners what a “virtual network” was, and there wasn’t nearly as much agreement as I thought there’d be.
Conceptually, a “virtual network” is to today’s IP network what “virtual machine” is to a bare-metal server. It looks like a network from the perspective of a user, but it’s hosted on a real network rather than being a direct property of it. There are many drivers for virtual networks, which probably accounts for the multiplicity of meanings assigned to the term, but there’s one underlying issue that seems to cross over all the boundaries.
Real networks, at least real IP networks, were designed to connect sites rather than people. They’re also a combination of Level 2 and Level 3 concepts—a “subnet” is presumed to sit on the same Level 2 network, and the real routing process starts when you exit it via the default gateway. The base concept of real IP networks worked fine as long as we didn’t have a lot of individuals and servers we expected to connect. When we did, we ended up having to gimmick the IP address space (private addressing and NAT) to make IPv4 go further, and we created what were arguably the first “virtual networks” to separate tenant users in a shared data center or cloud.
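To make that Level 2/Level 3 split concrete, here’s a minimal sketch of the forwarding decision every IP host makes. The addresses, prefix, and gateway are made up for illustration:

```python
import ipaddress

# Illustrative host configuration: an address on a /24 subnet plus a gateway.
local_if = ipaddress.ip_interface("192.0.2.10/24")
default_gateway = ipaddress.ip_address("192.0.2.1")

def next_hop(destination: str):
    """Return where the packet gets handed off at Level 2."""
    dest = ipaddress.ip_address(destination)
    if dest in local_if.network:
        # Same subnet: presumed to be on the same Level 2 network, so the
        # host resolves the destination directly (ARP) and delivers it.
        return dest
    # Off-subnet: the real Level 3 routing process starts at the default gateway.
    return default_gateway

print(next_hop("192.0.2.77"))    # on-link, delivered directly
print(next_hop("198.51.100.5"))  # off-link, handed to 192.0.2.1
```

Everything inside the subnet is presumed to be directly reachable; everything else becomes the network’s problem at the gateway.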
Another problem that’s grown up in recent years is the classic “what-it-is-where-it-is” question. IP addresses are linked to a network service access point, which is typically the gateway router for a site. The same user, connecting from a different site, would have a different address. In mobile networks, having a smartphone roam to another cell means having it leave the place where its connection was made, so mobility management uses tunnels to follow the user, which is a form of virtual networking.
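The trick mobility management plays is pure indirection: the anchor address never changes, while a mapping (and a tunnel) follows the device from cell to cell. Here’s a hedged sketch of that idea; the names are mine, not any particular standard’s:

```python
# Maps a device's fixed anchor address to whatever gateway currently serves it.
current_tunnel_endpoint = {}

def roam(anchor_addr: str, new_endpoint: str) -> None:
    """Called when the device attaches to a new cell."""
    current_tunnel_endpoint[anchor_addr] = new_endpoint

def deliver(anchor_addr: str, packet: bytes):
    """Traffic to the fixed address is tunneled to wherever the user is now."""
    endpoint = current_tunnel_endpoint[anchor_addr]
    return endpoint, packet    # in a real network: encapsulate and forward

roam("203.0.113.9", "gateway-cell-A")
roam("203.0.113.9", "gateway-cell-B")   # the address didn't move; the tunnel did
print(deliver("203.0.113.9", b"hello"))
```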
The what/where dilemma can also complicate security. IP networks are permissive in a connection sense, meaning they presume any address can send something to any other address, and that presumption has created a whole security industry. Prior to IP, the dominant enterprise network protocol was IBM’s Systems Network Architecture (SNA), which used a central element (the System Services Control Point, or SSCP) to authorize “sessions” within the network, with a session being a relationship between network parties (users) rather than between network components. That security industry, layered onto the huge installed base of IP devices, has made it harder and harder to change IP in any fundamental way, which has again boosted the notion of virtual networking.
Then there’s the big issue, which is “best efforts”. IP does support traffic engineering (MPLS, for example), but typically within the carrier network, not out to the user endpoints; a branch office, or even a headquarters location, normally has no MPLS traffic engineering of its own. Traffic from all sources tends to compete for resources equally, which means that if there are resource limitations (and what network doesn’t have them?) you end up with congestion that can impact the CxO planning meeting as easily as someone’s take-a-break streaming video.
There have been proposals to change IP to address almost all these issues, but the installed base of devices and clients, combined with the challenges of standardizing anything in a reasonable time, has limited the effectiveness of these changes, and most are still in the proposal stage. So, in a practical sense, we could say that virtual networks are the result of the need to create a more controllable connection experience without changing the properties of the IP network that’s providing raw connectivity.
Building a virtual network is different from defining what the term means. There are two broad models of virtual network currently in play, and I think it’s likely they’ll define future models of virtual networking as well. One is the software-defined network, where forwarding behavior is controlled by something other than inter-device adaptive exchanges, and where routes can be created on demand between any points. The other is the overlay network, where a new protocol layer is added on top of IP, and where that layer actually provides connectivity for users based on a different set of rules than IP would use.
The SDN option, which is obviously favored by the ONF, falls into what they call “programmable networks”, meaning the forwarding rules that lace from device to device to create a route are programmed in explicitly. Today, the presumption is that this happens from a central (probably redundant) SDN controller. In the future, it might happen from a separate cloud-hosted control plane. The advantage of this is that the controller establishes the connectivity, and it can fill somewhat the same role as the SSCP did in those old-time SNA networks (which, by the way, still operate in some IBM sites).
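In rough terms, “programmable” means something like the sketch below: a controller computes a path and installs explicit match/action entries in each device along it. The rule format and device names are illustrative only, not the ONF’s actual protocols or APIs:

```python
from dataclasses import dataclass

@dataclass
class FlowRule:
    match_dst: str   # destination prefix this rule matches
    out_port: int    # port matching packets are forwarded to

# One forwarding table per (hypothetical) flow switch.
flow_tables = {"switch-1": [], "switch-2": [], "switch-3": []}

def install_route(path, dst_prefix):
    """Controller side: lace rules from device to device to create the route."""
    for device, out_port in path:
        flow_tables[device].append(FlowRule(dst_prefix, out_port))

# Program an explicit path for traffic to 10.1.0.0/16 across three devices.
install_route([("switch-1", 2), ("switch-2", 5), ("switch-3", 1)], "10.1.0.0/16")
print(flow_tables["switch-2"])   # [FlowRule(match_dst='10.1.0.0/16', out_port=5)]
```

The scaling question I raise below is visible even here: every route, and potentially every session, means more entries in those tables.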
As straightforward and attractive as this may sound, it still has its issues. The first is that because SDN is a network change, it’s only available where operators support it. That means a global enterprise would almost certainly not be able to use the SDN approach to create a custom connectivity service across its entire geography. The second is that we have no experience to speak of on whether the SDN concept scales to very large networks, or on whether we could add enough entries to a flow switch (the SDN router) to accommodate individual sessions.
The overlay network option is already in use, both in general virtual-network applications (VMware’s NSX, Nokia/Nuage, etc.) and in the form of SD-WAN. Overlay networks (like the mobility management features of mobile networks) take the form of “tunnels” (I’m putting the term in quotes for reasons soon to be clear) and “nodes” where the tunnels terminate and traffic can be cross-connected. This means that connectivity, to the user, is created above IP, and you can manage it any way you like.
What you like may not be great, though, when you get to the details. Overlay virtual networks add another header to the data packets, which has the effect of lowering the link bandwidth available for data. Header overhead depends on packet size; for small packets it can reach 50% or more. In addition, everywhere you terminate an overlay tunnel you need processing power, and the more complex the process, the more power you need.
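The arithmetic is simple and worth doing. Assuming roughly 50 bytes of added encapsulation headers (a VXLAN-like figure; the exact number depends on the overlay you pick), here’s how the overhead varies with packet size:

```python
# Assumed per-packet encapsulation cost; real overlays vary around this figure.
OVERLAY_HEADER_BYTES = 50

def overhead_pct(payload_bytes: int) -> float:
    """Share of the link consumed by overlay headers rather than data."""
    return 100.0 * OVERLAY_HEADER_BYTES / (payload_bytes + OVERLAY_HEADER_BYTES)

for size in (40, 256, 1400):
    print(f"{size}-byte packet: {overhead_pct(size):.1f}% of the link goes to overlay headers")

# A 40-byte packet (think TCP acknowledgment) loses over half the link to
# headers; a 1400-byte data packet loses only about 3%.
```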
It’s logical to ask at this point whether we really have an either/or here. Why couldn’t somebody provide both implementations in parallel? You could build a virtual overlay network end to end, everywhere, and use SDN to customize the underlying connectivity the virtual network rides on.
Now for the reason for all those in-quotes terms I’ve been using and promising to justify. Juniper Networks has its own SDN (Contrail), and they just completed their acquisition of what I’ve always said was the best SD-WAN vendor, 128 Technology. What 128T brings to the table is session awareness, meaning it knows the identity of the user and of the application/data resource, and so can classify the traffic flows between the two as “sessions” and then prioritize resources for the ones that are important. Because 128T doesn’t use tunnels for a physical overlay (it has a “session overlay” with minimal overhead), it doesn’t consume as much bandwidth, and its termination overhead is also minimal.
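To be clear about what “session awareness” implies, here’s a generic sketch: classify a flow by who and what sit at its two ends, then assign priority accordingly. This is purely the concept; it is not 128T’s actual mechanism or data model:

```python
from dataclasses import dataclass

@dataclass
class Session:
    user: str           # identity of the person or device originating traffic
    application: str    # identity of the application/data resource at the far end
    priority: str

# Hypothetical policy: specific pairs first, then per-application defaults.
PRIORITY_POLICY = {
    ("cfo", "erp-system"): "critical",
    ("anyone", "video-streaming"): "best-effort",
}

def classify(user: str, application: str) -> Session:
    """Map a user/application pair to a session with an explicit priority."""
    prio = (PRIORITY_POLICY.get((user, application))
            or PRIORITY_POLICY.get(("anyone", application), "normal"))
    return Session(user, application, prio)

print(classify("cfo", "erp-system"))          # critical: protect this session
print(classify("intern", "video-streaming"))  # best-effort: can be squeezed
```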
What Contrail brings is the ability to manipulate a lower-level transport property set so that actual IP connectivity and the SLA are at least somewhat controllable. With the addition of Juniper’s Mist AI to the picture for user support and problem resolution, you have a pretty interesting, even compelling, story. You can imagine a network that’s vertically integrated, from virtual, experience- and user-oriented connectivity down to a virtualization layer overlaid on transport IP. From user, potentially, to core, with full integration and full visibility and support.
If, of course, this is a line Juniper pursues. The good news is that I think they will, because I think competitors will move on the space quickly, whether Juniper takes a lead with 128T or not. That means that while Juniper may surrender first-mover opportunities to define the space, they’re going to have to get there eventually. They might as well make the move now, and get the most possible benefit, because it could be a very significant benefit indeed.