What is virtualization as applied to networking? What is “open”? One of the challenges we always seem to face in networking (and often seem to get wrong) is defining our terms. I’ve noted in past blogs that, to some, a virtual, open network means simply substituting open devices or hosted instances of device software for proprietary appliances. Others think it’s more, and I’m in the latter category. Could our differences here be nothing more than how we define things? Let’s look a bit deeper and see, using some examples.
We’ll start with a hypothetical router network consisting of a hundred transit routers distributed through a couple-thousand-mile geography. Imagine these routers to be nice, familiar, Cisco or Juniper boxes. Everyone in the service provider network operations space knows how this kind of network works, everyone knows how to put out an RFP to expand it, and if management tools are needed, everyone knows where to get them and how to apply them. This is the traditional network model.
Suppose now that we find some router software that’s compatible with the same standards as the real devices are. We then find white-box hardware, not servers, that can host this software, and we build an open router model using the combination. The network still has the same topology, the same operations procedures, the same management tools. The only question is the white box selection, and we could assume that white box vendors are going to provide their specifications, including expansion capacity. Maybe we miss our familiar sales and support people, and maybe we get a little culture shock when our racks of routers have different colors, but we’re OK here.
Suppose instead that we host that same router software on servers rather than white boxes. We get some servers with the right hardware acceleration features, add the software to them, and we create what we could call the router instance model. If all the interfaces are indeed compatible with the same standards as our real routers, we can apply the same management practices to this setup, but service provider network operations types don’t know how to pick servers and server features, or how to decide what the capacity of one of our hosted instances might be. Servers, after all, are really not designed to be routers. We’re a little off into scary-land, but not totally in the wild, and we can always paint our racks blue to calm things down.
Now suppose that we have a range of suitable servers stashed in data centers, located at all of the sites of our original traditional routers and at other locations besides. Suppose that we can dynamically position router instances in any available data center in response to load changes or failures. This is what NFV proposed to do, and so we could call this the NFV model, based on pooled infrastructure and agile deployment. To make the most of this, we’d have to deal with questions like whether a failure should result in adaptive reconfiguration, as it would among real routers, or in some sort of dynamic scaling or recovery, as it could in a virtual-resource world. We’d also have to think about supplementing our notion of the control plane to address this point. Old network rules might no longer apply, and so we might have to do some aromatherapy to calm down.
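To make that placement idea concrete, here’s a minimal sketch of the decision an NFV-style orchestrator would face when a node fails or load shifts: pick a data center from the resource pool that can host the router instance. The DataCenter fields, the selection rule, and every name here are my own illustrative assumptions, not anything taken from the ETSI NFV specifications.

```python
# Minimal sketch of NFV-style instance placement: choose a data center from a
# resource pool to host a router instance when a node fails or load shifts.
# The fields and the selection rule are illustrative assumptions only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataCenter:
    name: str
    free_cpu: int          # available vCPUs in this site's pool
    latency_ms: float      # latency from this site to the failed/overloaded location

def place_router_instance(pool: list[DataCenter],
                          required_cpu: int) -> Optional[DataCenter]:
    """Pick the lowest-latency site that still has enough spare capacity."""
    candidates = [dc for dc in pool if dc.free_cpu >= required_cpu]
    if not candidates:
        return None        # no capacity anywhere: fall back to adaptive rerouting
    return min(candidates, key=lambda dc: dc.latency_ms)

pool = [DataCenter("metro-east", free_cpu=8, latency_ms=4.0),
        DataCenter("metro-west", free_cpu=32, latency_ms=9.5)]
print(place_router_instance(pool, required_cpu=16).name)   # -> metro-west
```

A real orchestrator would weigh far more than capacity and latency, but the shape of the decision, a pool, a constraint, and a selection, is what distinguishes this model from the device networks above.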
Then, think about a model where we have simple forwarding machines, white boxes or servers, that do nothing but push packets in response to forwarding tables that are loaded by a central controller. This is the SDN model. For the first time, we have actually separated the data and control planes, and we now have a completely different kind of network. The old management models, the old operations models, the tools we need, and so forth, are all likely to change, and change significantly. If we look around our ops center or network equipment racks, we see a brave new world. Our grizzled netops people may need a lot more of a morale boost than smelling the pretty flowers now.
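A toy sketch of that separation, using ForwardingDevice and Controller classes I’ve invented purely for illustration, might look like this; a real controller would push entries over a protocol such as OpenFlow or P4Runtime rather than call methods directly.

```python
# Sketch of the SDN split: a central controller holds the network-wide view and
# pushes forwarding entries to devices that do nothing but match and forward.
# Classes and method names are illustrative assumptions, not a real controller API.
class ForwardingDevice:
    def __init__(self, name: str):
        self.name = name
        self.table: dict[str, str] = {}   # destination prefix -> output port

    def install(self, prefix: str, port: str) -> None:
        self.table[prefix] = port         # the device holds no routing logic of its own

class Controller:
    def __init__(self, devices: list[ForwardingDevice]):
        self.devices = devices            # the controller knows every device

    def push_route(self, prefix: str, port_by_device: dict[str, str]) -> None:
        """Load each device's forwarding table for one destination prefix."""
        for dev in self.devices:
            if dev.name in port_by_device:
                dev.install(prefix, port_by_device[dev.name])

switches = [ForwardingDevice("sw1"), ForwardingDevice("sw2")]
Controller(switches).push_route("10.0.0.0/24", {"sw1": "eth1", "sw2": "eth3"})
print(switches[0].table)   # {'10.0.0.0/24': 'eth1'}
```

The point of the sketch is the division of labor: all of the routing intelligence sits in the controller, and the devices simply forward what their tables tell them to.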
For those bold and brave enough, we can then contemplate the next step. Suppose that the interior of the network is a black box, connecting to our edge routers/devices. There is no specific structure, no model at all. We have floating control, data, and management functions described as a set of interface specifications. These make us an IP network because that’s what we look like from the outside, and we have a service level agreement (SLA) to show what we’ll commit to do, but inside we’re as mysterious as that stranger in the shadows. This is the black box model, where anything goes as long as you look like you’re supposed to look to users, and do what you’re supposed to do.
Which of these is “virtual networking”? Which fully exploits the cloud? We can learn a lot by grouping them.
Our first group includes the first three models (traditional, open device, software instance), and what characterizes it is that while the nodes in the network change their implementation, the network itself remains the same. We’ll call this the node group. We make the same assumptions about the way we’d decide on routes for traffic, and we handle failures and congestion in the same way. All of these models are device networks.
Our second grouping includes the NFV and SDN models, and we’ll call it the soft group. In NFV’s case, we soften the topology constraints; nodes can pop up or disappear as needed, which isn’t the case with device networks. In SDN’s case, we soften the functional association with devices; the control plane moves off the forwarding device, and the dumbing down of the nodes could make it possible to have “empty nodes” deployed in a pool, to be loaded up with forwarding rules and run as needed.
Our final grouping includes only the black box model, so we’ll call it the black box group. The significance here is that only the connections to the outside world—the properties—matter. This may seem, on the surface, to be either a cop-out on implementation or a box that contains all my other models, but it’s not. The black-box group says that any implementation that can meet the interconnect requirements is fine. Thus, this group proposes that a “network” is defined by a hierarchical “intent model” that shows how the various technology-specific pieces (presumably created from one or more of the prior models) combine.
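A sketch of what that hierarchical intent model might look like as a data structure follows; the attribute names, the assumption that child elements connect in series, and the latency check are all my own simplifications rather than any standard intent schema.

```python
# Sketch of a hierarchical intent model: each element exposes only its properties
# (an implementation label and an SLA commitment) and hides how it is built, which
# may be a set of child elements drawn from any of the earlier models.
# Attribute names and the serial-composition assumption are illustrative only.
from dataclasses import dataclass, field

@dataclass
class IntentElement:
    name: str
    implementation: str                     # "legacy-routers", "SDN", "NFV", "composite", ...
    sla_latency_ms: float                   # what this element commits to its parent
    children: list["IntentElement"] = field(default_factory=list)

    def sla_is_consistent(self) -> bool:
        """A composite's commitment must cover the sum of its (serial) children's commitments."""
        if not self.children:
            return True
        child_total = sum(c.sla_latency_ms for c in self.children)
        return (self.sla_latency_ms >= child_total
                and all(c.sla_is_consistent() for c in self.children))

network = IntentElement("vpn-service", "composite", 80.0, children=[
    IntentElement("metro-core", "SDN", 10.0),
    IntentElement("edge-cloud", "NFV", 25.0),
    IntentElement("legacy-wan", "legacy-routers", 40.0),
])
print(network.sla_is_consistent())   # True: 10 + 25 + 40 fits inside 80
```

Nothing outside an element needs to know whether “metro-core” is SDN or a rack of traditional routers; only the committed properties cross the boundary.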
Since the black box group admits current (legacy) network structures, it might seem that it shouldn’t be considered a virtualization approach at all, but I think that it’s the key to how we should be thinking about network virtualization. We’re not going to snap our fingers and find ourselves in a brave new (virtual) world, we’re going to evolve to it. It’s almost certain that this evolution will come through the introduction of “enclaves” of new technology, which means that the overall model has to accommodate this. We need defined interfaces between the enclaves, and we need a mechanism to harmonize enclave-processes into network-processes at the management level.
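As a sketch of that management-level harmonization, assume every enclave, whatever its internal technology, answers the same made-up status query, and a network-level process rolls the answers up into one service view. The interface and its fields are assumptions for illustration only.

```python
# Sketch of enclave harmonization at the management level: each enclave reports
# status through a common interface, and a network process aggregates the reports.
# The Protocol, its fields, and the aggregation rules are illustrative assumptions.
from typing import Protocol

class EnclaveManager(Protocol):
    def status(self) -> dict[str, float]:   # e.g. {"latency_ms": 12.0, "loss_pct": 0.01}
        ...

class SdnEnclave:
    def status(self) -> dict[str, float]:
        return {"latency_ms": 8.0, "loss_pct": 0.00}    # would come from the controller

class LegacyEnclave:
    def status(self) -> dict[str, float]:
        return {"latency_ms": 31.0, "loss_pct": 0.02}   # would come from SNMP/telemetry

def network_status(enclaves: list[EnclaveManager]) -> dict[str, float]:
    """End-to-end view: latencies add along the path; small losses are approximated by summing."""
    return {
        "latency_ms": sum(e.status()["latency_ms"] for e in enclaves),
        "loss_pct": sum(e.status()["loss_pct"] for e in enclaves),
    }

print(network_status([SdnEnclave(), LegacyEnclave()]))   # {'latency_ms': 39.0, 'loss_pct': 0.02}
```

The enclaves can evolve at their own pace; the network-level process only cares that each one keeps answering the common interface.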
This universal need for black-box harmonization doesn’t mean we can ignore the value of the other groups beneath it. I think the node group of virtual-network models offers the greatest ease of migration, but the smallest overall benefit. You save on capex by the difference between legacy device prices and new-node prices, and that’s really about it. I think the soft group, which represents both the definition of discrete functions rather than devices and the separation of the control and data planes, is really table stakes for network virtualization.
It’s interesting to note that the virtualization models in my critical-focus “soft group” include SDN and NFV, two of the early approaches to the problem. Given that neither of the two has swept legacy networking aside, it’s fair to ask how they can be as critical as I’m suggesting, or, if they are, why they’ve not transformed networking already. I think the answer lies in that higher-level black-box approach. Google adopted SDN as the center of its content network, and it did so by making it a black box, surrounding SDN with a BGP edge that provided that critical enclave interface I mentioned above. SDN and NFV worked on the enclaves without working on the black box, and as a result they missed the broader issues in assembling networks from technology-specific pieces.
You could obviously add some black-box-layer thinking to both SDN and NFV, and there has been some effort directed at doing that. It’s always hard to fit a fender onto a car when the design didn’t start with how fenders and cars relate, though, and so there are aspects of both SDN and NFV that would probably have been done differently had the broader picture been considered. How much tuning might be required to achieve optimality now is difficult to say.
What isn’t difficult, in my view, is to say that the current industry practice of turning functional specifications into a box-centric architecture, as has been done with 5G, isn’t helping us get the most from new network technologies. It’s not that the 5G specs demand boxes, but that the natural reading of the diagrams is to assume that functional blocks equal devices. That specific problem hit NFV, and I think our priority with 5G in general, and open-model 5G in particular, should be to avoid that fate.