Is there one good definition for disaggregation, in the context of networking? I’ve said in a number of my blogs that “commercial disaggregation”, the separation of software and hardware so you can charge an annual license fee for the software, isn’t disaggregation. I’ve also suggested that just separating software and hardware so you can replace a monolithic proprietary router with a monolithic open router isn’t really disaggregation either, though it may be useful (as we’ll get to below).
What can we use as a definition? Maybe we have to look at the challenge in a different way, and maybe that will let us draw a picture of what disaggregation should be rather than what it’s claimed to be.
The value of “disaggregating” traditional network devices has to come not from how we take things apart, but from how we put them back together. Properly done, disaggregation builds a bridge between “hosted” and “white-box” technology, and between cloud-native and monolithic. It could even bridge “connection services” like VPNs and OTT services like content. In short, it could move us forward significantly on the path to a new network model.
The most concrete articulation we have of a new network model based on disaggregation is SDN. SDN’s founding principle is to separate the control plane and forwarding plane, with the latter implemented in simple commodity white boxes and the former centralized in a master control point, likely redundant. The process of figuring out routes is largely left to implementation; the controller decides what forwarding entries to send to the white boxes, and so decides routes. Likely, the controller would base the decision on information gathered from the boxes themselves, information reflecting trunk state and load.
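To make the controller's role concrete, here's a minimal sketch of a central control point that turns trunk state into per-box forwarding entries. Everything in it is an assumption for illustration: the function names, the box names, the idea of folding load into a single trunk cost, and the choice of a shortest-path calculation. SDN leaves route selection to the implementation, so this is one option and not the method.

```python
import heapq

def next_hops(trunks, source):
    """Shortest-path next hops from one box, given {(box_a, box_b): cost}.

    Trunks are treated as bidirectional. The cost is a hypothetical number
    the controller derives from reported trunk state and load.
    """
    graph = {}
    for (a, b), cost in trunks.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    dist = {source: 0}
    first_hop = {}                      # destination -> neighbor to forward to
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue                    # stale queue entry, a better path was found
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                first_hop[neighbor] = neighbor if node == source else first_hop[node]
                heapq.heappush(queue, (nd, neighbor))
    return first_hop

def build_forwarding_tables(trunks):
    """What the controller would push: one forwarding table per white box."""
    boxes = sorted({box for trunk in trunks for box in trunk})
    return {box: next_hops(trunks, box) for box in boxes}

# Hypothetical three-box network; costs reflect current trunk load.
trunks = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5}
tables = build_forwarding_tables(trunks)
print(tables["A"])

# The A-B trunk becomes congested, so the controller recomputes and re-sends.
trunks[("A", "B")] = 10
tables = build_forwarding_tables(trunks)
print(tables["A"])
```

The white boxes never compute anything here. They report conditions, and the controller decides what each one's table should contain.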
SDN has gained favor within the data center, but not so much in the WAN. Part of the problem is concern over controller faults, and part over the question of how you’d regain control of a device, or an entire network, if the controller lost connection to it. A maverick route could cut off pieces of the network. In any event, there’s not been a significant amount of work done on large-scale WAN SDN, but SDN offers an intriguing step that warrants more consideration.
If you separate the control plane, as SDN does, would you have to centralize route management? It seems like this is at one extreme of a wide range of options, the other being a fully distributed control plane with one element per white-box, but running in the cloud rather than on the devices. It also seems like the IP control plane shouldn’t be considered the only option to control forwarding, and in fact that multiple options might share even a single white box.
White boxes could also be re-aggregated. NFV says that a virtual device could be created by chaining hosted VNFs. Why wouldn’t it be possible to create virtual devices by aggregating white boxes? A virtual device is handy as a means of simplifying operations. Think about being able to manage all the ports and trunks at a particular site as a single router. Why not, given that the stuff all terminates in the same place? You could even virtualize multiple sites and treat them as a single device. Why? Think about the relationship between OSS/BSS and NMS. Many operators have a means of reflecting the network back into the OSS, but why reflect a bunch of separate routers when the service treats the network as a whole as an asset?
Of course, you could disaggregate and re-aggregate below the box level too. Think of having multiple virtual routers sharing a single box, or sharing a collection of re-aggregated boxes being treated as a single virtual device. We could combine this with the multiplicity of control planes and create something that was both an Ethernet switch and a router, constructed extemporaneously from whatever ports/trunks were appropriate.
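Re-aggregation in both directions can be sketched as a simple mapping exercise. In the illustration below, a virtual device is nothing more than a named set of virtual ports, each of which resolves to a real port on some white box. The class names, box names, and port names are all hypothetical, and a real implementation would obviously carry far more state.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Port:
    box: str    # the physical white box that owns this port
    name: str   # the port's name on that box

@dataclass
class VirtualDevice:
    name: str
    ports: dict = field(default_factory=dict)   # virtual port name -> Port

    def attach(self, virtual_name, port):
        """Present a real port under a name that is local to the virtual device."""
        self.ports[virtual_name] = port

    def boxes(self):
        """The real white boxes hiding behind this one management view."""
        return sorted({port.box for port in self.ports.values()})

# Aggregating up: two white boxes at one site managed as a single router.
site_router = VirtualDevice("site-router")
site_router.attach("ge-0", Port("wb1", "eth0"))
site_router.attach("ge-1", Port("wb1", "eth1"))
site_router.attach("ge-2", Port("wb2", "eth0"))

# Dividing down: a virtual Ethernet switch that shares box wb2.
site_switch = VirtualDevice("site-switch")
site_switch.attach("sw-0", Port("wb2", "eth1"))
site_switch.attach("sw-1", Port("wb2", "eth2"))

print(site_router.boxes())
print(site_switch.boxes())
```

The management system would see "site-router" and "site-switch", while box wb2 quietly serves both.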
You can also create virtual devices that embody features beyond simple routing, which is how you’d be able to exploit the ability of a truly disaggregated solution to rise above Layer 3. CDN features that allow the selection of the optimum cache point for a video URL, rather than resolving to a fixed location, are one example. 5G UPF implementations, with their tunnel management, are another.
The virtualization-centric notion of disaggregation and re-aggregation offers an opportunity to build network node functionality and network management visibility almost orthogonally. What the network does and what it looks like don’t have to be congruent. That’s a highly useful concept when it comes to both service features and service lifecycle automation.
In this model, any feature related to connectivity could be absorbed into an integrated control plane, and assigned on a per-service or even per-user basis. VPNs could be maintained in total isolation or supported the way they are today, via a common virtual-router infrastructure. Same with CDNs and 5G, including each network slice. It’s these properties that make this networking model so well-adapted to the network-as-a-service approach.
There are a couple of obvious questions about this, of course. The first one is whether we can know anything about the specific architecture needed to support this model, and the second is whether anyone is doing much to support the architecture. I’ve already noted the vendor progress toward the goal of network-as-a-service, the best current indicator for progress toward fully realizing disaggregation benefits, HERE. Now, let’s look at the specific things that would realize disaggregation potential.
The first thing you need in an architecture for this networking model is a fully abstracted forwarding plane with partition/slicing capability. You need a single language to control forwarding so that it’s easy to write forwarding-control applications for the separate control plane. You also need the ability to make a static assignment of some ports and trunks to a given application, or allow applications to share control of ports/trunks via a “mediator” function that resolves any conflicts. The “language” of forwarding at the top of this abstraction needs to be consistent, but it could resolve to different chip- or chassis-level languages below.
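The static-assignment and “mediator” ideas can be sketched in a few lines. This is an illustration under stated assumptions: the Mediator class, the string form of the match and action, and the "first application to claim a match keeps it" conflict policy are all mine, not part of any standard forwarding language.

```python
class Mediator:
    """Arbitrates forwarding-rule requests from multiple control-plane applications."""

    def __init__(self):
        self.static_owner = {}   # port -> application with exclusive control
        self.rules = {}          # (port, match) -> (application, action)

    def assign(self, port, app):
        """Statically dedicate a port or trunk to one application."""
        self.static_owner[port] = app

    def install(self, app, port, match, action):
        """Accept or reject a rule; returns True if it was installed."""
        owner = self.static_owner.get(port)
        if owner is not None and owner != app:
            return False         # port is dedicated to another application
        holder = self.rules.get((port, match))
        if holder is not None and holder[0] != app:
            return False         # another application already controls this match
        self.rules[(port, match)] = (app, action)
        return True

mediator = Mediator()
mediator.assign("p1", "ip-routing")      # p1 belongs to the IP control plane alone

print(mediator.install("ethernet", "p1", "vlan=10", "forward:p3"))
print(mediator.install("ip-routing", "p1", "dst=10.0.0.0/8", "forward:p2"))

# p2 is shared: both applications may use it as long as their matches don't collide.
print(mediator.install("ip-routing", "p2", "vlan=10", "forward:p3"))
print(mediator.install("ethernet", "p2", "vlan=10", "drop"))
print(mediator.install("ethernet", "p2", "vlan=20", "forward:p4"))
```

Below this layer, each accepted rule would still have to be translated into whatever the chip or chassis actually understands. That translation is deliberately left out.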
The second thing you need is a low-latency, high-availability “service mesh” to link the control plane elements with each other, and with the forwarding plane. Separating the control plane creates a general risk of loss of control if the connection with the forwarding plane is lost. The problem of latency also creeps in, creating a risk of loss of context for control-plane decisions because of a lack of current state data from the forwarding plane. There’s also an increased risk of collision of instructions when there’s latency between control and forwarding. All these risks increase if the control plane is extended either geographically or functionally, meaning wider deployment or a larger number of higher-layer service features.
I’m putting the term “service mesh” in quotes here because it’s not clear just how much of a full-feature implementation of a service mesh would be necessary for this mission, and what the tradeoffs would be relative to latency. Traditional service meshes add considerable latency, and that might make them unsuitable for connecting control-plane elements with the forwarding plane, or each other.
The third thing needed is an authoritative, singular, model of the network or network domain. You can’t have multiple control planes creating their own images of the state of the network and have any hope of either efficient operation or control over collisions in commands. Think of this as “state-as-a-service”, something that any control-plane element can access to determine conditions. It’s likely that the same singular model would be used to mediate access to forwarding elements where shared access is provided.
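One way to picture “state-as-a-service” is a single versioned store that every control-plane element reads from and writes to. The sketch below uses optimistic, compare-and-set updates so that an element acting on an out-of-date view gets refused instead of silently colliding with another. The NetworkState class, its method names, and the key format are hypothetical, and versioned writes are just one of several ways to get this behavior.

```python
import threading

class NetworkState:
    """A single authoritative model: key -> (version, value)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def read(self, key):
        """Return (version, value); version 0 means the key has never been set."""
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Update only if the caller saw the latest version; returns success."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False     # caller's view is stale, so it must re-read
            self._data[key] = (version + 1, value)
            return True

state = NetworkState()

# Two control-plane elements both read the trunk's condition at version 0.
seen_by_ip, _ = state.read("trunk:A-B")
seen_by_cdn, _ = state.read("trunk:A-B")

print(state.write("trunk:A-B", "up", seen_by_ip))         # first writer succeeds
print(state.write("trunk:A-B", "congested", seen_by_cdn)) # second is working from old data
print(state.read("trunk:A-B"))
```

The same refusal mechanism is what would let this model double as the mediator for shared forwarding elements.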
The final thing we need is an intent-model-based framework for managing our virtual views of features and services. If you’re going to have a network whose virtual devices are set by aggregating any number of real elements and features, you need to have an elastic management vision. The key to that is to provide for a hierarchy of virtual devices, each of which composes the higher-level management view, and each of which conforms to its implicit or explicit SLA through internal management. This same capability would allow one management jurisdiction to do the “inside” management and another the “outside”.
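The hierarchy of virtual devices can be sketched as a tree of intent elements, each with its own SLA. In this illustration the SLA is reduced to a single latency bound, and child elements are assumed to sit in series so their latencies add. The class name, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class IntentElement:
    name: str
    sla_latency_ms: float
    measured_latency_ms: float = 0.0            # meaningful only for leaf elements
    children: list = field(default_factory=list)

    def latency(self):
        """Leaves report a measurement; composites sum their children (series assumption)."""
        if not self.children:
            return self.measured_latency_ms
        return sum(child.latency() for child in self.children)

    def status(self):
        """The 'outside' view: is this element meeting its own SLA?"""
        return "ok" if self.latency() <= self.sla_latency_ms else "violated"

    def faults(self):
        """The 'inside' view: every element in this subtree that is out of SLA."""
        found = [self.name] if self.status() == "violated" else []
        for child in self.children:
            found += child.faults()
        return found

box1 = IntentElement("box1", sla_latency_ms=3, measured_latency_ms=2)
box2 = IntentElement("box2", sla_latency_ms=3, measured_latency_ms=2)
metro = IntentElement("metro", sla_latency_ms=5, children=[box1, box2])
core = IntentElement("core", sla_latency_ms=10, measured_latency_ms=8)
service = IntentElement("service", sla_latency_ms=20, children=[metro, core])

print(service.status(), service.faults())

box2.measured_latency_ms = 6                    # something degrades inside the metro
print(service.status(), service.faults())
```

After the degradation, the service as a whole is still within its SLA even though the metro element and one of its boxes are not. That's the separation between "inside" and "outside" management: whoever owns the metro has remediation to do, and whoever owns the service doesn't need to know yet.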
NaaS figures into this because it’s a service-level representation of capabilities that could be derived from exploiting disaggregation. While I think all truly disaggregated networks would be NaaS-capable, not all NaaS solutions would derive from disaggregation. I think monolithic router vendors could deliver NaaS without disaggregating, for example. What this means is that disaggregation may have to be justified at each intermediate step, because it’s not the only way to a truly revolutionary service model future. The most interesting question for 2021 will be who might find a way to do that.