In the last week, I’ve had a lot of comments from both operators and vendors regarding the issue of function hosting. It’s pretty clear that networks are evolving to a state where functionality is distributed among custom devices (like routers), dedicated devices (white boxes), and general-purpose hosts (servers). Some have suggested that we’re heading for server-based networks, period. Others think that what’s actually going to happen is a split of “the network” along control/data lines, one that leaves us with custom devices in the data plane and servers in the control plane. Where are we heading, and why?
For decades, hardware engineers have been telling me that there’s a simple rule in designing any device: the more specialized its functions, the more specialized you’ll likely want to make its chips. We see this literally every day when we look at our computer screens; higher-end systems drive the display with a GPU rather than a traditional CPU, and specialized chips (often those same GPUs) are also used for AI. The important point here is that network tasks that are limited and specialized are very likely to be performed by specialized hardware, not general-purpose servers.
The data plane of a network is made up of these kinds of tasks, so it’s my view that what we can expect to see in networks starts with the separation of the data plane function, the packet-pushing, from “feature behaviors”. The data plane then evolves to support high throughput, low latency, and high availability. This is quite close to the model that was suggested for SDN, where we tend to separate packet forwarding from handling control packets, passing the latter to a central controller. We’ll get to the limitations of that, but for now let’s just say that the bottom layer of the network is a specialized packet separation and forwarding task. Data packets are forwarded along the bottom layer, and control/feature packets are sent upward to another layer.
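To make that separation concrete, here’s a minimal sketch (in Python, with hypothetical packet fields and a hypothetical punt interface) of what the bottom-layer task amounts to: classify each packet, forward data traffic on the fast path, and hand control/feature traffic upward. In a real device this logic lives in specialized forwarding silicon, not software.

```python
# A minimal sketch of the bottom-layer "separate and forward" task.
# The packet fields, classifier, and punt queue are hypothetical; a real
# device does this in forwarding hardware, not Python.

def is_control_packet(pkt: dict) -> bool:
    """Rough classifier: routing, management, and feature traffic is control;
    everything else is data. Real devices match on protocol, port, and
    whether the packet is addressed to the device itself."""
    return pkt.get("kind") in {"routing", "management", "feature"}

def handle_packet(pkt: dict, fib: dict, punt_queue: list) -> str:
    """Forward data packets on the fast path; punt control packets upward."""
    if is_control_packet(pkt):
        punt_queue.append(pkt)        # goes to an on-box CPU, a server, or a controller
        return "punted"
    next_hop = fib.get(pkt["dest"])   # fast-path lookup in the forwarding table
    return f"forward via {next_hop}" if next_hop else "drop"

# Example: a data packet is forwarded locally, a routing update is punted.
fib = {"10.0.1.0/24": "port-3"}
punted = []
print(handle_packet({"kind": "data", "dest": "10.0.1.0/24"}, fib, punted))
print(handle_packet({"kind": "routing", "dest": "224.0.0.5"}, fib, punted))
```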
Obviously, we then have to ask “What layer?” Is there a single overall controller, as there is with classic SDN, or is there a feature-hosting pool like the one NFV seems to define? Rather than simply accepting a historical model (particularly one that’s not all that widely implemented, and so doesn’t need to be preserved for financial reasons), let’s try to make a logical decision.
Current IP networks actually have multiple types of “control packets”. Some relate to feature coordination among adjacent devices, some have broader scope, and some are even end-to-end in nature. For the latter, there’s also the question of just what the “end” is. Traffic management would typically focus on endpoints as source/destination pairs, but a lot of our current focus in network standards and design is on the features of “experience networks” like content delivery, mobility management, and so forth.
SDN can tolerate the latency associated with central control precisely because control is central; with topology and routing maintained in one spot, there’s less adjacent-node exchange going on. If we wanted a general solution, we’d need to keep adjacent-node control local to the nodes. That would also make things like “ping” processing more accurate in reflecting actual transit delay. It doesn’t necessarily mean you’d need a more general processor in the packet devices; you could use a separate chip for control-plane handling, or even pair a server with a rack of white boxes.
I’ve noted in past blogs that we tend to use the term “control plane” in a perhaps-too-general way. IP itself has a control plane, and so do 5G and other “overlay” services and their features. In nearly all cases where a control plane involves regular packet exchanges, there’s a mechanism to detect packet or link loss that involves running a timer and resetting the link if it times out. That timeout period sets a limit on the latency that can be tolerated in control exchanges; too much, and you can end up with a link reset caused by nothing more than the latency itself.
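The timer mechanism is simple enough to sketch; the three-second hold time below is an illustrative assumption, not a value taken from any particular protocol. If keepalives have to arrive within that window, the latency of the control exchange has to stay comfortably below it, or a perfectly healthy link gets declared down.

```python
import time

# A sketch of the hold-timer mechanism described above; the hold time is
# an illustrative assumption, not a value from any real protocol.

class LinkMonitor:
    def __init__(self, hold_time_s: float = 3.0):
        self.hold_time_s = hold_time_s
        self.last_heard = time.monotonic()

    def on_keepalive(self):
        # Each control packet received resets the timer.
        self.last_heard = time.monotonic()

    def check(self) -> str:
        elapsed = time.monotonic() - self.last_heard
        # Excess latency in delivering a keepalive looks exactly like loss:
        # the timer expires and the link resets even though nothing failed.
        return "link reset" if elapsed > self.hold_time_s else "link up"

monitor = LinkMonitor(hold_time_s=3.0)
monitor.on_keepalive()
print(monitor.check())   # "link up" as long as keepalives beat the timer
```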
Latency is the enemy of most real-time exchanges, because as it increases, so does the window of uncertainty during which changes in conditions haven’t yet propagated to all the parties. Applied to feature hosting, this means that if a protocol can be designed for an expected latency, a fairly large one might be tolerated, provided the end users of the features don’t find their experience impacted. For this class of application, and for features with the same level of tolerance, the emphasis shifts from latency to hosting efficiency.
Since latency is manageable here, we can presume the control protocols would manage it (TCP does, for example). We can also assume, since we’re out of the data plane, that the application could be handled with a traditional or RISC CPU. Thus, the question with these applications/features is more one of resource efficiency. Hosting at the extreme edge poses the risk that there is nothing out there to share resources with, in which case it might be best to host the features on whatever device is required to push packets there. If you pull inward, toward the core, you have more users/applications/features to draw on to share hosting resources, and thus you’ll get more efficient hosting.
How deep can you go? This is a balance of forces, in my view. As you get deeper, the incremental gains in efficiency decline, because hosting efficiency rises not linearly but along an Erlang curve, which plateaus sharply. I’ve tried to model this out, and it appears (note emphasis) that you reach optimum efficiency at roughly the level of a metro area. There does not appear to be much benefit to trying to spread resource sharing for a given feature beyond that level. This doesn’t mean that some applications might not benefit significantly from a very wide resource pool, but that’s almost always related to a distribution of points of access. Gaming or metaversing are applications that could be spread widely, but that’s because what they’re engaging with is widely distributed, not because it’s more efficient to draw on distant resources.
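The Erlang effect is easy to demonstrate. The sketch below uses the standard Erlang B formula to find the best per-server utilization you can reach while holding blocking to 1%; the blocking target and pool sizes are illustrative assumptions, not the numbers behind the metro modeling I mentioned.

```python
# A sketch of why pool efficiency plateaus, using the Erlang B formula.
# The 1% blocking target and the pool sizes are illustrative assumptions.

def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability for a pool of `servers` offered `offered_load` Erlangs."""
    b = 1.0
    for n in range(1, servers + 1):
        b = (offered_load * b) / (n + offered_load * b)
    return b

def max_utilization(servers: int, blocking_target: float = 0.01) -> float:
    """Highest per-server utilization achievable while keeping blocking under target."""
    lo, hi = 0.0, float(servers)
    for _ in range(50):                         # binary search on offered load
        mid = (lo + hi) / 2
        if erlang_b(servers, mid) <= blocking_target:
            lo = mid
        else:
            hi = mid
    carried = lo * (1 - erlang_b(servers, lo))  # load actually carried
    return carried / servers

for size in (5, 20, 100, 500, 2000):
    print(f"{size:5d} servers -> ~{max_utilization(size):.0%} per-server utilization")
# Utilization climbs steeply for small pools and then flattens out: going from
# an edge-sized pool to a metro-sized pool buys a lot; going deeper buys little.
```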
What we can infer from this is that above the data plane we have what might be three layers. The lowest is what I’ll call the “adjacent control plane”, and this is where things like the IP control plane reside. Above that is the “distributed control plane”, where broader-scope features like those of 5G reside. At the top is the “experience feature plane”, where experience hosting takes place. As we move upward, we move from an area dominated by latency issues to one dominated by hosting efficiency.
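Pulled together, the structure reads something like the summary below; the example features come from the discussion above, and the hosting placements are my reading of the argument, not prescriptions.

```python
# A compact summary of the layering described above. The hosting-placement
# entries are illustrative, not prescriptions.

FEATURE_PLANES = [
    ("data plane",                "packet forwarding",               "throughput and latency",
     "specialized silicon in routers/white boxes"),
    ("adjacent control plane",    "IP control plane",                "latency",
     "on-box CPU or a server alongside the white boxes"),
    ("distributed control plane", "5G and similar broader features", "latency, then efficiency",
     "edge/metro hosting"),
    ("experience feature plane",  "content delivery, mobility mgmt.", "hosting efficiency",
     "metro-scale resource pools"),
]

for plane, example, concern, hosting in FEATURE_PLANES:
    print(f"{plane:26s} | {example:34s} | {concern:25s} | {hosting}")
```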
One size, or one strategy, doesn’t fit all in network function/feature hosting. That’s the key point to remember, and so it’s dangerous to plan for or even think about feature hosting without looking at this structure and the issues that create it.