One problem that networks have posed from the very first is how to optimize them. An optimum network, of course, is in the eye of the beholder; you have to have a standard you’re trying to meet before you can talk about optimization. Networks can be optimized by flow and by cost, most experts have long believed that the same process could do both, and algorithms for network optimization have been evolving since the dawn of the Internet.
One challenge with optimization is the time it takes to do it, particularly given that the state of a network isn’t static. Traffic uses resources, things fail, and errors get made. IP networks have generally been designed to “adapt” to conditions, something that involves “convergence” on a new topology or optimality goal. That takes time, during which networks might not only be sub-optimal, they might even fail to deliver some packets.
A new development (scientific paper here) seems to show promise in this area. Even the first of my references is hardly easy to understand, and the research paper itself is beyond almost everyone but a mathematician, so I won’t dwell on the details, but rather on the potential impacts.
Convergence time and flow/cost optimization accuracy are critical for networks. The bigger the network, and the more often condition changes impact cost/performance, the harder it is to come up with the best answer in time to respond to changes. This problem was the genesis for “software-defined networks” or SDN. SDN in its pure form advocates replacing the protocol exchanges routers use to find optimum routes (“adaptive routing”) with a centralized route management process (the SDN controller). Google’s core network is probably the largest deployment of SDN today.
It’s centralized route management that enables algorithmic responses to network conditions. Centralized management requires that you have a network map that shows nodes and trunks, and that you can determine the state of each of the elements in the map. If you can do that, then you can determine the optimum route map and distribute it to the nodes.
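To make that concrete, here’s a minimal sketch (in Python) of what a controller-style route calculation might look like: build a graph from a node/trunk inventory, run a shortest-path calculation (Dijkstra here) from each node, and produce a per-node next-hop table that could be distributed as the route map. The node names, costs, and the compute_route_map function are invented for illustration, not any particular controller’s API.

```python
import heapq

def compute_route_map(nodes, trunks):
    """Given nodes and (a, b, cost) trunks, return a per-node forwarding
    table: route_map[src][dst] = the next hop src should use toward dst."""
    adj = {n: [] for n in nodes}
    for a, b, cost in trunks:          # trunks treated as bidirectional
        adj[a].append((b, cost))
        adj[b].append((a, cost))

    route_map = {}
    for src in nodes:
        dist = {src: 0}
        first_hop = {}                 # destination -> first hop on best path
        visited = set()
        heap = [(0, src, None)]
        while heap:                    # Dijkstra, tracking the first hop used
            d, node, hop = heapq.heappop(heap)
            if node in visited:
                continue
            visited.add(node)
            if hop is not None:
                first_hop[node] = hop
            for nbr, cost in adj[node]:
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    heapq.heappush(heap, (nd, nbr, hop if hop is not None else nbr))
        route_map[src] = first_hop
    return route_map

# Hypothetical four-node map; costs might reflect delay or utilization.
nodes = ["A", "B", "C", "D"]
trunks = [("A", "B", 1), ("B", "C", 2), ("A", "D", 4), ("D", "C", 1)]
print(compute_route_map(nodes, trunks)["A"])   # {'B': 'B', 'C': 'B', 'D': 'D'}
```

The point isn’t the algorithm itself, which is ancient; it’s that a central process can only do this well if the map and the element states it works from are current, which is exactly where optimization speed matters.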
Obviously, we already have multiple strategies for defining optimum routes, and my first reference says that the new approach is really needed only for very large networks, implying that it’s not needed for current networks. There are reasons to agree with that, IMHO, but also reasons to question it.
The largest network we have today is the Internet, but the Internet is a network of networks and not a network in itself. Each Autonomous System (AS) has its own network, and each “peers” with others to exchange traffic. There are a limited number of peering points, and the optimization processes for the Internet work at multiple levels; (in simple terms) within an AS and between ASs. If we look at the way that public networks are built and regulated, it’s hard to see how “more” Internet usage would increase the complexity of optimization all that much, and it’s hard to see how anyone would build a network large enough to need the new algorithm.
But…and this is surely a speculative “but”…recall that the need for optimization efficiency depends on both the size of the network and the pace of things that drive a need to re-optimize. The need will also depend on the extent to which network performance, meaning QoS, needs to be controlled. If you can accept a wide range of QoS parameter values, you can afford to wait a bit for an optimum route map. If you have very rigorous service SLAs, then you may need a result faster.
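As a rough illustration of what rigorous SLAs mean for route calculation, here’s a small sketch of QoS-constrained path selection: among loop-free paths from source to destination, pick the cheapest one whose accumulated latency stays within an SLA bound. The topology, costs, latencies, and the 15 ms bound are invented for the example; a real controller would use constrained-shortest-path heuristics rather than this brute-force search.

```python
def best_path_within_sla(adj, src, dst, latency_bound):
    """Return the lowest-cost simple path from src to dst whose total
    latency stays within latency_bound, or None if no path qualifies.
    adj maps node -> list of (neighbor, cost, latency_ms)."""
    best = None  # (cost, path)

    def dfs(node, cost, latency, path):
        nonlocal best
        if latency > latency_bound:
            return  # this branch already violates the SLA
        if node == dst:
            if best is None or cost < best[0]:
                best = (cost, path)
            return
        for nbr, c, lat in adj.get(node, []):
            if nbr not in path:  # keep the path loop-free
                dfs(nbr, cost + c, latency + lat, path + [nbr])

    dfs(src, 0, 0.0, [src])
    return best

# Hypothetical network: (neighbor, cost, latency in ms)
adj = {
    "A": [("B", 1, 20.0), ("C", 3, 5.0)],
    "B": [("D", 1, 20.0)],
    "C": [("D", 2, 5.0)],
    "D": [],
}
print(best_path_within_sla(adj, "A", "D", latency_bound=15.0))
# -> (5, ['A', 'C', 'D']); the cheaper A-B-D path misses the 15 ms SLA
```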
We know of things that likely need more tightly constrained QoS. IoT is one example, and the metaverse another. What we don’t know is whether any of these things actually represents a major network service opportunity, or whether the other things needed to realize these applications can be delivered. An application of networking is almost certainly a swirling mix of technologies, some of which are on the network and not in it. The failure to realize a connectivity/QoS goal could certainly kill an application, but just having the connectivity and QoS needed for a given application doesn’t mean that application will automatically meet its business case overall. We need more information before we can declare that QoS demands could justify a new way of optimizing network routes, but there are other potential drivers of a new optimization model.
Network faults, meaning the combination of node, trunk, and human-error problems, can drive a need to redefine a route map. If you had a very faulty network, it might make sense to worry more about how fast you could redraw your map, provided that there wasn’t such a high level of problems that no alternative routes were available. My intuition tells me that before you’d reach the point where existing route optimization algorithms didn’t work, you’d have no customers. I think we can scratch this potential driver.
There’s a related driver that may be of more value. The reason why faults drive re-optimizing is that they change topology. Suppose dynamic topology changes were created some other way? Satellite services based on low-orbit satellites, unlike services based on geostationary satellites, are likely to exhibit variable availability based on the positions of the satellites and the locations of the sources and destinations of traffic. These new satellite options also often have variable QoS (latency in particular) depending on just how traffic hops between satellites given their current positions relative to each other. Increased reliance on low-earth-orbit satellites could mean that better route optimization performance would be a benefit, particularly where specific QoS needs have to be met.
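If topology really does shift with satellite geometry, optimization becomes event-driven: the route map has to be recomputed whenever the visible link set changes, not just when something fails. A rough sketch of that loop follows, reusing the compute_route_map idea from the earlier example; get_topology_snapshot and push_routes are hypothetical stand-ins for whatever a real controller would use.

```python
import time

def run_reoptimization_loop(get_topology_snapshot, compute_route_map, push_routes,
                            poll_interval_s=1.0):
    """Re-run route optimization only when the topology actually changes.
    get_topology_snapshot() returns (nodes, trunks) for the current instant,
    e.g. the ground stations and inter-satellite links visible right now."""
    last_snapshot = None
    while True:
        snapshot = get_topology_snapshot()
        if snapshot != last_snapshot:
            n, t = snapshot
            push_routes(compute_route_map(n, t))  # distribute new forwarding tables
            last_snapshot = snapshot
        time.sleep(poll_interval_s)
```

The faster the snapshots change, the more the speed of the optimization algorithm itself becomes the limiting factor, which is where a better algorithm would pay off.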
Then there’s security, and denial-of-service attacks in particular. Could such an attack be thwarted by changing the route map to isolate the sources? There’s no reliable data on just how many DoS attacks are active at a given time, but surely it’s a large number. However, the fact that there is no reliable data illustrates that we’d need to capture more information about DoS and security in order to make them a factor in justifying enhanced route optimization and control. Jury’s out here.
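If route-map control were used that way, the mechanism might look like the sketch below: filter the suspected attack-ingress nodes out of the topology and feed the result back into the route calculation (the compute_route_map sketch above, for instance), so their traffic simply stops being carried. The node names are invented and the whole thing is purely illustrative.

```python
def isolate_attack_sources(nodes, trunks, attack_ingress):
    """Return a filtered (nodes, trunks) topology with suspected
    attack-ingress nodes removed, ready for route recomputation."""
    clean_nodes = [n for n in nodes if n not in attack_ingress]
    clean_trunks = [(a, b, cost) for a, b, cost in trunks
                    if a not in attack_ingress and b not in attack_ingress]
    return clean_nodes, clean_trunks

# Example: exclude traffic entering through node "D"
nodes = ["A", "B", "C", "D"]
trunks = [("A", "B", 1), ("B", "C", 2), ("A", "D", 4), ("D", "C", 1)]
print(isolate_attack_sources(nodes, trunks, attack_ingress={"D"}))
# -> (['A', 'B', 'C'], [('A', 'B', 1), ('B', 'C', 2)])
```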
Where does this lead us? I think that anything that enhances our ability to create an optimum route map for a network is a good thing. I think that the new approach is likely to be adopted by companies that, like Google, already rely on SDN and central route/topology management. I don’t think that, as of this moment, it would be of enough benefit to be a wholesale driver of centralized SDN, though it would likely increase the number of use cases for it.
The biggest potential driver of a new route optimization model, though, is cloud/edge computing, and the reason goes back to changes in topology. Network faults or configuration changes can shift traffic patterns, but a bigger source of change is the unification of network and hosting optimization. The biggest network, the Internet, is increasingly integrating with the biggest data center, the cloud. Edge computing (if it develops) will increase the number of hosting points radically; even without competitive overbuild my model says we could deploy over 40,000 new hosting points, and, considering everything, 100,000. Future applications will need to not only pick a path between the ever-referenced Point A and Point B, but they’ll also have to pick where one or both of those points are located. Could every transitory edge relationship end up generating a need for optimal routes? It’s possible. I’m going to blog more about this issue later this week.
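A toy sketch of what that joint decision might look like: given path costs already produced by route optimization, pick the candidate edge hosting point with the lowest cost from the user, so hosting placement and routing are decided together. The site names and costs here are invented for illustration.

```python
def pick_edge_site_and_path(route_costs, user, edge_sites):
    """Among candidate edge hosting points, pick the one with the lowest
    optimized path cost from the user. route_costs[(a, b)] is the current
    path cost between endpoints a and b."""
    best_site, best_cost = None, float("inf")
    for site in edge_sites:
        cost = route_costs.get((user, site), float("inf"))
        if cost < best_cost:
            best_site, best_cost = site, cost
    return best_site, best_cost

# Hypothetical costs produced by the route-optimization step
route_costs = {("user1", "edge-nyc"): 4, ("user1", "edge-bos"): 7, ("user1", "edge-phl"): 5}
print(pick_edge_site_and_path(route_costs, "user1", ["edge-nyc", "edge-bos", "edge-phl"]))
# -> ('edge-nyc', 4)
```

Every time a user, a session, or a workload moves, a decision like this might have to be re-run, and each run depends on route costs being up to date, which is where faster optimization could matter.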
The strategy this new optimization algorithm defines is surely complex, even abstruse, and it’s not going to have any impact until it’s productized. The need for it is still hazy and evolving, but I think it will develop, and it’s going to be interesting to see just which vendors and network operators move to take advantage of it. I’ll keep you posted.