OK, I admit that I have a significant interest (some might call it an obsession) in digital twin technology. I think it’s justified, though, because I truly believe that the next critical step in information technology is its integration with the real world, with our work and our lives. Making that happen, even making it possible, demands that we be able to synchronize IT with real-world systems so that IT can then influence them. That means digital twin technology.
Some of the real-time applications of digital twin technology may not immediately seem like real-world stuff, and one such application is networking. Networks, utility grids, and similar things are obviously real-world, though, even if we focus not on the people involved but on the cooperative element behaviors that make up their routine operation. In fact, we could argue that things like air corridors, highway systems, assembly lines, companies, and smart structures or cities are too.
For cooperative systems like networks, a digital twin brings in the important notion of the whole, the mission, the context of things…the sort of thing that air traffic controllers call “getting the picture”. That’s the essential basis for cooperation, and you can argue (and I would) that without it you can’t understand or manage cooperative systems, because each element’s behavior influences the behavior, and the behavioral goals, of other elements. Adaptive routing has to reflect this interdependence by spreading notices of change (topology updates) until the network has “converged” on a new state.
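To make the convergence idea a bit more concrete, here’s a toy Python model of flooding a single topology update. It’s a sketch under simplifying assumptions, not OSPF or IS-IS as specified: one advertisement with a sequence number, synchronous forwarding rounds, and only one end of the change originating the update (real link-state protocols have both ends of a failed link originate). The function name flood_update is hypothetical. The network has converged when every node holds the new advertisement, and the round count shows how far the change had to spread.

```python
def flood_update(adjacency, origin):
    """Flood one link-state advertisement from `origin` in synchronous rounds.

    adjacency maps each node to the neighbors it can still reach, i.e. the
    topology after the change. Returns (rounds, converged).
    """
    # Sequence number of origin's advertisement as held by each node:
    # 0 is the stale pre-change view, 1 is the new update.
    held = {node: 0 for node in adjacency}
    held[origin] = 1
    pending = {origin}  # nodes that have the update but haven't forwarded it yet
    rounds = 0
    while pending:
        next_pending = set()
        for node in pending:
            for neighbor in adjacency[node]:
                # Accept only a newer advertisement; this stops endless re-flooding.
                if held[neighbor] < held[node]:
                    held[neighbor] = held[node]
                    next_pending.add(neighbor)
        if next_pending:
            rounds += 1  # count only rounds in which some node learned the update
        pending = next_pending
    converged = all(seq == held[origin] for seq in held.values())
    return rounds, converged


# Full mesh of four nodes versus a six-node ring that has lost link 0-1.
mesh = {n: [m for m in range(4) if m != n] for n in range(4)}
broken_ring = {0: [5], 1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
print(flood_update(mesh, 0))         # (1, True)
print(flood_update(broken_ring, 0))  # (5, True)
```

The full mesh converges in one round, while the damaged ring needs five, which is a small illustration of how interdependence stretches out the time to a new, consistent state.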
A lot of the things that have been cited as AI benefits in networking (see my earlier blog on that topic) could not be realized at all except through AI cooperating with a digital twin of the network. In fact, digital twin technology combined with traditional network operations tools and practices could realize more of those benefits than AI alone could; I’ve described my view of that process in the same blog. Thus, a network digital twin is a logical first step in network transformation.
There is already vendor interest in network digital twins. Ericsson has a nice primer on the topic, and so does Nokia. Forward Networks has an ebook with good insights as well, and NVIDIA did a piece on using one in IT training. Beyond vendors, there was an IEEE call for papers on network digital twins last year, and an often-cited IEEE paper on them was published in 2021. The IETF has also produced a concepts paper on network digital twins.
Despite the fact that this is hardly a new technology and already has some vendor support, I don’t yet have any comments from enterprises or operators suggesting it’s in use. I did have a good chat on the topic with a very savvy operator technologist, and the conversation at least surfaced what a big operator sees as the issues and opportunities.
The first point my savvy friend made was that synchronizing status between the network and its digital twin is a challenge, but what’s really difficult is doing something with what you learn. The value of the whole concept, in fact, is directly proportional to the granularity and speed with which you can exercise network control. The operator technologist said either SDN or extensive use of MPLS TE with explicit path selection is essential to actually leveraging a network digital twin.
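Explicit path selection is where the twin’s model becomes actionable. A common way to pick such a path is constrained shortest path first (CSPF): prune the links that can’t meet a constraint, then run Dijkstra on what’s left. The Python sketch below assumes the twin holds each directed link’s cost and available bandwidth; the function name cspf and the link format are hypothetical, and how the resulting hop list would be pushed to an SDN controller or signaled as an MPLS TE explicit route is deliberately left out.

```python
import heapq


def cspf(links, src, dst, demand_mbps):
    """Constrained shortest path first over the twin's view of the network.

    links maps a directed link (a, b) to (cost, available_mbps).
    Returns the path as a list of nodes, or None if no path meets the constraint.
    """
    # Constraint step: keep only links with enough spare bandwidth.
    graph = {}
    for (a, b), (cost, available) in links.items():
        if available >= demand_mbps:
            graph.setdefault(a, []).append((b, cost))

    # Ordinary Dijkstra on the pruned graph.
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if dst not in done:
        return None
    # Walk predecessors back from dst to build the explicit hop list.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


links = {
    ("A", "B"): (1, 100), ("B", "D"): (1, 10),   # short path, but thin trunk B-D
    ("A", "C"): (2, 100), ("C", "D"): (2, 100),  # longer path with headroom
}
print(cspf(links, "A", "D", demand_mbps=5))   # ['A', 'B', 'D']
print(cspf(links, "A", "D", demand_mbps=50))  # ['A', 'C', 'D']
```

The point of the sketch is that the twin, not the distributed routing protocol, decides the path, which is only useful if the network can then be told to follow it.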
The next point was a bit more complicated and perhaps more subjective in application. Network complexity determines whether the goal is to define multiple alternate network states to select from, or to analyze conditions in order to calculate an alternate state. A complex network, said the techie, isn’t necessarily one with a lot of trunks and nodes; it’s one with strong interdependencies. If a problem in one spot has a very limited scope of impact, then the network isn’t complex in the alternate-state sense. If a new state is likely to have extensive impact, then the network is considered complex.
By way of example, the least complex network is one that is fully meshed with large-capacity trunks, and where no obvious external factors are likely to create multiple simultaneous failures. In such a network, you’d expect an alternate state to require only minimal reconfiguration. An example of a complex network is one with few alternate paths between nodes, so that any new routing would likely affect many nodes and trunks, and where external factors like power or weather could cause distributed simultaneous failures.
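One way to put a number on “scope of impact” is to fail each trunk in turn on the twin’s model and count the node pairs whose shortest-path hop count changes. This is a hypothetical proxy metric chosen for illustration, not something the operator proposed, and it ignores capacity, real routing metrics, and correlated external failures like power or weather, which matter a lot in the complex case.

```python
from collections import deque
from itertools import combinations


def hop_counts(nodes, links):
    """All-pairs hop counts by BFS; None marks an unreachable pair."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    dist = {}
    for s in nodes:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for t in nodes:
            dist[(s, t)] = seen.get(t)
    return dist


def failure_impact(nodes, links):
    """For each single trunk failure, count node pairs whose hop count changes."""
    base = hop_counts(nodes, links)
    impact = {}
    for failed in links:
        after = hop_counts(nodes, [link for link in links if link != failed])
        impact[failed] = sum(
            1 for s, t in combinations(nodes, 2) if after[(s, t)] != base[(s, t)]
        )
    return impact


nodes = list(range(8))
full_mesh = list(combinations(nodes, 2))
ring = [(i, (i + 1) % 8) for i in nodes]
print(max(failure_impact(nodes, full_mesh).values()))  # 1
print(max(failure_impact(nodes, ring).values()))       # 6
```

On an eight-node full mesh, any single trunk failure changes the hop count for exactly one of the 28 node pairs; on an eight-node ring, every failure changes it for six. That’s the contrast between the “least complex” and “complex” examples, in miniature.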
In the simplest networks, digital twin technology may have little benefit, while in complex networks it may be essential to effective operation. As networks move from simple toward complex, the optimum use of digital twins evolves from simulation used to predefine alternate states to dynamic reconfiguration. The simulation might be one place where AI would be used, and AI could obviously also support dynamic reconfiguration.
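The distinction between predefined and dynamic alternate states can be sketched as a library of states built offline for every single-trunk failure, with on-the-fly computation as the fallback when an event isn’t in the library (a double failure, say). Everything here is hypothetical and simplified: routes are plain hop-count shortest paths for a list of demands, and the class and method names are illustrative, not drawn from any product.

```python
from collections import deque


def shortest_path(adj, src, dst, down=frozenset()):
    """Hop-count shortest path that avoids failed links (frozenset node pairs)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in down:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None  # demand can't be carried in this failure state
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


class AlternateStatePlanner:
    """Hypothetical planner: predefined states where possible, dynamic otherwise."""

    def __init__(self, adj, demands):
        self.adj = adj
        self.demands = demands  # list of (src, dst) pairs to keep routed
        self.library = {}       # failure set -> precomputed routes

    def compute_state(self, down):
        return {d: shortest_path(self.adj, d[0], d[1], down) for d in self.demands}

    def precompute_single_failures(self):
        """Offline simulation step: one alternate state per single link failure."""
        links = {frozenset((u, v)) for u in self.adj for v in self.adj[u]}
        for link in links:
            failure = frozenset([link])
            self.library[failure] = self.compute_state(failure)

    def select(self, down):
        """Runtime step: use a predefined state if one matches, else compute one."""
        down = frozenset(down)
        if down in self.library:
            return "predefined", self.library[down]
        return "dynamic", self.compute_state(down)


square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
planner = AlternateStatePlanner(square, [("A", "C")])
planner.precompute_single_failures()
print(planner.select({frozenset(("A", "B"))}))
# ('predefined', {('A', 'C'): ['A', 'D', 'C']})
print(planner.select({frozenset(("A", "B")), frozenset(("C", "D"))}))
# ('dynamic', {('A', 'C'): None})
```

In a simple network the library covers nearly everything that happens; as interdependence and correlated failures grow, more events fall through to the dynamic path, which is where the heavier analysis (AI or otherwise) would have to live.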
The final point is profound and often overlooked: the network digital twin must remain in contact with the network. All of these proposed benefits depend on keeping the digital twin properly synchronized with the network, which isn’t as straightforward as it might seem. How is telemetry passed if there’s a failure, and how is control exercised? My contact points out that in the complex configurations where digital twinning would be most valuable as an operations tool, it’s very likely that some event/status telemetry would be lost and that management connectivity would be compromised.
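On the synchronization side, one basic safeguard is for the twin to notice when it has lost contact rather than silently trusting its last view. The Python sketch below assumes each element stamps its status reports with a monotonically increasing sequence number; gaps in the sequence reveal lost telemetry, and an element that hasn’t reported within a timeout is flagged as stale. The class names, the timeout, and the report format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ElementView:
    """The twin's view of one network element."""
    last_seq: int = -1
    last_seen: float = float("-inf")  # never heard from
    lost_reports: int = 0
    state: dict = field(default_factory=dict)


class TwinSync:
    def __init__(self, elements, stale_after_s):
        self.views = {e: ElementView() for e in elements}
        self.stale_after_s = stale_after_s

    def ingest(self, element, seq, timestamp, state):
        view = self.views[element]
        if seq <= view.last_seq:
            return  # duplicate or out-of-date report: don't let it overwrite newer state
        if view.last_seq >= 0 and seq > view.last_seq + 1:
            view.lost_reports += seq - view.last_seq - 1  # sequence gap = lost telemetry
        view.last_seq = seq
        view.last_seen = timestamp
        view.state.update(state)

    def stale_elements(self, now):
        """Elements whose twin state can no longer be trusted as current."""
        return sorted(e for e, v in self.views.items()
                      if now - v.last_seen > self.stale_after_s)


sync = TwinSync(["r1", "r2"], stale_after_s=5.0)
sync.ingest("r1", 0, 100.0, {"trunk_a": "up"})
sync.ingest("r1", 3, 101.0, {"trunk_a": "down"})  # reports 1 and 2 never arrived
print(sync.views["r1"].lost_reports)    # 2
print(sync.stale_elements(now=102.0))   # ['r2']
print(sync.stale_elements(now=110.0))   # ['r1', 'r2']
```

Detecting the gap doesn’t recover the lost telemetry, of course; it just keeps the twin from acting on a picture it has reason to doubt.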
Having the network fall back on adaptive connectivity is one possibility, but it delays things and risks having adaptive mechanisms undo, or at least compete with, digital-twin control. Using broadcast mechanisms or an alternative channel (my tech friend suggested wireless) is better.
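The alternative-channel idea amounts to an ordered list of management paths tried in turn, so control doesn’t have to wait for adaptive routing to reconverge. In this hypothetical Python sketch, each channel is just a named send function that raises ChannelError on failure; a real design would also need timeouts, authentication, and some way to tell the twin which path was used.

```python
class ChannelError(Exception):
    """Raised by a channel's send function when the command can't be delivered."""


def send_control(command, channels):
    """Try each (name, send) management channel in order; return the one that worked."""
    failures = []
    for name, send in channels:
        try:
            send(command)
            return name
        except ChannelError as exc:
            failures.append(f"{name}: {exc}")  # remember why, then try the next path
    raise ChannelError("all management channels failed: " + "; ".join(failures))


delivered = []

def in_band(cmd):
    raise ChannelError("no route to element")  # simulates the in-band path lost in the failure

def wireless(cmd):
    delivered.append(cmd)  # simulates an out-of-band wireless path that still works

print(send_control("reroute trunk_a", [("in-band", in_band), ("wireless", wireless)]))
# wireless
```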
This last point is a potential issue for all digital twins. The worst thing that can happen to a digital twin is that it fails to mimic the state of the system it represents, creating what in AI would be called a “hallucination”. The second-worst thing is being unable to manipulate the state of the real world as intended. In any digital twin system, and particularly in systems intended to sustain communication, we’ll need a reliable way to prevent this disconnect.
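For the second-worst case, a simple discipline is to treat every control action as unconfirmed until the network reports back, and to have the twin record what the network says rather than what was commanded. That keeps a failed command from turning into a twin “hallucination”. This Python sketch assumes hypothetical push and read_back functions supplied by whatever control and telemetry paths are in use.

```python
def apply_and_verify(twin_state, intended, push, read_back):
    """Push intended changes, then confirm them against what the network reports.

    Returns {key: (intended, observed)} for every change that didn't take effect.
    """
    push(intended)
    observed = read_back(list(intended))
    mismatched = {}
    for key, want in intended.items():
        got = observed.get(key)
        # Record what the network reports, not what was commanded, so a failed
        # command can't leave the twin mimicking a state that doesn't exist.
        twin_state[key] = got
        if got != want:
            mismatched[key] = (want, got)
    return mismatched


network = {"path_1": "A-B-D", "path_2": "A-C-D"}  # stand-in for the real network
twin = dict(network)

def push(changes):
    network["path_1"] = changes["path_1"]  # simulate the path_2 command being lost

def read_back(keys):
    return {k: network.get(k) for k in keys}

print(apply_and_verify(twin, {"path_1": "A-C-D", "path_2": "A-B-D"}, push, read_back))
# {'path_2': ('A-B-D', 'A-C-D')}
print(twin == network)  # True
```

A lost command shows up as a mismatch the operations process can act on, and the twin still matches the network.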