One of the hottest topics in telecom netops is the role AI could/should play, and whether that role would have a decisive impact on profits. To answer the question, there’s a tendency to just think about what AI can do, but obviously we have to match capability against requirements in order to assess benefits. How do you build a service, as an operator? How do you manage it? These may seem to be simple questions, but the notion of service has evolved considerably, and you could argue that further evolution is essential to the telcos and cable companies, and to all who depend on them for connectivity. Thus, it’s essential to AI netops. Could it realize the goal Nokia set, a highly autonomous network? Could it transform, or perhaps save, the OSS/BSS role? Maybe.
The first network services were “embedded” in the sense that the service was created by a series of devices and trunks, and the devices were built around the properties desired for the services. We had this model in effect, arguably, up to the time of data networks. Think of this model as simply connecting boxes and letting them do what comes naturally, and it worked because all these early services were simply connections, physical paths for the devices to exploit. Essentially, the features of the services were the features of the devices that created them.
Data networks don’t simply create a physical path, they use the trunks to coordinate device behavior and signal for features, what’s called a “protocol”. The nodes in data networks actually started off as computer systems, and while specialized devices evolved quickly, they remained programmable, and the protocols used to create cooperative service behavior also evolved. For a time, the 1990s in particular, there were many different data network protocols in use, but in recent times these networks have converged on the IP protocol developed in the Internet, and broadened their mission from data to a platform for every form of service—real-time and stored content delivery, telephony, etc.
This evolution is really responsible for the challenges we have in service creation, deployment, and management. The cooperative behavior of devices is essential because any random set of devices might be involved in any specific service relationship, and any degradation, failure, or loss of synchrony/harmony will break the service. Management had to evolve from the early model of just fixing things that, when broken, created obvious symptoms, to fault isolation, reconfiguration, and remediation.
The traditional approach to service management is prescriptive, a term often used in DevOps to describe systems that control behavior by issuing a specific set of commands under a specified set of conditions. Network management systems of this type use a management protocol to control devices; the Simple Network Management Protocol (SNMP), for example, reads and writes variables in a management information base (MIB), and the MIB contents in turn govern the device’s behavior.
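To make the prescriptive model concrete, here’s a minimal Python sketch. The `Device` class, the `COMMAND_BOOK` table, and the variable names are all illustrative assumptions, not any real NMS API; the point is just that each recognized condition triggers a fixed, pre-planned sequence of writes, the way an SNMP SET would alter MIB variables.

```python
# Minimal sketch of prescriptive management: each recognized condition
# maps to a fixed sequence of device commands. All names here are
# illustrative, not a real NMS or SNMP library API.

from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    mib: dict = field(default_factory=dict)  # stands in for SNMP-settable variables

    def set_var(self, oid: str, value):
        # In a real system this would be an SNMP SET against the MIB.
        self.mib[oid] = value

# The "command book": condition -> list of (variable, value) writes.
COMMAND_BOOK = {
    "link_down":  [("ifAdminStatus", "down"), ("rerouteEnable", 1)],
    "congestion": [("queueLimit", 500), ("dropPolicy", "wred")],
}

def remediate(device: Device, condition: str):
    # Prescriptive: replay the exact commands planned for this condition.
    for oid, value in COMMAND_BOOK[condition]:
        device.set_var(oid, value)

r = Device("edge-router-1")
remediate(r, "congestion")
print(r.mib)  # {'queueLimit': 500, 'dropPolicy': 'wred'}
```

The fragility the next paragraph describes lives in that command book: every condition must be anticipated, and every command sequence must preserve device cooperation.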
The challenge for prescriptive management, in either the DevOps case or that of networking, is that the commands are complex, and the need to retain overall device cooperation makes it all too easy to mess up one little thing that goes on to mess up everything. AI and ML are an obvious response, and many netops systems have been developed that use machine learning to build a rule set for what to do when a given condition arises. These work well for simple systems, but as the number of nodes/trunks/conditions increases, there’s a risk that all possible patterns of fault have not been recognized, and that a “new” combination might be mistaken for an old one that’s only a best match to the rule.
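The “best-match” risk can be shown in a few lines. In this sketch, which I’ve invented for illustration (the fault signatures and the Hamming-distance metric are assumptions, not any particular ML system), the classifier always returns the *nearest* known fault, so a genuinely new symptom combination is silently forced into an old category:

```python
# Sketch of the best-match risk: an ML-derived rule set matches an
# observed fault signature to the nearest known pattern, so a novel
# combination is silently treated as a known one. The signatures and
# distance metric are illustrative assumptions.

KNOWN_FAULTS = {
    "trunk_failure": (1, 0, 0, 1),   # (loss, jitter, cpu, link_down)
    "node_overload": (0, 1, 1, 0),
}

def hamming(a, b):
    # Count of positions where two signatures differ.
    return sum(x != y for x, y in zip(a, b))

def classify(signature):
    # Always returns the closest known fault, even for unseen patterns.
    return min(KNOWN_FAULTS, key=lambda k: hamming(KNOWN_FAULTS[k], signature))

# A never-trained combination of symptoms still gets a confident answer:
novel = (1, 1, 1, 1)
print(classify(novel))  # one of the known categories, chosen by proximity
```

A real system would need an explicit “no confident match” threshold; without one, the misclassification is invisible until the remediation it triggers makes things worse.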
The alternative, again from the DevOps world, is descriptive management, which just means that you describe not what to do but what your desired outcome (think “goal state”) is. Since you can describe a goal state across a complex set of devices and layers, this can significantly reduce errors and enhance overall network QoE and availability. It’s also the pathway to optimum use of AI or ML in netops.
The problem with descriptive management is that there has to be a translation analysis done between current state and goal state. What has to be done to make one into the other? In some systems, DevOps in particular, there are a specific number of identified fault states, and each is linked to a remediation strategy. This approach lets an operations team plan out what might happen (or, more often, has happened in the past) and a desired approach to restoration. If this is done manually, it poses the same risk of incomplete identification of fault states, but this could be mitigated by having an AI entity given the specifications for a valid operating state, and having it figure out what other states could exist, then identifying how to restore each to a valid state.
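The translation analysis can be sketched as a diff between current and goal state, with each differing element mapped to a remediation. Everything here (the state keys, the remediation strings, the `plan` function) is invented for illustration; the structure, not the content, is the point:

```python
# Hedged sketch of descriptive (goal-state) management: compute the
# delta between current and goal state, then look up a remediation for
# each differing element. Keys and remediations are illustrative.

GOAL = {"bgp_sessions": "established", "ospf_area0": "full", "link_util": "normal"}

REMEDIATIONS = {
    "bgp_sessions": "reset and re-establish peering",
    "ospf_area0":   "re-form adjacency",
    "link_util":    "shift traffic to alternate path",
}

def plan(current: dict, goal: dict) -> list:
    # Translation analysis: what must change to turn current into goal?
    return [REMEDIATIONS[k] for k in goal if current.get(k) != goal[k]]

observed = {"bgp_sessions": "established", "ospf_area0": "down", "link_util": "high"}
print(plan(observed, GOAL))
# ['re-form adjacency', 'shift traffic to alternate path']
```

The AI contribution described above would be enumerating the states that could exist beyond the ones the team has seen, so the `REMEDIATIONS` table is complete before a novel fault occurs rather than after.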
Something like this is easiest to do if you break up a network into segments that are highly interactive within and minimally so externally. In the Internet, think of a single autonomous system; in an enterprise network, perhaps a routing domain, a LAN, and so forth. If AI can operate on each of these segments first, then overall as needed, the task is much easier to organize, validate, and monitor.
“Intent modeling” is a concept often cited as a key to this sort of segmentation. You consider each segment as a “black box” with properties that are visible from the outside but whose specific internals are not. AI can then manage against the external property goals, and what it does to meet them is invisible, so you don’t have to worry about its impact on other parts of the network. Enterprises that have reported AI success within netops have over an 80% chance of using it in a segmented way, but only about 15% have specifically used intent modeling principles at this point.
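A minimal sketch of the black-box idea, assuming invented class and property names (the SLA figures and the stubbed measurements are placeholders, not real telemetry): only the exposed intent properties cross the segment boundary, so whatever the AI rearranges inside stays invisible.

```python
# Sketch of an intent model as a black box: outsiders see only the
# exposed properties (the intent/SLA), never the internals. Names and
# values are illustrative assumptions.

class IntentSegment:
    def __init__(self, name: str, latency_ms_sla: float, availability_sla: float):
        self.name = name
        self._internal_topology = {}   # hidden: AI may rearrange this freely
        self.latency_ms_sla = latency_ms_sla
        self.availability_sla = availability_sla

    def report(self) -> dict:
        # Only externally visible properties are exposed.
        return {"segment": self.name,
                "latency_ms": self._measure_latency(),
                "availability": self._measure_availability()}

    def meets_intent(self) -> bool:
        r = self.report()
        return (r["latency_ms"] <= self.latency_ms_sla
                and r["availability"] >= self.availability_sla)

    # Hidden measurement details; stubbed here for illustration.
    def _measure_latency(self) -> float:
        return 12.0

    def _measure_availability(self) -> float:
        return 0.9995

seg = IntentSegment("metro-east", latency_ms_sla=20.0, availability_sla=0.999)
print(seg.meets_intent())  # True
```

The underscore-prefixed internals are the “whose specific internals are not” part: an AI manager scoped to this segment answers only for `meets_intent()`, never for the hidden topology.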
Better yet would be the combining of AI with intent models and digital twins. If we assume that a network is a cooperative segment relationship, then each segment and the network itself could be represented as a digital twin. This would not only facilitate control of cooperative elements, it would also facilitate the ML process by letting the digital twins test out simulations of various states of the infrastructure, then evaluate whether any of the intent-model interactions were outside their SLAs. A good digital twin could replace experience or history, making accurate training relatively easy.
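The “test out simulations of various states” idea can be sketched as a state sweep inside a twin. The toy latency model, the state variables, and the SLA threshold below are all assumptions I’ve made up for illustration; the takeaway is that the twin can enumerate hypothetical states and flag SLA violations before any of them occur in the real network, which is what makes it a substitute for operational history in training.

```python
# Sketch: a digital twin enumerates hypothetical segment states and
# flags any whose simulated behavior would violate the intent-model
# SLA, generating training cases without waiting for real failures.
# The state space and latency function are illustrative assumptions.

from itertools import product

SLA_LATENCY_MS = 20.0

def simulated_latency(links_up: int, load: float) -> float:
    # Toy model: fewer working links and higher load mean more latency.
    base = 8.0
    return base * (3 / max(links_up, 1)) * (1 + load)

def explore_states() -> list:
    violations = []
    for links_up, load in product([1, 2, 3], [0.2, 0.5, 0.8]):
        lat = simulated_latency(links_up, load)
        if lat > SLA_LATENCY_MS:
            violations.append((links_up, load, round(lat, 1)))
    return violations

print(explore_states())
```

Each flagged tuple is, in effect, a labeled training example the ML process never had to wait for.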
An AI revolution in netops is possible, I think, but only by framing it in a more organized way of making cooperative ecosystems of devices work toward a common set of objectives. AI alone can’t provide that without vastly increasing the training difficulty and the risk, but intent modeling and digital twins can. I think a lesson here is to think of AI as we’d like to, as an alternative to human intervention, and to recognize that it needs the same sort of aids a netops staff would if it’s to do its job optimally.
A side effect of this approach would be to formalize the separation of service management from network operations, with the former task being the creation of service templates. This, applying intent model principles, could connect an OSS to the “service management” interface of an element or service, allowing a customer service rep to examine service state without being able to diddle with network state. That could create a separate role for an OSS versus an NMS, and perhaps re-align the whole OSS/BSS mission.
