One of the applications of AI that CIOs believe in is using AI to track problems, meaning management and operations. The majority of the applications CIOs have presented to me are in networking rather than IT, meaning servers and software, and that may be because Juniper and Cisco (long-standing rivals) have been jousting in the network AI space for years. With the proposed acquisition of Juniper by HPE, the question of the role and scope of AI in network operations, and even IT operations, is perhaps coming to the fore.
While netops AI has made the business case a significant majority of the time, the projects and applications are still relatively rare. Fewer than 15% of enterprises say they use AI in a netops mission, despite the fact that over 80% say they believe one of the most valuable roles for AI is in operations, and especially problem determination. Why, then, isn't there more actual project activity? Enterprises cite two main reasons.
One reason may be vendor commitment to the concept. Only a fifth of enterprises overall say that their network vendor encourages AI-based netops, but that fifth accounts for almost all of the enterprises that have actually adopted it. It appears to me that when a dominant network vendor promotes AI netops, adoption is likely to follow. However, a quarter of enterprises have turned down their vendor's AI push, and while half of enterprises report using a vendor that offers the capability, most of these say the vendor isn't pushing AI netops.
That may be due to the second reason, which is a problem of scope. Operations missions relating to problem determination and correction are hampered by any gaps in “observability”, meaning the availability of data from all the elements that contribute to QoE. Gaps can be created by a product that isn’t providing the right (or any) telemetry for AI analysis, by limitations in what AI can absorb as information, or by a broader problem with the scope of QoE overall.
Most vendor AI isn't great in multi-vendor situations, just as enterprises increasingly find that network management tools work best in single-vendor deployments. Management is increasingly likely to be a differentiator for vendors, though it's also true that vendors who inhibit multi-vendor features may have a harder time breaking into a new customer's network. According to both vendors and enterprises, though, the current mindset is defensive: it's more important to use management exclusivity to discourage excursions into competitors' products than to use inclusiveness to encroach on another vendor's territory. That's because only 3% of enterprises (a figure that flirts with statistical insignificance) say they'd change vendors for management reasons alone. Surprisingly, it's almost as common for users to report that a vendor's observability suite misses some of that vendor's own products as it is to find it won't support products from another vendor.
A bigger observability gap can be created by the use of VPNs and managed services, and by any hosted elements of network features or services. Applications and their hosting also contribute to QoE issues, and for the users having the experience it's often difficult to tell whether the problem lies in the application/hosting, the network, or the user's own devices. Because all of these elements tend to report different parameters in different ways, it's difficult to collect the information in a form that lets an AI tool handle it all.
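To see why that collection problem matters, here is a minimal sketch of the kind of normalization a tool would have to do before any analysis, AI or otherwise, could span these elements. All field names and units here are invented for illustration; real devices and services expose their own schemas.

```python
# Hypothetical sketch: two sources report the same underlying metric (latency)
# with different field names and units, so each needs its own normalizer
# before analysis can treat the data uniformly.

def normalize_router(sample: dict) -> dict:
    # e.g. a network device reporting round-trip time in milliseconds
    return {"source": sample["device"], "metric": "latency_ms",
            "value": sample["rtt_ms"]}

def normalize_hosted(sample: dict) -> dict:
    # e.g. a hosted service reporting latency in seconds
    return {"source": sample["svc"], "metric": "latency_ms",
            "value": sample["lat_s"] * 1000.0}

records = [
    normalize_router({"device": "edge-1", "rtt_ms": 12.5}),
    normalize_hosted({"svc": "app-host", "lat_s": 0.020}),
]
# Both records now share one schema and one unit (milliseconds),
# which is the precondition for any cross-element analysis.
```

The point isn't the code itself but the maintenance burden it implies: every element type that reports differently needs its own adapter, and any element with no telemetry at all leaves a gap no adapter can fill.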
Of the more than three hundred enterprises that have commented on observability and the potential for AI, only 23 had actually done significant development work on their own, or at least laid out strategies. There is a narrow consensus among these enterprises (18 of the 23) that what's missing is the notion of a "hierarchy" of the sort we heard about with "intent modeling", though enterprises are more likely to describe what they think is needed than to use the term.
Application infrastructure, they say, is made up of a number of cooperative systems, each made up of a series of closely related elements and managed through a common tool and process set, usually one provided by the primary vendor. They believe each of these systems should be managed independently at that primary level, in the way that best suits its technology make-up. The goal state of each system, the state that capacity and experience planning has set, is what the management systems try to achieve.
According to these users, each system should assert a “system management API” that reports changes in the state of the system, meaning any alteration from the goal state or return to it from another (presumably impaired) state. Developers in the group cite the example of the “observer” design pattern, which reports changes of state to interested higher-level elements.
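A minimal sketch of what such a system management API might look like, using the observer pattern the developers cite. Everything here (class names, the two-state model) is an invented illustration, not any vendor's API; a real system would have a richer state model.

```python
# Sketch of the "observer" pattern applied to a system management API:
# each system reports only *changes* of state to higher-level subscribers.

from enum import Enum
from typing import Callable, List

class SystemState(Enum):
    GOAL = "goal"          # operating as capacity/experience planning intended
    IMPAIRED = "impaired"  # deviated from the goal state

class ManagedSystem:
    """A vendor-managed system that notifies observers on state changes."""
    def __init__(self, name: str):
        self.name = name
        self.state = SystemState.GOAL
        self._observers: List[Callable[[str, SystemState], None]] = []

    def subscribe(self, observer: Callable[[str, SystemState], None]) -> None:
        self._observers.append(observer)

    def set_state(self, new_state: SystemState) -> None:
        if new_state != self.state:      # only changes are reported upward
            self.state = new_state
            for notify in self._observers:
                notify(self.name, new_state)

# A higher-level management element subscribes across systems.
events = []
core = ManagedSystem("core-network")
core.subscribe(lambda name, state: events.append((name, state.value)))

core.set_state(SystemState.IMPAIRED)  # reported: deviation from goal
core.set_state(SystemState.IMPAIRED)  # not reported: no change of state
core.set_state(SystemState.GOAL)      # reported: return to goal
```

The key property is in `set_state`: the higher level sees two events here, not three, because the repeated impairment report is suppressed. That filtering is what keeps the volume of data reaching the top layer small.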
Where does AI come in? According to this group of users, it could fit in two places. First, AI could be an element in the higher-level management process that explores system state changes as reported through the system management API. Second, it could be a part of the system-level management for one or more systems.
The reason for this, say enterprises, is that it’s rare that management analysis and intervention would spread across system boundaries, except when something happened that meant a system was down and could not be restored through in-system action. They see an attempt to have AI collect all data for analysis as something that runs up AI cost and complexity without offering a realistic benefit. Having AI work within a system means the number of elements and the observability issues are manageable.
At the high level, the top management element might not even need AI under these conditions, and if AI were used there it could be lightweight because it would be dealing with fewer variables; only state changes are reported. I've done intent modeling in a prior project, and it proved to my satisfaction that the majority of high-level management analysis could be handled by simple state/event logic of some sort (tables, graphs).
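To make the state/event idea concrete, here is a table-driven sketch of the sort of logic that could sit at the top level. The states, events, and actions are hypothetical, chosen only to show the mechanism; any real deployment would define its own.

```python
# Table-driven state/event logic: the top-level element's behavior is just a
# lookup of (current state, reported event) -> (next state, action).
# All names here are invented for illustration.

TRANSITIONS = {
    ("normal",   "system_impaired"): ("degraded", "open_ticket"),
    ("degraded", "system_restored"): ("normal",   "close_ticket"),
    ("degraded", "system_down"):     ("outage",   "escalate"),
    ("outage",   "system_restored"): ("normal",   "close_ticket"),
}

def handle(state: str, event: str):
    """Return (next_state, action); unknown pairs leave state unchanged."""
    return TRANSITIONS.get((state, event), (state, "ignore"))

state, action = handle("normal", "system_impaired")
# state is now "degraded" and the action is "open_ticket"
```

The appeal of this form is that the whole policy is visible in one table, is trivially auditable, and needs no training data, which is exactly why the high level may not need AI at all when the systems below it filter their own detail.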
What these enterprises concluded was interesting. AI, they said, is best for operations where there's a surplus of complexity and a lack of hierarchy or subdivision. If you have complex networks but the complexity is divided, it's likely possible to use more traditional management tools for each subdivision, and again at the top. Where that's not possible, AI is a good way to ease burdens. This fits how some vendors, like Juniper, have approached AI, but it may call the "full stack observability" approach into question.
Network operators have a different view, largely because they have highly complex networks that aren't easily subdivided. The sheer scale of the infrastructure makes any operations center wonder whether it can keep up. However, despite the fact that operators agree they're more likely to benefit from AI in netops, they don't seem to be jumping into it any faster than enterprises. The issue for operators, as they admit, is the very complexity that justifies AI netops in the first place. When the systems within a network are too complex, vendors seem less likely to offer useful AI tools. That means that, for operators, there may be a "complexity sweet spot" between netops missions too simple to require AI and those too complex to make even vendors confident.
Vendors like Nokia may have to aim for that sweet spot to make a success of AI in the telco world, but I think they’ll also have to pull in the notion of system domains, hierarchy, and intent modeling to accommodate the range of applications operators may have, and the variety of equipment they deploy. If they can do that, they can advance their own plans, and help operators too.