A bit ago, I commented on a LinkedIn post that asked whether we are repeating, with AI agents, the mistakes of NFV. My comment was “The issues here are deeper, I think. The essential problem with NFV was that operations is inherently an event-driven, stateful process and the architecture mandated by the NFV ISG wasn’t that. Agentic AI is in a sense a component in an application, and in order to frame its architecture you have to decide what the overall application model is. I think that we should assume any network mission for AI is stateful and event-driven, in which case we need to decide how the agent interactions can operate within that framework.” I got some emails asking me to expand on that comment in a blog, so here goes.
What we’d call network, device, or service management or operations depends on controlling a complex system of discrete elements that have to cooperate to serve a common mission. These elements each have their own state, meaning their own set of internal conditions, and each will “see” to varying degrees the state of other elements in the system. To make this process work, whether it’s handling a protocol or managing overall behavior, you need to deliver a notification of any change in state to everything that needs to know it, including any overall management/operations system. This is what I mean by “event-driven” and “stateful”.
In my career, I’ve designed and/or implemented dozens of state/event systems. The classic way of doing this is to create a “state/event table”, which defines, for each state/event combination, the process to be run when it’s detected and the “next state” to be set. This, of course, requires that you identify the discrete states and events first. Another more modern approach is to create a graph with the “nodes” representing states, and lines representing the response to events, with the destination of the line being the next state. I personally find the tabular approach better if there are a lot of states and events, but that bias might be based on my personal experiences with more tables than graphs.
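To make the tabular approach concrete, here’s a minimal sketch in Python. The states, events, and handlers are hypothetical, invented for illustration; a real table for a managed service would have many more cells. The key is that every (state, event) combination maps to exactly one handler and one next state.

```python
# A minimal state/event table sketch. States, events, and handler names
# are illustrative, not drawn from any real operations system.

def activate(ctx):
    ctx["log"].append("activating")

def restore(ctx):
    ctx["log"].append("restoring")

def ignore(ctx):
    pass  # some cells deliberately do nothing

# The table: (state, event) -> (handler to run, next state to set)
TABLE = {
    ("ordered",  "deploy"): (activate, "active"),
    ("active",   "fault"):  (restore,  "degraded"),
    ("degraded", "clear"):  (ignore,   "active"),
}

def dispatch(state, event, ctx):
    handler, next_state = TABLE[(state, event)]
    handler(ctx)
    return next_state

ctx = {"log": []}
state = dispatch("ordered", "deploy", ctx)  # state is now "active"
```

A graph-based implementation would encode the same transitions as edges between state nodes; the table form simply makes it easier to spot the combinations you forgot to handle.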
When I got involved in NFV, attending a meeting in the Valley in the spring of 2013, I spoke several times to advocate two things. One was an NFV model that was a subset of current cloud-computing state of the art, and the other was an explicit state/event structure. People collected around me on breaks to talk about this, and a group of them became the foundation for the CloudNFV concept that formed the first proof of concept approved by the NFV ISG. The problem was that the “end-to-end architecture” of NFV that emerged from ISG activity (the source of things like MANO, VNFs and PNFs, and so forth) didn’t embrace this at all. In fact, it laid out what on paper looks like a pipeline approach.
The cloud/state/event approach would have allowed all the elements of NFV to be essentially “stateless functions” that kept state/event information for every NFV service in a database, and copies of any function could be loaded and run as needed, which made everything resilient and scalable. The ISG model was neither; for example, there was a Management and Orchestration (MANO) element that was centralized. To my view, this made NFV implementation brittle and inefficient, and I think doing it the right way could have made NFV a success.
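The stateless-function pattern described above can be sketched as follows. This is an assumption-laden illustration: the dict standing in for the database, the service record schema, and the transitions are all hypothetical. The point is that the function itself holds nothing between events, so any copy of it, loaded anywhere, can process any service.

```python
# Sketch of a "stateless function": all per-service state lives in a
# shared record (here a dict standing in for a real database), so the
# handler is replaceable and scalable. Schema and states are illustrative.

SERVICE_DB = {"svc-001": {"state": "ordered", "log": []}}

TRANSITIONS = {
    ("ordered",  "deploy"): "active",
    ("active",   "fault"):  "degraded",
    ("degraded", "clear"):  "active",
}

def handle_event(service_id, event):
    record = SERVICE_DB[service_id]   # load state from the database
    next_state = TRANSITIONS[(record["state"], event)]
    record["log"].append((record["state"], event, next_state))
    record["state"] = next_state      # write the new state back
    return next_state

handle_event("svc-001", "deploy")  # -> "active"
```

Contrast this with a centralized MANO-style element: there, the orchestration process itself carries the state, so losing or overloading it stalls every service it was tracking.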
AI agents that play a role in an operations mission, or that form a part of application/service logic, should be viewed in the cloud/state/event way, too, for the same reasons. Why? Because it’s inevitable that if you divide AI functionality into specialty agents, you’ll create applications that unite them in a common mission, requiring that they be linked to each other and to non-AI elements. In fact, the development of A2A (“Agent-to-Agent”) APIs is proof of this. However, it’s also a trap if A2A binds the application/service workflows the way that the NFV model bound service operations.
The issue here is that while there are surely AI and even AI agent applications that don’t require stateful behavior, there are surely those that do. I offer, for example, the fact that all network and IT operations are inherently stateful, and that in fact anything that applies AI agents to real-world process control is stateful. We can do a “do my income tax” agent without stateful behavior, but not an agent to optimize the storage of goods for sale, because the latter has to accommodate asynchronous stocking and sales to and from multiple sites, and thus requires real-time state management.
So, you may well ask, why not build A2A into a centralized state/event process, making that an intermediary steering element between agents using A2A? The reason is that for this to work, you’d have to write the AI agent to be used that way, to receive a data element representing current state and event data and to then do some fairly atomic thing. In particular, you could not have the agent do something that referenced or changed other parts of the overall state/event data internally, meaning that it bypassed the state/event logic. In our example of storage-of-goods optimization, you need a single source of truth regarding each point of storage and each type of goods, so having multiple agents potentially diddling with it can’t be allowed.
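Here’s a sketch of that intermediary pattern, using the storage-of-goods example. Everything here is illustrative, not a real A2A implementation: the agent is written as an atomic function that receives a read-only snapshot of current state plus an event, and returns a proposal; only the dispatcher applies changes to the single source of truth.

```python
# Sketch of a centralized state/event intermediary steering an agent.
# The inventory schema, agent logic, and thresholds are all hypothetical.

INVENTORY = {"site-A": {"widgets": 40}, "site-B": {"widgets": 5}}

def restock_agent(snapshot, event):
    # The agent sees only a snapshot and proposes an action; it never
    # writes to INVENTORY itself, so it cannot bypass the state/event logic.
    site = event["site"]
    if snapshot[site]["widgets"] < 10:
        return {"action": "transfer", "to": site, "qty": 20}
    return {"action": "none"}

def dispatch(event):
    snapshot = {k: dict(v) for k, v in INVENTORY.items()}  # read-only copy
    proposal = restock_agent(snapshot, event)
    if proposal["action"] == "transfer":
        # Only the dispatcher touches the single source of truth.
        INVENTORY[proposal["to"]]["widgets"] += proposal["qty"]
    return proposal

dispatch({"site": "site-B"})  # agent proposes a transfer of 20
```

The discipline this imposes is exactly the constraint in the paragraph above: the agent must be written to do some fairly atomic thing, with no side door into shared state.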
Introducing events into an AI model is a challenge, unless the model is triggered by the events. Making an AI model analyze a state/event system can also be challenging, because while you could use something like MCP to integrate external data sources, you don’t want AI to be polling for real-time status, so event recognition may be critical. Yes, that means you’d still need an API, but it likely would also mean you’d need to design not only each agent but the whole agent-to-whatever architecture to permit facile processing.
I’m not saying A2A or AI agents are bad, just that they have to fit with a model of the overall application in general, and in particular they have to support real-world, real-time, state/event systems. I think you could fit A2A into that, but I don’t think it would be the ideal vehicle. My own implementations of the state/event approach relied on having a database element that represented the parameter values and state information for a given cooperative mission (a “service”, an “application”), and an event element that carried all the data the event needed for processing. The AI agent might reference other stuff, but to make this work, that stuff should never be something that could be updated both by the agent under discussion and elsewhere; that should only be done to data in the database that represented the cooperative mission.
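The two elements I relied on in my own implementations can be sketched roughly like this; the field names and the single transition shown are hypothetical stand-ins, not a real design. The mission record is the only shared, updatable data; the event carries everything its handler needs, so the agent never has to reach into data that could also be updated elsewhere.

```python
# Sketch of the two-element pattern: a mission record held in a database,
# and an event that carries all the data needed for processing.
# Field names, states, and the transition logic are illustrative.

from dataclasses import dataclass, field

@dataclass
class Event:
    mission_id: str
    kind: str
    payload: dict  # everything the handler needs, carried with the event

@dataclass
class MissionRecord:
    state: str
    parameters: dict = field(default_factory=dict)

# Stand-in for the database of cooperative missions.
MISSIONS = {"app-1": MissionRecord(state="idle")}

def process(ev: Event):
    record = MISSIONS[ev.mission_id]  # single source of truth per mission
    if record.state == "idle" and ev.kind == "start":
        record.parameters.update(ev.payload)
        record.state = "running"
    return record.state

process(Event("app-1", "start", {"target": "store-7"}))  # -> "running"
```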
An AI agent is a component of a multi-component process, no matter what it does. Like any component, it has to fit the process, and not just in functionality but in terms of implementation/architecture. NFV didn’t define how that could be done, and AI agents have to avoid that mistake to be useful to the network operators.
