We’ve looked this week at the potential paths to AI agent payoff, meaning real business cases. We’ve looked at the forces enterprises say are really behind any changes in network technology or spending. Let’s now look at the specific goals and technologies those drivers lead to, in order to make those prospective business cases. Where will these drivers, notably AI agents, take networks?
We need to get one point out of the way first, which is why I’m focusing on self-hosted AI agents and not on cloud-hosted ones. The reason is that enterprises have only limited use for the AI services offered by the giants, because of concerns about data security. Yes, they could use agents for missions that didn’t involve business-critical data, or any of their own data at all, but these agents would be, they say, nothing but renamed retreads of AI chatbots. Aimed at generalized productivity enhancement, they offer relatively little benefit, and because they can’t use the business-critical data that enterprises say makes the business case, they don’t generate any significant WAN traffic.
OK, onward to self-hosted agents. Every enterprise starts AI network planning with the data center. The first, and perhaps most important, point in enterprise AI thinking is that AI and traditional data center servers and storage will be commingled in any realistic deployment, and that means to them that Ethernet is the vehicle to support AI networking, period. Some enterprises say they don’t think their AI clusters would be large enough to justify InfiniBand, because if they have a lot of independent agents to deploy they plan to deploy multiple clusters rather than a single giant one, but most simply say that Ethernet is the answer.
The reason for that lies in the three models of AI agent operation enterprises recognize, and how they view each model’s potential for making an AI business case. Of the interactive, embedded, and workflow models of agent operation, enterprises think the workflow model, followed by the embedded model, is the most likely to drive longer-term AI deployments. Both these models are expected to be tightly coupled to existing applications (embedded agents live inside them) and to corporate data repositories. Because current application hosting and data repositories are grouped by class of application, that would result in multiple agent clusters. Since those non-AI elements are almost certainly Ethernet-connected today, and they already carry among themselves the same sort of traffic that AI will generate, Ethernet is the answer, period.
What changes with AI agents, say enterprises, is the scope of data involved. They see AI as offering a broader, more “business-contextual” insight-creation capability than traditional software components, and they believe that breadth comes about in large part through AI’s ability to link in related information that traditional applications did not access. As a result, AI agents would not only increase the volume of horizontal traffic, but also the breadth of the elements being accessed. Previously independent application hosting clusters would then likely become more interconnected. To enterprises, horizontal traffic in general would require a change in data center network topology, and AI agents would require a radical change. Think of meshing as the top data-center networking priority for the AI agent age.
Almost everyone who’s deployed multiple AI servers knows that meshing within an AI cluster is absolutely critical; you can’t afford to add latency to the model’s internal data paths, for fear of slowing the final result to the point where it impacts the mission overall. What enterprises say is that horizontal traffic in the data center has, for decades, pushed them to create more paths among data center switches, eliminating the old bridging model that really favored vertical traffic. With AI, this gradual evolution toward more meshing is expected to become a sudden thrust toward something much more like a fabric, an any-to-any design.
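To make the tree-versus-fabric difference concrete, here’s a toy Python sketch. The topologies are deliberately tiny and the switch names are my own invention, not drawn from any product; the point is only that horizontal traffic crosses more inter-switch hops in a tree than in a leaf-spine fabric.

```python
from collections import deque

def shortest_hops(adj, src, dst):
    """Breadth-first search for the minimum inter-switch hop count."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None

# Classic three-tier tree: one core, two aggregation switches, four access switches.
tree = {
    "core": ["agg1", "agg2"],
    "agg1": ["core", "acc1", "acc2"],
    "agg2": ["core", "acc3", "acc4"],
    "acc1": ["agg1"], "acc2": ["agg1"],
    "acc3": ["agg2"], "acc4": ["agg2"],
}

# Leaf-spine fabric: two spines and four leaves, fully cross-connected.
spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]
fabric = {s: list(leaves) for s in spines}
fabric.update({l: list(spines) for l in leaves})

print("tree acc1->acc4:", shortest_hops(tree, "acc1", "acc4"), "hops")      # 4
print("fabric leaf1->leaf4:", shortest_hops(fabric, "leaf1", "leaf4"), "hops")  # 2
```

Cross-branch traffic in the tree climbs to the core and back down (four hops), while any leaf reaches any other leaf through a single spine (two hops), and every added spine is another equal-cost path.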
Enterprises also say that the shift from a vertical traffic aggregation model to a fabric will have to be accompanied by other changes to accommodate the intermingling of AI agents and software components. Current software moves transactions and events via a “service mesh” (e.g., Istio), “service bus,” or “service broker,” but this is too much overhead, enterprises say. Even at the software level, tighter coupling is needed.
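To see where the overhead enterprises complain about comes from, consider a toy sketch (function names and payload are mine, purely illustrative) contrasting the repeated marshalling a brokered exchange implies with a direct, tightly coupled hand-off:

```python
import json

def brokered_call(payload: dict) -> dict:
    """Service-bus style: the payload is serialized onto the bus,
    deserialized at the broker, re-serialized toward the target,
    and deserialized again, plus an extra network hop per leg."""
    on_bus = json.dumps(payload)       # producer -> broker
    at_broker = json.loads(on_bus)
    to_target = json.dumps(at_broker)  # broker -> consumer
    return json.loads(to_target)

def coupled_call(payload: dict) -> dict:
    """Tightly coupled: the object is handed over directly,
    with no marshalling and no intermediary."""
    return payload

doc = {"order": 1138, "items": ["gpu", "nic"]}
assert brokered_call(doc) == coupled_call(doc)  # same answer, different cost
```

Both paths produce the same result; the brokered one simply pays four marshalling passes and extra hops to get it, which is the cost tighter coupling is meant to strip out.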
Fabric interconnection is therefore likely to be accompanied by a shift toward RoCE (RDMA over Converged Ethernet, where “RDMA” is “Remote Direct Memory Access”) as a means of lowering operating system and middleware overhead in interconnections. All of this requires Ethernet’s priority flow control and congestion notification features, which are increasingly common anyway due (again) to the growth in horizontal traffic, which is typically more latency- and availability-sensitive than vertical traffic.
The RoCE/RDMA piece may be the most significant piece of all of this, because it encourages a cluster of servers to be viewed as a common shared-memory multiprocessor system. Some enterprises say this is already influencing operating system and middleware selection, for AI and for any systems that are linked to AI for workflow integration or data exchange. Most enterprises know of no middleware model that optimizes this sort of connectivity, but one is increasingly sought.
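The shared-memory view can be sketched, loosely, with Python’s standard `multiprocessing.shared_memory` module. To be clear, this is an analogy for RDMA’s write-to-remote-memory semantics, not an RDMA implementation, and the region name is invented:

```python
from multiprocessing import shared_memory

# "Writer" side: create a named region and place a value directly in it.
# (The name "agent_tokens" is illustrative only.)
region = shared_memory.SharedMemory(create=True, size=16, name="agent_tokens")
region.buf[:4] = (42).to_bytes(4, "little")

# "Reader" side: attach to the same region by name and read the value.
# No data moves through a socket or the kernel network stack, which is
# the overhead RDMA removes between hosts.
view = shared_memory.SharedMemory(name="agent_tokens")
value = int.from_bytes(view.buf[:4], "little")
print(value)  # 42

view.close()
region.close()
region.unlink()
```

Middleware that made an RoCE-connected cluster look this transparent to applications is what enterprises say they are looking for and not yet finding.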
The next question enterprises are working to address on the AI networking front is the issue of distributed AI. Distributed AI, to enterprises, means AI models running symbiotically but not as a single model. Many AI missions, including nearly all those associated with real-time applications, involve a world model that enterprises believe is likely to be a model-of-models, creating a hierarchy of processing that will increasingly involve a hierarchy of AI elements. Just as a smart city is a model made up of smart building models, which in turn are made up of smart-office-suite models and so forth, any real-time system can best use AI if limited local AI models handle events with very short response-time requirements, and pass off events that can wait a bit to larger models designed to host AI at a better economy of scale.
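A minimal sketch of that hierarchy’s dispatch logic might look like the following, where the latency budget, event names, and model names are all my own illustrative assumptions rather than anything enterprises have specified:

```python
# Toy dispatcher for a model-of-models hierarchy: events with tight
# response-time budgets stay with a small local model, and the rest
# are deferred to a larger shared model hosted at better scale.

LOCAL_BUDGET_MS = 50  # assumed cutoff for "must be answered locally"

def route_event(deadline_ms: int) -> str:
    """Pick the hierarchy tier that can meet the event's deadline."""
    if deadline_ms <= LOCAL_BUDGET_MS:
        return "local-model"    # e.g., a smart-building model
    return "central-model"      # e.g., the smart-city model-of-models

events = [("door-sensor", 20), ("hvac-trend", 5000), ("intrusion-alert", 10)]
for name, deadline in events:
    print(name, "->", route_event(deadline))
```

The network implication is that only the slower, deferrable events leave the local cluster, which is why enterprises expect event exchange, not bulk model traffic, between the tiers.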
This question is proving difficult for enterprises because little is being published online about the network relationship among distributed AI models. Enterprises’ own view is that this relationship would be created by the same means used to link elements of distributed real-time processes today, which is event exchange. This presumes that the model-of-models approach to distribution, which is intuitively accepted by enterprises, is the best approach. A few enterprise users of AI agents (a single-digit percentage) recognize that if you assume the RDMA approach to interconnection within a cluster, it is not unreasonable to assume that a shared-memory approach might be extended beyond a cluster, which could mean high-capacity, low-latency paths for token movement in the WAN. Since most enterprises don’t see this approach, and those considering it admit it’s not proven out at this point, I’m inclined to think that token exchange over the WAN isn’t a big opportunity.
That raises the ultimate question, which is the impact of AI agent use on the WAN. Distributed AI, as I just noted, isn’t seen by enterprises as having a big impact, because they dismiss the notion that it would have to be based on tightly coupled model hosting points. Here, again, I’m seeing enterprises drawing conclusions by associating AI agents with software components. Adding a component to a workflow impacts the user-to-application relationship only if the addition changes application functionality to the point where the data relationship with the user is altered. Enterprises at this point don’t see AI agents doing that; the data appetite of AI agents answering questions might change, but the size of the questions and answers, not so much.
Here I have a slight disagreement, but mostly due to what might be an increase in data gathering rather than user/application exchanges. I believe that real-time AI will require the analysis of video in order for world models to gather information about the real world. Would that analysis be purely local? Would responses to real-time conditions have to be returned in video form? How much latency could be tolerated there? I think there is potential for WAN impact here, but probably not for at least three years.
The perception you’d get from listening to vendors, AI and networking, is that a revolution is imminent. I think a revolution is coming, driven in part by AI but mostly by what AI would do, and mostly related to its ability to spread IT empowerment to real-time applications in a new, more intense, way. Imminent? Not according to enterprises, and I agree. What hype wave has ever come along to promise delayed gratification? No, it has to be immediate to be effective, and we’ve accelerated nearly every trend, even those with little chance of coming about at all, into current-day timelines. Well, we’ll see if this is different, but I don’t think so. Expect big changes, but not right away.
