As I’ve said a number of times recently, enterprises are still in the throes of learning how to deploy their own AI. The process is complicated by shifts in AI model technology (smaller, less demanding models), the ways AI can be trained (foundation models or “bare” models), the framework for use (hosted, hybrid, service), the application approach (chat or one of three agent models), and how current data can be integrated (RAG, MCP, and other proposals). Not to mention a general lack of good, unbiased tutorial material. The number of enterprises with real-world experience is growing, though, and so are their comments.
Of the 74 enterprises who offered me insight into their own AI hosting experiences, 58 were hosting a single cluster of AI servers, and 16 were hosting multiple, distributed AI servers. The former group has gained only four adherents in the second half of this year, while the latter doubled in size. In the former group, 45 had started with a single sales/support chatbot app aimed at lowering the cost of human-agent support and improving accuracy. Today, 31 of the single-cluster players have stayed exclusively with that mission, but the remaining 14 of the 45 have added other (AI agent) applications. The average number of servers/GPUs in a cluster was 86, the maximum was 256, and the minimum was 16.
The distributed approach has generated a different deployment profile. The largest number of servers/GPUs deployed in any one concentration was 6, and the average maximum cluster was a bit over 2. Agent AI hosting tended to augment current server clusters with some specialized AI, tightly coupled to the servers that ran the applications and connected to the data that the agent tools used. AI agent elements tended to deploy based on “local” business cases, so there was little or no interest in sharing AI resources among them. The 58 cluster-centric enterprises built AI resource pools; the other 16 did not. If AI agents come to dominate the missions for self-hosting of AI, I’d expect the percentage of cluster-centric deployments to fall. If we see a greater focus on price-sensitive GPU chips, I’d expect that de-clustering to be even more likely.
The question of cluster-or-distributed has major impacts on both the AI/application platform and on the network. Distributed AI tends to pull hosting toward the same technologies used for the applications related to the AI agents (in workflow or data sources), both in terms of networking and orchestration. Cluster AI tends to create an AI enclave, with its own focused networking and sometimes even its own databases.
The biggest problem that the distributed-AI model users cite is scalability. They tend to build their mini-clusters with a notion of the scale of resources needed, but often (particularly for those who jumped in early) find that their usage forecasts were flawed. Since each AI agent mission tended to have its own business case and resource set, a need to expand the resources would often stress the “I” side of their ROI calculation considerably. Of the 16 distributed users, four said they had already started to do some mini-cluster-sharing to accommodate scalability needs, mostly where the AI agents served related missions and used some common data sources.
There are some common issues for the two groups, too. One is the interaction among the components of an AI agent application, meaning any core application code, databases, and network connections. The problem is that GPUs, and AI hosting resources overall, are expensive, so you’d like to be able to optimize them. Even at best, it’s difficult to really know how AI applications impact AI resources, and more difficult still to understand how other related resources might impact, or be impacted by, AI applications. Congestion at any point causes delay, and delay can stall an application that’s sucking up GPU time. If you assume completely distributed, application-specific AI agent hosting, these issues are minimized, but even there they’re not eliminated.
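To see why a stalled application sucking up GPU time matters economically, here’s a minimal back-of-the-envelope sketch. The numbers are purely hypothetical (the enterprises I talked with didn’t share cost figures); the point is that the effective cost of useful GPU work scales with the inverse of utilization, so even modest stall time inflates it quickly.

```python
# Illustrative sketch only: how stall time inflates the effective cost
# of GPU work. All figures below are hypothetical, not survey data.

def effective_gpu_cost(hourly_rate: float, busy_seconds: float,
                       stall_seconds: float) -> float:
    """Cost per hour of *useful* GPU work, given time lost to stalls."""
    utilization = busy_seconds / (busy_seconds + stall_seconds)
    return hourly_rate / utilization

# A $4/hr GPU that stalls 1 second for every 3 seconds of real work is
# only 75% utilized, so useful compute effectively costs $5.33/hr.
print(round(effective_gpu_cost(4.0, 3.0, 1.0), 2))  # 5.33
```

The same arithmetic explains why the distributed group worries about pooling: shared pools invite congestion, and congestion pushes the denominator down.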
You can see this in the attitudes of the distributed agent hosting group. As I noted, so far only a quarter of the distributed hosting group have started to pool hosting, and those have kept the pools confined to highly related applications that share database resources. Fewer than half the group even say they’re considering pooling, and most of those who are not say that they’re concerned about issues of congestion and latency impacting both performance/QoE and resource efficiency, meaning cost.
The interesting thing here, to me, is that while AI hosting is often seen as the creation of shared resource pools, there’s a lot less enterprise literacy on just how that’s done, managed, and even costed out than there is with traditional software. Roughly two-thirds of the cluster group use Kubernetes for both missions, but of the 311 enterprises who commented on the issue of GPU pool management (most of whom didn’t yet do it), only a quarter said Kubernetes would likely be their preferred approach, and most of this group said they’d need to study the issue as part of making their self-hosting business case. Of the 152 who said they’d gotten presentations on AI resource pools from vendors, 78 mentioned hearing about specific issues of deployment, orchestration, and management. Did they miss that part, or wasn’t it presented?
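For readers who haven’t seen what “Kubernetes for GPU pools” means concretely, here’s the shape of a pod manifest that requests GPUs from a shared node pool, expressed as a Python dict (the names and image are placeholders I’ve invented; `nvidia.com/gpu` is the extended resource name that NVIDIA’s device plugin registers on GPU-equipped nodes):

```python
# Sketch of a Kubernetes pod manifest requesting GPUs from a shared
# pool. Metadata names and the image are hypothetical placeholders;
# "nvidia.com/gpu" is the resource exposed by NVIDIA's device plugin.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker"},          # hypothetical
    "spec": {
        "containers": [{
            "name": "model-server",
            "image": "example.com/llm-server:latest",  # placeholder
            "resources": {
                # The scheduler will only place this pod on a node
                # with two unallocated GPUs; GPUs can't be fractionally
                # shared this way, which is part of the pooling problem.
                "limits": {"nvidia.com/gpu": "2"}
            },
        }]
    },
}
```

The spec itself is simple; the literacy gap the survey points to is in everything around it: bin-packing whole GPUs efficiently, scheduling, and attributing pool costs back to the applications.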
I think a big part of this is due to the bias in publicity on AI that arises from “click fever”. What sites want is ad revenue, which means clicks, which means stories that more people want to click on. Given a choice between a story about how AI will kill you, or steal your job, and one about how to orchestrate GPU clusters, which do you think most people would click on? Even publications that are more insightful and target IT professionals have not pushed the issues of self-hosting, and their resolutions, as much as you might expect.
Why is this a problem? Because if enterprises have to get the details on self-hosting that make them comfortable and confident in proposing a project, they may never get to the point of inviting vendors in for a talk. Sure, they could go to vendor websites, but indulge me in an experiment: go to the website of a vendor who offers AI server products and see if you can find details on orchestration and management. Marketing has to prepare “suspects” and turn them into prospects, which sales can then turn into customers. We seem to be missing that step here.
It’s not just enterprises who are missing things, either. I got 47 telco comments on AI hosting, and almost two-thirds said that they weren’t confident about the best strategy (cluster or distributed) or about how to deploy, orchestrate, and optimize AI resources. Is it just me, or are we missing something that should be fundamental?
