Elon Musk isn’t everyone’s idea of the ideal pundit, particularly on AI topics, but sometimes he says something that has a kernel of merit, or is at least worth examining. Take this comment on X, for example: “Devices will just be edge nodes for AI inference, as bandwidth limitations prevent everything being done server-side.” On the surface, this could be taken as downplaying the future of edge devices, or it could be saying that edge devices will play a larger role. The interesting thing is that both could be true, not as alternatives but at the same time. At the very least, Musk’s “just be edge nodes” is wrong, because AI edge functions are part of the most critical evolution we’re likely to see in AI.
What we see as AI today is a curious mix. On one hand, the most visible element of AI is the enormous-data-complex LLM generative chatbot service that most of us use daily (whether we want to or not). On the other hand, AI is increasingly built into devices like phones, playing a hidden and much more limited role, but one that’s expanding rapidly as device refresh cycles get new AI-centric phones and PCs out there. Of course, all the this-and-that-hand stuff raises the question of what lies in between, which is actually where the future of AI is going to be found. That nobody much is talking about it doesn’t surprise the group most interested in it: the enterprise IT professional.
Enterprises, as opposed to individual users, have always shared their AI skepticism with me. From the first, they saw AI not as the giant chatbot or the photo-enhancing chip, but as something valuable in the context of existing business applications, particularly business analytics. When AI was embedded in these applications, or integrally coupled with them, it was possible to make a business case. That was true even though business-coupled AI is like cloud computing in that it has to somehow preserve data security and sovereignty. That means it’s more likely to require self-hosting, which means an investment in hosting, training, and sustaining it.
But cloud computing, while not the place “everything is moving”, is still a powerful tool. So is even a hosted form of AI. It’s just that it needs to be used judiciously, meaning that if we assume AI will be as widespread as (or even more widespread than) traditional applications, it will have to be first fragmented by mission and then distributed according to the same economic principles that govern cloud justification.
According to a recently announced MIT study, 95% of AI pilot projects fail to lead to rapid revenue acceleration or measurable impact on P&L. That’s actually pretty close to what I’ve gotten from enterprises, but I don’t think the MIT data hits the right points. One clear issue is that it doesn’t cleanly separate the various AI models, and the way the study was conducted and the period of analysis suggest to me that it’s based on traditional generative AI, meaning AI that is pre-trained on wide-ranging (almost always Internet) data and likely cloud-hosted rather than self-hosted. It almost surely doesn’t focus on the only AI model enterprises have really found useful, the AI agent.
Generative AI, in the form of large language models (LLMs), poses an ugly tradeoff for enterprises. If they rely on hosted AI services, they confront issues of cost, security, and sovereignty. If they elect to self-host, then the investment in a multi-rack AI cluster, the training needed for it, and the sustaining effort combine to complicate the business case. Can they find enough applications of AI to justify the investment, and how do early applications get supported without raising the risk?
Agents are different; they don’t require a lot of resources to run the relatively small models needed, and they are often bundled with software tools or provided in the form of foundation models, pre-trained for a mission and needing only access to current and historical enterprise data. Since this sort of AI is easier to host, it’s distributable. Yes, in the form of AI-enhanced, chip-based device technology, but also at intermediate points. You can run agents in a server or two inside a normal data center rack, on a PC, even in a remote piece of a network. The “Hal” of the future isn’t some gigantic virtual brain you can dumb down by pulling a board or two; it’s essentially a highly hybridized cloud application. “Highly” because what we’re likely to see with AI is a whole set of cooperating agents distributed from point of use to point of business storage. This means that agents are inherently part of workflows, linked via APIs and messages, and that the role of an agent and where it resides in the AI food chain will have to be considered, carefully.
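To make that workflow idea concrete, here’s a minimal sketch in Python of a cascade of cooperating agents passing a message from the point of use toward the point of business storage. The agent names, the hand-off order, and the context fields are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A unit of work passed along the agent workflow."""
    content: str
    context: dict = field(default_factory=dict)


class DeviceAgent:
    """Runs on the phone or PC: tidies the request before it leaves the device."""
    def handle(self, msg: Message) -> Message:
        msg.content = msg.content.strip()
        msg.context["origin"] = "device"
        return msg


class PremisesAgent:
    """Runs on a small on-premises server: applies business rules against
    local, governed data before anything leaves the site."""
    def handle(self, msg: Message) -> Message:
        msg.context["policy_checked"] = True
        return msg


class DeepAgent:
    """Runs in the data center or cloud: the resource-heavy analytical step."""
    def handle(self, msg: Message) -> Message:
        msg.context["answer"] = f"analysis of: {msg.content}"
        return msg


def run_workflow(msg: Message) -> Message:
    # The cascade: point of use -> point of business storage.
    for agent in (DeviceAgent(), PremisesAgent(), DeepAgent()):
        msg = agent.handle(msg)
    return msg


print(run_workflow(Message("  summarize today's orders  ")).context)
```

The point isn’t the code itself; it’s that each agent is a small, independently placeable component, and the workflow is nothing more than the message path between them.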
Placement of components in a cascade workflow today has to balance three essential things: latency, governance, and resources. Where latency is important, the elements of processing for a given message have to be proximate to the source. However, AI capabilities grow as the resources applied to AI increase, and that means deeper placement is likely mandatory for economic reasons. Governance requirements limit whether the AI element can be hosted out of company control (in the cloud or another shared pool of resources) owing to data security mandates. That, in turn, depends on just what data the agent accesses, including message content.
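As a rough illustration of that balancing act, the sketch below picks a hosting tier for an agent from its latency budget, its governance sensitivity, and its model footprint. The tier names and cutoff values are made-up assumptions for the example, not recommendations.

```python
def place_agent(latency_budget_ms: float,
                handles_governed_data: bool,
                resource_need: str) -> str:
    """Pick a hosting tier for an agent.

    latency_budget_ms: how quickly the agent must respond.
    handles_governed_data: whether message content falls under
        data-security or sovereignty mandates.
    resource_need: "small", "medium", or "large" model footprint.
    """
    # Governance first: governed data can't leave company control.
    if handles_governed_data:
        allowed = ["device", "premises"]
    else:
        allowed = ["device", "premises", "cloud"]

    # Latency next: tight budgets force placement near the source.
    if latency_budget_ms < 50:
        allowed = [t for t in allowed if t != "cloud"]

    # Resources last: big models push placement deeper than the device.
    if resource_need == "large":
        allowed = [t for t in allowed if t != "device"]

    # Prefer the deepest placement still allowed (cheapest per unit of work).
    for tier in ("cloud", "premises", "device"):
        if tier in allowed:
            return tier
    return "no feasible placement"


print(place_agent(30, True, "medium"))    # -> premises
print(place_agent(500, False, "large"))   # -> cloud
```

Real placement decisions would add cost models and network topology, but the ordering is the point: governance constrains first, latency second, and economics settles what’s left.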
So how does this relate to AI overall? For AI chip companies, it means that there are likely to be a lot more chips deployed in the future, but the chips will surely be far less complex and expensive. For the edge points, AI functionality will be embedded in combined CPU/GPU chips, and for close-to-the-edge agents, smaller language models will need less horsepower, to the point where we could logically expect to see the Raspberry Pi version of AI deployed on premises, local to workers with AI phones or PCs. That’s also likely a combined-chip mission. But GPUs for more demanding applications will likely be smaller than what we see from top-end Nvidia chips today. Profits will likely be lower.
The cloud-service version of generative AI as used today will likely host agent technology that depends on generalized training; think applications that rely on government economic, demographic, and other research data, where company information isn’t involved, so governance isn’t a barrier. That means that if the current AI giants can get away from the notion of AI as a know-it-all personal friend, they can expect to earn a return on their investment.
The real money will be in agent foundation models. The whole notion of agency in AI, and the whole business value proposition, is based on the presumption that there is a general set of business opportunities arising from the fact that many business processes are common across multiple firms, even across verticals. If you train a model to handle these things, you can feed it your own data and expect good outcomes. This approach makes AI much like third-party packaged software, which was business-changing because you could amortize the cost of creation across a lot of sales.
Foundation models are also a key to distributability. An AI-equipped phone is useless if you have to trawl through the Internet to train it. There are key functions that an edge AI element can perform, and some of them prepare the way for deeper models capable of richer insights. That’s the future of AI, plain and simple.
