Nvidia’s CEO is talking about “inference” in AI, about making AI able to reason in a way at least similar to the way people do. Is this just another attempt to sustain the hype wave, or is it a realistic and important shift in the way the AI giants and pundits are considering how artificial general intelligence (AGI) differs from the real thing? Nvidia CEO Jensen Huang says AGI has already been achieved, but whether that’s true is hard to assess, and it’s even harder to say whether it really matters.
Enterprises have told me many times that they’re not looking for AI to become human in order to make a business case. AGI, to them, means that AI can reason things out rather than be told everything. It’s the difference between, for example, being able to write code to fulfill a business need and merely translating a specific approach into programmatic steps. The core of this, many say, is the notion of “inference”.
Inference is the application of prior knowledge and experience to forecast the way something works or could be made to work. In AI, the notion is that you’d have a foundation model that has been trained on something general, you’d give it a specific situation to analyze (likely by having it “read” a digital twin or analyze sensors or video), and then ask it to pitch in and do the right thing to answer a question or solve a problem. Enterprises think that’s critical in building AI agents, and it’s also what they mean by “autonomous AI”. They aren’t seeing AI systems running around doing stuff without human supervision, but rather doing a contained set of things within boundaries set by the application and by the workers who built it, as the sketch below illustrates.
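To make that bounded-autonomy idea concrete, here’s a minimal Python sketch. The action set and the propose_action function are hypothetical illustrations, not any vendor’s real API, and the model call is just a stub:

```python
# A minimal sketch of "bounded autonomy": a foundation model proposes an
# action from a local observation, but the application only executes actions
# inside boundaries its builders defined. Everything named here (the action
# set, propose_action) is a hypothetical illustration, not a real API.

ALLOWED_ACTIONS = {"restart_service", "throttle_queue", "open_ticket"}

def propose_action(observation: dict) -> str:
    """Stand-in for a foundation-model call that infers a next step from
    a specific situation (digital-twin state, sensor or video analysis)."""
    return "restart_service" if observation.get("error_rate", 0) > 0.05 else "open_ticket"

def act(observation: dict) -> str:
    action = propose_action(observation)
    # The boundary check belongs to the application, not the model: any
    # suggestion outside the approved set falls back to human review.
    return action if action in ALLOWED_ACTIONS else "escalate_to_human"

print(act({"error_rate": 0.12}))  # -> restart_service
```

The design point is that the boundary lives in the application, not the model; anything the model proposes outside the approved set gets escalated rather than executed.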
Current AI is all about training, which means that what it can do is limited to what has at least been discussed, if not done, already. We do have some applications, particularly in image analysis in the health-care vertical, where some would argue that we’re already applying inference, but physicians tell me that these are still about training in the traditional sense. They say that reading a radiographic image is really just pattern-matching. One pointed out that the transitive property in math (things equal to the same thing are equal to each other) could be considered a basic form of inference, but for most it’s too basic to be truly a deduction, an inference.
If we could create an AI system capable of true inference, it would be able to serve as an expert within its range of knowledge, just as a human could. This says nothing about whether it would be conscious and self-aware, nothing about whether the same system could be an expert in other areas, or how that might come about. None of that is critical to enterprises at this point. What is critical is a way of realizing their notion of AI agent value, and they think that’s a form of inference.
Many tech types will cite examples of agent value from the IT and network operations space they’re most familiar with. How could an AI agent manage a network or data center? They see it as a process of observation and inference, a combination perhaps of machine learning and AI, along the lines sketched below. They need to be able to translate this vision into trials, and to do that they need confidence at both the practitioner level and the approver level.
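Here’s one hedged reading of that observe-and-infer pattern as a Python sketch. A simple statistical check plays the machine-learning half, and a placeholder infer_cause function stands in for the inference half; both names are illustrative assumptions, not a real monitoring API:

```python
# A sketch of observe-and-infer for network/data-center operations: a cheap
# statistical check (the "machine learning" half) flags anomalies, and only
# those get handed to an inference step (the "AI" half).

from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag a reading more than `threshold` standard deviations from recent history."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(latest - mu) / sigma > threshold

def infer_cause(metric: str, latest: float) -> str:
    """Placeholder for the inference step: a model would reason from the
    anomaly plus prior knowledge (topology, past incidents) to a diagnosis."""
    return f"investigate {metric} reading of {latest}"

history = [12.0, 11.5, 12.3, 11.9, 12.1]   # recent latency samples, in ms
latest = 48.0
if is_anomalous(history, latest):
    print(infer_cause("link_latency_ms", latest))
```

The appeal to operations teams is that the expensive inference step runs only when the cheap observation step finds something worth explaining.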
One of the challenges in building all this confidence is the fact that the AI stories almost exclusively favor the cloud-chatbot model of AI that anyone can use, and to a degree use for free. Enterprises have consistently told me that “acclamation” is important in justification; if you can cite a bunch of stories on something, it’s easier to get buy-in for it. Of course, citing real successes from other familiar enterprises would be better, but lacking that, good ink will serve well. There aren’t many such stories out there. Even when somebody like Nvidia, at one of its events, cites things that relate to self-hosted AI, it generates only a little buzz, and often only in association with things like robots.
There are two reasons for this. One is that those “practitioners” and “approvers” make up a very small audience, and publications’ revenue depends on clicks. There could be millions of ardent technophiles out there ready to read a story about AI running a factory full of robots, but (as one CIO tells me repeatedly) there are only 500 CIOs in Fortune 500 enterprises. The other reason is that it’s a lot harder to write a story about useful, real applications of AI to things like operations than to spin a robot yarn, where you’ve got a couple of generations who’ve read Isaac Asimov to populate your prospective audience.
All of this is hiding potential answers to the real question about inference, which is the resources needed to support it. If making AI inference-capable takes a giant data center eating a gigawatt, there aren’t many agent applications that we could address with it. Small language models are limited versions of LLMs; would a small inference model even be possible, and how small might it be? It would sure be nice to know that.
It would also be nice to know whether limited inference could be trained into a foundation model, one that could then be used to act on smaller batches of local data. The point is that in real-world missions, the places where inference is likely to be most valuable are local to the processes, which limits the physical size and power available to the AI, and even raises the possibility that it would have to be portable or mobile.
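A quick back-of-envelope check suggests why locality matters for real-time missions. The figures in this Python sketch (a 20 ms control deadline, hypothetical round-trip times) are illustrative assumptions, not measurements from any real deployment:

```python
# Back-of-envelope arithmetic on local vs. remote inference. All figures
# below are assumed for illustration, not measured.

def remote_viable(deadline_ms: float, rtt_ms: float, inference_ms: float) -> bool:
    """True if one network round trip plus inference fits the control deadline."""
    return rtt_ms + inference_ms <= deadline_ms

DEADLINE_MS = 20.0  # assumed budget for a real-time process-control loop
print(remote_viable(DEADLINE_MS, rtt_ms=40.0, inference_ms=15.0))  # False: the WAN round trip alone blows the budget
print(remote_viable(DEADLINE_MS, rtt_ms=1.0, inference_ms=15.0))   # True: on-site or edge hosting fits
```

Under those assumed figures, wide-area round trips alone consume the control deadline, which is exactly the gap the low-latency-link pitch described next is meant to close.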
The AI giants would propose an alternative: inference running in their giant data centers, connected via new high-speed, low-latency links, would serve well, without as many of those annoying small-inference-model questions. Yes, if we could deploy the links, and if no local solution were possible. It is likely that early inference missions would be served that way, but just as we’ve seen with things like chatbots and LLMs, improvements in technology will gradually let us shrink inference engines into something more broadly useful. If, of course, we can get someone to work on something as boring as that, given the hype-ridden state of AI overall. Whether we’ve achieved AGI is far less important than whether we can scale inference down enough to leverage it in real-time missions.
