As we move into 2026, one of the biggest questions in AI isn’t about language models, but about robotics. How much of the value of AI, particularly the notion of generalized intelligence, depends on being able to link our AI entity to a generalized automated entity, a robot? I’ve blogged on things related to this for some time, and so I’ve collected thoughtful comments from 32 people who actually work on process automation in some form. It’s eye-opening to assess their views, and their visions for the future…even for politics.
My robotics experts uniformly believe that without a strong linkage between advanced AI and process control in some form, it’s doubtful that AI could realize even half its potential. They point out that AI alone is a kind of “thinker” or “pointer”, and that real-world systems require “pushers” that can actually manipulate real-world elements. Thus, they see AI first offering insights, which it does now, and aiding those whose job is to come up with ideas. That’s only roughly 15% of workers, they say. Today’s AI could also facilitate process control in applications where industrial automation is already used—factories with programmable machine tools, assembly lines, etc. However, they point out, applications here may not pay back given that automation concepts have already been applied; AI would have to demonstrate distinct benefits over existing solutions.
The big question, according to all 32 experts, is how AI might extend process control beyond its current limits. They point out that the easiest path to AI justification is a reduction in human costs, meaning replacing or augmenting human effort so that less of it is needed to get a job done. This means either using AI to drive real-world systems to do human work, or using it to help human workers do that work better.
The first step? Sensory emulation. Experts agree that AI value depends on highly effective analysis of conditions, and that appears to depend on giving AI the ability to “see” and “hear”. Human sight is a worker’s most powerful way of gaining information, with hearing second. It makes sense that AI, to be as effective as possible, would have to tap into both. Given that we have computer voice recognition today, and that even high-end cameras can recognize people, birds, animals, vehicles, and aircraft, we shouldn’t expect a major problem in addressing all this effectively with AI. A lot of progress has already been made.
What’s missing here, and what’s actually our second expert-defined step, is contextualization. Workers, and people overall, are capable in large part because they can contextualize. We see things better when we’re looking for them, hear them better when we’re listening for them. Work, and most human activity, depends on taking steps in context. Generative AI tools tout their ability to deal with context as a differentiator, so we do have the notion of context in AI already, but experts say what we have is not enough.
A “digital twin” or “world model” is the framework experts accept as the gold standard for contextualization. You build a computer model of a real-world system, use sensory data to synchronize it with what it’s modeling, and then use the model to make control decisions, which you then use to instruct process automation elements. We know how to make digital twins, with or without using AI in the process, but a generalized toolkit to build models designed to facilitate physical tasks is still needed according to the experts.
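To make the sync-then-decide loop concrete, here’s a minimal sketch of a digital twin for a hypothetical storage tank. Everything here (the tank, the blending weight, the valve commands) is illustrative, not a real toolkit; the point is only that sensory data synchronizes the model, and control decisions come from the model rather than from raw readings.

```python
from dataclasses import dataclass

@dataclass
class TankTwin:
    """Toy digital twin of a hypothetical storage tank (all names illustrative)."""
    level: float = 0.0   # model's estimate of fill level, 0..1
    target: float = 0.5  # desired operating level

    def sync(self, sensor_level: float, weight: float = 0.8) -> None:
        # Blend the model state toward the latest sensor reading,
        # rather than trusting any single raw sample outright.
        self.level = (1 - weight) * self.level + weight * sensor_level

    def control(self) -> str:
        # Decide on the model, not the sensor: open or close the inlet valve.
        if self.level < self.target - 0.05:
            return "open_inlet"
        if self.level > self.target + 0.05:
            return "close_inlet"
        return "hold"

twin = TankTwin()
twin.sync(0.2)            # sensory data synchronizes the model...
action = twin.control()   # ...and the model drives the automation element
```

A real twin would model far richer physics, but the structure (synchronize, then decide, then instruct the process element) is the same loop the experts describe.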
So our final piece is improved process control elements. This is where “robotics” comes in, but in the experts’ view it’s also the most complex of the issues. When people hear the word “robot” they automatically think of humanoid robots, and experts say that’s not the way to look at things today, and perhaps won’t be for a very long time.
To make meaningful improvements to process control elements, we need to combine work on our first two points as the first step. We need good sensory tools to gather data. We need a hierarchical notion of a digital twin, one that links an element context, a process context, and a workplace context. Experts point out that human-based processes usually work this way, with workers, supervisors, managers, and so forth each dealing with their appropriate level of goals and tasks. One popular description takes the form of three steps. First, the automation element has a local model that lets it avoid unfavorable situations, and that defines and asserts a set of capabilities it can then deliver on command, autonomously. Second, the process element has a context of tasks, which it can dissect into steps that map to element capabilities. Finally, the workplace overall has an element that controls how processes accomplish the systemic goal.
A self-driving vehicle offers an example. The vehicle itself has a local context that allows it to avoid collisions, and a set of capabilities that include acceleration, braking, turning. The navigation system has a context, a “route” that links vehicle capabilities to a goal destination, and to conditions like weather and speed limits along the way. That system’s route is determined by the highest level, which takes current location, destination, and overall conditions to pick an optimum path and dictate the turn points that make it up.
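The three levels in that example can be sketched as a toy delegation chain. Nothing here resembles a real autonomy stack; the class names, the fixed route, and the capability sets are all illustrative. What the sketch shows is the experts’ structure: the planner picks a route, the navigator dissects each leg into steps that map to element capabilities, and the vehicle executes those steps subject to its own local model.

```python
class Vehicle:
    """Element level: a capability set plus a local safety model."""
    capabilities = {"accelerate", "brake", "turn"}

    def execute(self, command: str, obstacle_ahead: bool) -> str:
        if command not in self.capabilities:
            raise ValueError(f"unknown capability: {command}")
        if obstacle_ahead and command != "brake":
            return "brake"  # local context overrides the higher-level command
        return command

class Navigator:
    """Process level: maps each route leg onto element capabilities."""
    def steps_for(self, leg: str) -> list[str]:
        return {"straight": ["accelerate"],
                "corner": ["brake", "turn", "accelerate"]}[leg]

class Planner:
    """Workplace level: picks the route that serves the overall goal."""
    def route(self, origin: str, destination: str) -> list[str]:
        return ["straight", "corner", "straight"]  # fixed toy route

vehicle, nav, planner = Vehicle(), Navigator(), Planner()
actions = [vehicle.execute(step, obstacle_ahead=False)
           for leg in planner.route("A", "B")
           for step in nav.steps_for(leg)]
# actions reflects the planner -> navigator -> vehicle delegation
```

The design point is that each level only knows its own context: the planner never issues a “turn”, and the vehicle never sees the destination, which mirrors the worker/supervisor/manager analogy above.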
We would not expect self-driving cars to be controlled by humanoid robots. The mission does not require it, and the cost would be prohibitive. Most human tasks that we’d characterize as “work” fit the same classification, but experts say they’d eventually expect humanoid robots that, while not necessarily looking like the robots of science fiction, would be capable of asserting specialized capability sets matched to the current mission. The same device, with an “auto repair” personality, could fix a car and, with a “landscaper” personality, do yard work. In some cases it might have attachments that facilitate a given mission, replacing the traditional limbs of a humanoid configuration.
In a sense, robotics poses a greater threat to jobs than generative (or even broader) forms of AI. Thinking, intelligence, has to act on the real world in some way to generate benefits. It can work through people, through human action, or through robotics. The former approach obviously has more limited benefit but potentially lower costs if robotic advances are slow; should we achieve truly high functionality in robotics, though, the robot-and-AI combination could become truly socially disruptive.
Nvidia seems to be accelerating its support for robotics, which I think shows they recognize that robots unlock a whole new, and enormous, potential source of benefits to drive AI spending. They also unlock a bigger political debate, and that may prove more of a problem than the benefit justifies unless Nvidia and others can make major progress quickly.
