I had an interesting conversation with an AI expert from a major AI company, and when I combine it with some parallel points from my enterprise conversations on AI, I think it leads to some insights into what AI might (or might not) contribute to boosting technology value and spending.
Tech spending by enterprises falls into two categories: maintenance and modernization, and benefit-driven new applications. The first category sustains technology that current business operations depend upon, and the second deploys new technology to address new ways of doing business, subject to meeting corporate requirements for return on investment. Over the last three decades, we’ve seen the second of these categories decline, because new applications of technology that can meet ROI targets are harder and harder to find. As I’ve noted, the “low apples” in project opportunity have been picked. To get a big upsurge in tech spending, we need a big upsurge in projects, which means a big upsurge in benefits.
The difference between the low apples we’ve picked and the higher, unpicked ones isn’t apple variety or even ladders; it’s desks. The workers we’ve empowered up to now are largely (60%) working at desks, at least regularly. They’re involved in planning, analysis, transaction processing, and other activities we’d normally associate with office work. The ones we’ve missed (the other 40%) are out in the real world, doing real-world stuff. If we want to make them more productive, we have to somehow give IT systems a window into that real world, and in real time.
We’ve used industrial/mechanical tools to augment or replace humans before, and according to my expert, the places where this has worked are places where the “job” includes “tasks” that are amenable to simple autonomous treatment. You can replace a crew of people shoveling with a backhoe, or move goods on conveyors and assembly lines. We’ve largely done those things at this point, and the 40% are doing the things that remain, jobs that require more judgment and insight than a simple system can provide. So we need something that can do at least a bit more.
What my AI expert says is that the thing we need isn’t artificial general intelligence (AGI, like “HAL” in the movie 2001) but what’s coming to be called a “world model”. There are two forms of world model emerging. One is a pure virtual world, something like an AI-generated alternate reality, and the other is a visualization of digital twin technology, meaning a virtual world that’s synchronized at some level to the real world. Each of these has its mission in empowerment.
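To make the distinction concrete, here’s a minimal sketch in Python of the two forms as I understand them. Everything in it, from the class names to the sync mechanism, is my own illustration, not anything my expert specified.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only: the class names and the sync mechanism
# are my own invention, not a description of any real product.

@dataclass
class WorldState:
    objects: dict = field(default_factory=dict)  # object id -> attributes

class PureVirtualWorld:
    """An AI-style alternate reality: state evolves by internal rules only."""
    def __init__(self, scenario):
        self.state = WorldState(objects=scenario)

    def step(self):
        # Advance the simulated world one tick under its own physics.
        for obj in self.state.objects.values():
            obj["x"] = obj.get("x", 0.0) + obj.get("vx", 0.0)
        return self.state

class DigitalTwinWorld:
    """The same kind of model, but refreshed from real-world sensors."""
    def __init__(self, sensor_feed):
        self.sensor_feed = sensor_feed  # callable returning current readings
        self.state = WorldState()

    def sync(self):
        # Overwrite the model state with the latest real-world readings.
        self.state.objects = self.sensor_feed()
        return self.state

# A virtual world runs on its own...
virtual = PureVirtualWorld({"cart": {"x": 0.0, "vx": 1.5}})
print(virtual.step().objects["cart"]["x"])   # 1.5

# ...while a twin tracks whatever the (stubbed) sensors report.
twin = DigitalTwinWorld(lambda: {"cart": {"x": 7.2}})
print(twin.sync().objects["cart"]["x"])      # 7.2
```

The difference that matters is the last line of each class: one world advances itself, and the other is only ever as current as its link to reality.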
A world-model-as-digital-twin can model the way our missing 40% actually works, potentially offering guidance to make it more efficient, or even tying in automated movement of goods and tools to worker movements. It could warn them of impending problems, including safety risks. For public safety workers, it could tell them where to go in detail, whether they’re policing a street or fighting a fire.
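As a purely hypothetical example of that warning capability, a twin that tracks worker position could check it against known hazard zones. The zones, coordinates, and thresholds below are invented for illustration.

```python
import math

# Invented example: a digital twin that warns a worker approaching a
# hazard zone. Zone names, coordinates, and radii are made up.

HAZARD_ZONES = [
    {"name": "stamping press", "x": 10.0, "y": 4.0, "radius": 2.0},
    {"name": "forklift lane", "x": 3.0, "y": 12.0, "radius": 3.0},
]

def safety_warnings(worker_x, worker_y):
    """Return a warning for each hazard zone the worker is inside."""
    return [
        f"Warning: entering {zone['name']}"
        for zone in HAZARD_ZONES
        if math.hypot(worker_x - zone["x"], worker_y - zone["y"]) < zone["radius"]
    ]

print(safety_warnings(9.0, 4.5))   # ['Warning: entering stamping press']
```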
The pure virtual world model might seem like a live version of Dungeons and Dragons, but it’s actually essential in training autonomous systems, even in training its digital-twin companion applications. Autonomy relies on mimicking human vision, because vision is the most effective way to gather information about surrounding conditions. Feed an autonomous system a world-model vision of a situation, including a risky one, and you can test how it would respond.
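Here’s a toy version of that kind of test loop, assuming a virtual world that can generate both risky and routine scenes. The scene generator and the agent’s policy are stand-ins for a rendered world-model view and a vision-based model, respectively; both are my own simplifications.

```python
import random

# Toy sketch: use a pure virtual world to test an autonomous agent against
# scenes it generates, including risky ones. Everything here is a stand-in.

def generate_scene(risky):
    """Stand-in for rendering a world-model view of a situation."""
    distance = random.uniform(0.5, 2.0) if risky else random.uniform(5.0, 20.0)
    return {"obstacle_distance": distance}

def agent_policy(scene):
    """A trivial autonomous response: brake when something is close."""
    return "brake" if scene["obstacle_distance"] < 3.0 else "proceed"

def evaluate(trials=1000):
    """Score the agent against scenes the virtual world generates."""
    correct = 0
    for _ in range(trials):
        risky = random.random() < 0.5
        action = agent_policy(generate_scene(risky))
        # In a risky scene the safe response is to brake; otherwise proceed.
        if action == ("brake" if risky else "proceed"):
            correct += 1
    return correct / trials

print(f"safe-response rate: {evaluate():.0%}")
```

The point of the exercise is that you can run the risky scenes thousands of times without ever putting a real vehicle, or a real worker, at risk.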
The insight my AI expert brought me is that AI systems that operate off digital twins will have to be trained to work in the real world. Yes, you could create a set of digital-twin model inputs to drive a simulation of something, but how well would that translate to real-world behaviors? “You can train a digital twin of an assembly line by creating the right sequence of sensor inputs, but if you want something that operates autonomously in a real-world situation, it will surely depend on visual analysis, so you have to train it on visions.”
This creates a really interesting division in the strategies for empowering our missing 40%. In some cases, what they’re doing is linked to an automated process, like a worker on an assembly line; the context of the line itself sets the expectations for worker behavior. In other cases, the worker sets the context by how they work, and so it’s essential to be able to “see” the worker to understand how to help them.
It gets even more complicated if you look at the first case in more detail. Yes, we could model the movement of a car, for example, down an assembly line, establishing where it was, what was supposed to happen there, and so forth, with a bunch of sensors. But should we do that? Should a digital twin of an assembly line be able to spot something like a vehicle coming loose from the line, or spot an inconsistency between the appearance of the vehicle at a given point and the part being presented there for attachment? Can this be done without visual analysis? If not, then we should assume that both cases benefit from visual analysis, meaning the ability to analyze a video of the work environment. If that’s true, then we’re saying that world models for both operations and training will be needed for almost any of our new empowerments.
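To illustrate what such a consistency check might look like, here’s an invented example that compares what the line’s twin expects at a station with what a stubbed-out vision system reports. The station schedule and the detect_parts() stub are assumptions, not a real vision API.

```python
# Invented example of the consistency check: compare what the line's twin
# expects at a station with what a (stubbed) vision system reports.

EXPECTED_AT_STATION = {
    "station_12": {"vehicle": "sedan", "part": "left door"},
}

def detect_parts(frame):
    """Stub for visual analysis of a video frame of the work area."""
    return {"vehicle": "sedan", "part": "hood"}  # deliberately inconsistent

def check_station(station_id, frame):
    """Flag every mismatch between the twin's expectation and the video."""
    expected = EXPECTED_AT_STATION[station_id]
    observed = detect_parts(frame)
    return [
        f"{key}: expected {expected[key]!r}, saw {observed[key]!r}"
        for key in expected
        if expected[key] != observed[key]
    ]

print(check_station("station_12", frame=None))
# ["part: expected 'left door', saw 'hood'"]
```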
How does this visual-centric approach deal with other sensor data that might be incorporated in a digital twin? How does it deal with the intrinsic context of an automated-mechanical system like an assembly line? My expert proposes that these are integrated with the world model via policy statements. In addition to an AI model of the workplace virtual world, we should expect some standard language to be used to introduce these policies. Training would perhaps generate policy recommendations, offered in this language, that, if accepted on review, would then help govern worker/environment interactions.
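We don’t know yet what that language would look like, so the sketch below is just my guess at the general shape: declarative condition/action rules a human could review, applied against the twin’s sensor readings. The rule format and the sensor names are my invention.

```python
# A guess at what "policy statements" could look like: declarative rules
# reviewed by a human, then applied against the twin's sensor readings.
# The rule format and sensor names are invented for illustration.

POLICIES = [
    {"when": {"sensor": "line_speed", "above": 1.2},
     "then": "slow conveyor to rated speed"},
    {"when": {"sensor": "worker_distance", "below": 1.0},
     "then": "pause station and alert supervisor"},
]

def evaluate_policies(readings):
    """Return the actions whose conditions match the current readings."""
    actions = []
    for rule in POLICIES:
        cond = rule["when"]
        value = readings.get(cond["sensor"])
        if value is None:
            continue
        if ("above" in cond and value > cond["above"]) or \
           ("below" in cond and value < cond["below"]):
            actions.append(rule["then"])
    return actions

print(evaluate_policies({"line_speed": 1.5, "worker_distance": 0.6}))
# ['slow conveyor to rated speed', 'pause station and alert supervisor']
```

The appeal of something like this is the review step: training proposes the rules, but a human accepts them before they govern anything.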
What I’ve learned from the conversation is that while I’ve arguably been on the right track with regard to digital twins and worker empowerment, I haven’t dug down as far as I should have. For example, my expert makes the point that the best way to judge a system designed to empower a worker in real-world activity is to look at the limiting case, which is to presume that a possible goal is to introduce an autonomous device as a “helper” or even as a replacement for the worker. If you can define what such a device needs to do, you can convert that into a combination of automation steps and human actions in any mix that’s appropriate.
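That exercise can be framed very simply: list everything a hypothetical autonomous helper would need to do, then split the list between automation and the human worker. The tasks and the automatable/not-automatable assignments below are invented for illustration.

```python
# Sketch of the "limiting case" exercise: enumerate what a fully autonomous
# helper would have to do, then split the work between automation steps and
# human actions. Tasks and assignments are invented for illustration.

TASKS = [
    {"task": "locate next part",     "automatable": True},
    {"task": "position part",        "automatable": True},
    {"task": "judge fit and finish", "automatable": False},
    {"task": "handle exceptions",    "automatable": False},
]

def split_roles(tasks):
    """Divide the task list into automated steps and human actions."""
    automated = [t["task"] for t in tasks if t["automatable"]]
    human = [t["task"] for t in tasks if not t["automatable"]]
    return automated, human

automated, human = split_roles(TASKS)
print("automate:", automated)    # ['locate next part', 'position part']
print("keep human:", human)      # ['judge fit and finish', 'handle exceptions']
```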
World models seem essential both for the empowerment of the 40% we’ve missed, and the support of a new level of autonomous processes that might, in the end, replace some of them. My own modeling, which is hardly conclusive, suggests that about 15% of that 40% could in fact be replaced with autonomous elements, meaning 6% of the workforce overall. This is a far cry from putting everyone out of business, I know, so if it’s true you can bet it won’t get much publicity. But we get to an optimum future by facing reality, not telling stories.