Is AI going to wipe out your job, or wipe out humanity? Few enterprises take the latter seriously, and fewer than 15% of those who comment to me on these topics see much of a threat to their own jobs. That doesn’t mean that AI isn’t perceived as a risk. Light Reading did a nice piece on this last week, and I got some enterprise comments on it, many relating to the enterprises’ long-standing view of AI agents.
As the article points out, the most realistic of the fears about AI deal with mistakes that have what could be called “systemic” consequences, not with robots running amok. However, it’s clear from the article, and from what I hear from both telcos and enterprises, that there are physical risks too. This is one reason why enterprises have consistently been cool on the notion of turning big chunks of real operations over to AI. One telco pointed out that a car crash or plane crash caused by network features running amok could well injure or kill more people than a wild robot could.
I hear more about what we could think of as “risk creep” as time goes by, not because there are more risks but because everyone is working harder to make real AI business cases, and risk analysis is part of that. Citizen AI, the by-far-dominant form, bypasses business case review because it’s expensed, so there’s no “investment” to assess against an ROI target. Really helpful AI agents, though, demand self-hosting and thus demand project discipline.
Enterprises have long seen three distinct models of AI agent use—interactive, embedded, and workflow-bound. It’s the first of the three that poses the most risk, according to well over three-quarters of enterprises who commented, and it’s also the agent model that’s giving telcos the most concern. The problem with the interactive model of agent is that it’s hard to bound its scope effectively. Accepting conversational input makes it harder to set limits up front, which means that you have to somehow be able to limit things on the back end. For agents that provide data, the challenge is to limit what sort of data can be returned without limiting the ability to respond to legitimate requests. For agents that take actions, the problem is ensuring that what’s requested isn’t dangerous in some way.
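To make the back-end idea concrete, here’s a minimal sketch; all the names (the action whitelist, the field filter, handle_agent_request) are invented for illustration rather than drawn from any real agent framework. The point is that the limits live behind the conversational interface, so they hold no matter how the prompt is phrased.

```python
# A minimal sketch of back-end scope limiting for an interactive agent.
# Everything here (action names, field names, execute_action) is
# illustrative, not drawn from any real agent framework.

ALLOWED_ACTIONS = {"lookup_order_status", "open_support_ticket"}   # operations the agent may invoke
RESTRICTED_FIELDS = {"ssn", "card_number", "home_address"}         # data the agent may never return

def execute_action(action: str, params: dict) -> dict:
    # Stand-in for the real back-end call; returns canned data for the sketch.
    return {"order_id": params.get("order_id"), "status": "shipped", "ssn": "123-45-6789"}

def handle_agent_request(action: str, params: dict) -> dict:
    """Check the agent's request against fixed limits before anything executes."""
    if action not in ALLOWED_ACTIONS:
        return {"error": f"action '{action}' is outside this agent's scope"}
    result = execute_action(action, params)
    # Filter sensitive fields regardless of how the conversational prompt was phrased.
    return {k: v for k, v in result.items() if k not in RESTRICTED_FIELDS}

print(handle_agent_request("lookup_order_status", {"order_id": "A123"}))
print(handle_agent_request("delete_all_orders", {}))   # refused: not on the whitelist
```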
The benefit of embedded and workflow agents is that both are naturally limited in context by the data they get and by the controls that are exposed to them. Agents that process conversational inputs are more vulnerable to issues because of the breadth of interpretation I noted above, but agents structured as software components operate within the boundaries of the component relationships.
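A hedged illustration of that component boundary, again with names invented for the purpose: a workflow-bound agent exposed as an ordinary function can see only the fields the workflow passes it and can only return a decision the next step understands, whatever model it calls internally.

```python
# Illustrative only: a workflow-bound agent exposed as a plain software component.
# The dataclass fields and routing decisions are assumptions for this sketch; an
# actual model call would sit inside triage_claim but would still be bound to
# returning one of the two enumerated outcomes.
from dataclasses import dataclass

@dataclass
class ClaimContext:
    claim_id: str
    amount: float
    policy_limit: float

def triage_claim(ctx: ClaimContext) -> str:
    """The agent sees only ClaimContext and can only answer with a routing
    decision; there is no free-form request to misinterpret."""
    if ctx.amount <= ctx.policy_limit:
        return "auto_approve_queue"
    return "human_review_queue"

print(triage_claim(ClaimContext("C-1001", 420.0, 1000.0)))   # auto_approve_queue
```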
Enterprises have limited experience with conversational agents in any mission other than customer- or prospect-facing ones, but a dozen or so told me that they had seen two things about conversational agents in a more active role—that they had more “hallucinations” and that they did, on occasion, do things that were considered “wrong”, a few times potentially destructive. This, they said, was due either to too general a prompt or to one that was poorly structured.
All of this plays into enterprise and telco reservations about pure autonomy, particularly in operations missions. They point out that rogue agent behavior could well put the managed system into a state that human action would find difficult to remedy, and that in any event the bad state could have major consequences while remedial action was being taken. One thing almost all agreed on was that any operational role given to an AI agent should be handled in a way that logs all the steps taken. Some suggested that each step should be a halt-and-review point, but others said that operations personnel would tend to pass a step without really reviewing it.
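For what it’s worth, here’s a rough sketch of what that logging-plus-checkpoint idea could look like; the function and step names are mine, not from any operations platform, and the optional approval prompt is exactly the gate some enterprises suggested and others doubted people would really use.

```python
# Hedged sketch of step logging with an optional halt-and-review gate.
# All names here are hypothetical; the point is the audit trail.
import json
import time

def run_with_audit(steps, require_approval=False):
    """Run agent-proposed steps, logging each one; optionally pause for a
    human go/no-go decision before every step."""
    audit_log = []
    for step in steps:
        entry = {"time": time.time(), "step": step["name"], "params": step["params"]}
        if require_approval:
            answer = input(f"Run step '{step['name']}'? [y/N] ")
            entry["approved"] = answer.strip().lower() == "y"
            if not entry["approved"]:
                audit_log.append(entry)
                break                                    # halt on rejection
        entry["result"] = step["action"](**step["params"])
        audit_log.append(entry)
    print(json.dumps(audit_log, indent=2, default=str))
    return audit_log

# Usage: each step pairs a callable with its parameters.
steps = [
    {"name": "drain_traffic", "params": {"node": "edge-7"}, "action": lambda node: f"{node} drained"},
    {"name": "restart_node",  "params": {"node": "edge-7"}, "action": lambda node: f"{node} restarted"},
]
run_with_audit(steps)                            # log-only mode
# run_with_audit(steps, require_approval=True)   # halt-and-review mode
```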
All of this seems to apply to real-time applications of AI agents, too. There are already stories about AI in military applications making errors that put lives at risk, but I can’t verify them. In the medical space, some healthcare professionals have told me that AI review of records or radiology was sometimes dangerously wrong, but that in most cases it was caught in human reviews that were mandatory under facility rules.
My own experience with AI, in the form of chatbot services, seems to bear out some of this. I have found that for narrow topics, with careful prompts, the error rate is minimal and the quality of the results is very high. I have also found that as the topic broadens or the prompts are less carefully designed, major errors happen often enough to be a problem. A whole series of inquiries on the highly technical process of video color management, for example, delivered results so good they beat the documentation. A single query on a broad economic point generated a completely wrong response.
Over the last year, the percentage of enterprises who said that they feared the risk of autonomous agents has risen from 15% to 28%, and the percentage of projects that included specific risk-mitigation measures went from 11% to 29%. Some of this, I think, arises from the greater number of agent projects being proposed and the expanding interest in autonomy, but some is also due to the socialization of risk factors among enterprises. The experience of a trusted peer has been, for three decades or more, the greatest single influence on enterprise tech project plans.
The other factor here, one that most enterprises actually cite, is that even if autonomous AI agents make fewer mistakes and pose a smaller risk than human action, the standard for AI is higher, and so the AI risk is less likely to be acceptable. Management understands human error, its causes, and how to manage it. AI error is another matter completely.
It’s important to note that job risk is real, at least in the sense of work-risk, whatever enterprise tech types might think. My own experiments with AI in research missions show that it can do useful, credible things even now. I know a lot of people who are capable of better, but also a lot who are not, and obviously the latter could be at risk, particularly if they report to more insightful types who develop AI literacy. But here enterprises raise a valid point: how do you develop senior types if AI has taken the jobs of the juniors? Is a decision to replace entry-level types a hope that AI will eventually render knowledge jobs in general obsolete? Who would then act as a check on AI autonomy? Think about it.
Right now, other forms of risk dominate, primarily the risk of a major AI error. Fewer than ten percent of enterprises say they would even consider autonomous operations agents in 2026, and only 15% say that for 2027. Nobody has commented on when they think it would be a fifty-fifty proposition, and my own analysis of their comments tells me that there is far too much doubt to support any estimate of when agents might be fully trusted. That, in turn, says that real-time applications of AI will have to deal with this risk issue, and that to me means even greater focus on embedded and workflow agent models in the foreseeable future.
