In a lot of ways, the IT we have today is derived from, and little different from, what we had in the 1960s. We still think of applications, input-process-output, and reports and documents generated by applications as the way to integrate IT with business processes. Why do we do this? Because we’ve not really thought of any other way? Because we have an enormous mass of those documents still hanging around, in physical form or online? Maybe these constraints are slipping away.
IBM, in one of its “Think Newsletter” releases, asks the question “What if your PDFs could talk?” For operators hoping that AI could increase network traffic, that may be a more relevant question than it appears. IBM’s question references its newly released Granite-Docling mini-model, which converts documents, from PDFs to potentially any brochure or other structured visual/text material, into a machine-readable form. From that form, another AI model could then speak or describe the content. You could even, in theory, embed calls to yet another model.
There’s more. You could also view a document as an on-ramp to an AI agent. You can build an AI tool/agent, whatever you’d like to call it, and share it with others. Not share its output, but share the AI structure itself, meaning that whoever you share it with would be able to interact with the document/model/agent as though they’d created it. I think you could extend this capability to documents converted with the IBM tool, too, which means you could smarten up existing material.
Think about this a minute. Here you have AI models that become, in effect, the output in a general form. You don’t have a simple dumb report, you have a report you can talk with and expand on. In effect, what can be created is a kind of portable application built on AI. That has great potential, and great potential risk.
One thing that’s clear is that this kind of capability can be misused or misconfigured. Obviously, one way to manage that is to use tools and techniques that absolutely cannot update anything. Another is to build some sort of security into the application, so that it can’t be shared with people who aren’t authorized. This isn’t a new problem; any AI agent, or any form of AI that uses RAG or MCP or anything similar, has to include security/governance safeguards.
Another thing that’s clear is that this capability could turn simple reporting into an “active document” model, in which what was once passive assimilation of knowledge from a document (or a simple online equivalent, like a PDF) becomes what’s essentially a no-code application, generating traffic during its use. Widespread use of active documents could thus become a source of network traffic between the users of such documents and the model/source.
There are a lot of applications of this sort of thing, both internal and external. You could easily imagine a kind of chatbot created by this approach: a manual or brochure you can ask questions of, and get answers from. Internally, it could allow at least some interaction with any sort of regularly generated material. It doesn’t have to expose everything (presuming it’s not designed to), but it could expand on basic material much the way public AI chatbots can today. Of course, it could reference self-hosted, company-governed data too, making it a broader tool.
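A minimal sketch of the “manual you can ask questions of” idea, with everything invented for illustration: the manual text, the section titles, and the crude word-overlap scoring (a real system would use an embedding model and a language model rather than keyword matching).

```python
# Toy sketch of a document you can ask questions of: a manual is split
# into titled sections, and a query returns the best-matching section.
# The manual content and the overlap-based scoring are illustrative
# assumptions, not a real retrieval pipeline.

MANUAL = {
    "Installation": "Unpack the unit and connect the power cable before first use.",
    "Troubleshooting": "If the status light blinks red, restart the unit and check the cable.",
    "Warranty": "The unit is covered for two years from the date of purchase.",
}

def ask(question: str) -> str:
    """Return the section whose title and text share the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(title: str) -> int:
        text = (title + " " + MANUAL[title]).lower()
        return len(q_words & set(text.split()))

    best = max(MANUAL, key=overlap)
    return f"{best}: {MANUAL[best]}"

print(ask("why is the status light blinking red?"))
```

Swapping the word-overlap score for a small embedding model, and the canned section text for a model-generated answer, turns this toy into the chatbot-over-a-brochure pattern described above.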
I think it’s also worthwhile to think about the potential of hierarchy here. Documents regularly reference other documents; web pages link to other pages. An active document could reference other such reports, and this could become a kind of Web 2.0 implementation, a way of conversationally navigating a topic, even one that’s very broad.
The hierarchy notion also raises another interesting point, which is that you could represent a company, a facility, a real-world system/process, as an active document. Think of it as a digital twin you could converse with. The things you get from real-world systems, like events, might be active documents that can be talked with to facilitate handling. Likewise, commercial interactions could be structured this way. Every output, potentially, could carry its own help file, its own potential for modification and clarification. Don’t like the level of detail in a report? Ask for more, or less. Ask for different sorting and subtotals, different headings and fields. All, in theory, might be accommodated.
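The “ask for different sorting and subtotals” point can be sketched concretely: a report whose layout is regenerated per request rather than fixed at creation time. The field names and data below are invented for illustration.

```python
# Hedged sketch of a report that carries its own potential for modification:
# the same rows can be re-sorted and subtotaled on the reader's request.
# Field names and sales figures are made up for the example.
from itertools import groupby

ROWS = [
    {"region": "East", "product": "A", "sales": 120},
    {"region": "West", "product": "B", "sales": 90},
    {"region": "East", "product": "B", "sales": 60},
    {"region": "West", "product": "A", "sales": 150},
]

def render(rows, sort_by="region", subtotal_by=None):
    """Re-render the report per the reader's request instead of a fixed layout."""
    rows = sorted(rows, key=lambda r: r[sort_by])
    lines = []
    if subtotal_by:
        for key, group in groupby(rows, key=lambda r: r[subtotal_by]):
            group = list(group)
            lines += [f"{r['region']:5} {r['product']} {r['sales']:>5}" for r in group]
            lines.append(f"  subtotal {key}: {sum(r['sales'] for r in group)}")
    else:
        lines = [f"{r['region']:5} {r['product']} {r['sales']:>5}" for r in rows]
    return "\n".join(lines)

print(render(ROWS, sort_by="region", subtotal_by="region"))
```

In an active document, the `sort_by` and `subtotal_by` choices would come from a conversational request (“group this by region”) interpreted by the embedded model, rather than from function arguments.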
Yet another hierarchical consideration is that different levels of active document might be supported by different levels of AI. You might have a local model that handles basic formatting and presentation (“make a stacked bar chart of columns one, three, and five through eight”) and another that delivers any different information requested and permitted, and so on through multiple layers. Hybrid cloud applications, with a cloud front-end to a data center transactional/database system, could be built this way.
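The layering just described can be sketched as a simple router: presentation-only requests stay with a cheap local tier, and anything needing new data goes upstream. The keyword-based routing and both handler stubs are assumptions for illustration; a real system would classify requests with a model and call actual services.

```python
# Sketch of tiered active-document AI: a small local handler covers
# formatting/presentation requests, and data requests are routed to a
# (stubbed) larger hosted model. Routing keywords are illustrative only.

def local_format_model(request: str) -> str:
    # Stands in for a small on-device model that only re-presents known data.
    return f"[local] reformatted per request: {request!r}"

def remote_data_model(request: str) -> str:
    # Stands in for a larger hosted model with access to governed data.
    return f"[remote] fetched and answered: {request!r}"

FORMAT_HINTS = ("chart", "sort", "format", "layout", "heading")

def route(request: str) -> str:
    """Send presentation-only requests to the local tier, the rest upstream."""
    if any(hint in request.lower() for hint in FORMAT_HINTS):
        return local_format_model(request)
    return remote_data_model(request)

print(route("make a stacked bar chart of columns one, three, and five"))
print(route("add last quarter's revenue for the Northeast region"))
```

Only the second request would generate upstream network traffic, which is the traffic-pattern point made below: local tiers absorb the chatter, while data requests cross the network.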
All of this would be possible by extending the active document concept, and the result would be to substitute smart documents for applications in the traditional sense, making AI central to almost everything. But would it be practical? If by “AI” we meant the sort of giant-GPU-power-sucking monster we have today, surely it would not be. But if we assume that AI is simply a tool, not the end goal, then we can assume that the tool, like the tool of computing, will evolve to maximize its value.
Computers used to be the size of a walk-in closet; the same power can fit on your wrist today. Could AI advance similarly? I’d be unwilling to bet it could not, because who in the 1960s would have believed that Dick Tracy’s wrist radio was something almost everyone could have for less than a hundred bucks? Actually, what we have now is way more than Mr. Tracy contemplated.
Some of my AI expert contacts tell me that it’s already possible to build AI models that are a hundred thousand times as cost/energy efficient as the originals. How much further would efficiency have to go? My rough modeling suggests that only another order of magnitude of improvement would be required. That’s hardly unreasonable to expect.
What about network impact? Obviously, active documents would consume more bandwidth than passive ones, but their use might also make bandwidth consumption more dynamic. Could this create a justification, a value, a business case, for NaaS? Could it also create a GPUaaS opportunity for telcos? Think about it.
The future of AI, at least insofar as active document concepts could drive it, depends on a significant commoditization of AI power. Bigger and more expensive GPUs are not what we need; we need populist AI to build population-scale applications and business cases.
