Absolute truths are hard to come by these days, and one area where enterprises tell me they need those truths badly is AI. Generative AI in particular is being portrayed as the most significant development of our times, a pathway to reducing climate change, improving our health and finances, and enriching everyone’s life in general. It’s also viewed as the greatest risk mankind faces, a threat to our jobs, and subject to what could only be considered “hallucinations” that threaten every application. OK, it’s not exactly surprising that AI, like everything else, tends to be touted as either the single-handed savior of humanity or the instrument of its demise. Hype sells.
At the more technical level, we have two basic problems. First, we don’t understand AI technology, which makes it more vulnerable to hype. Second, AI technology is evolving faster than anything I’ve seen in all my years in tech, so when we try to understand it, we’re shooting at a moving target, a fast-moving one in fact.
I’m not claiming to be an AI expert, but I’ve had decades of exposure to AI and robotics, and I’ve developed some software myself that attempted to query “knowledge” and generate responses. With those qualifiers out of the way, it’s my intention to analyze AI and generative AI and try to dig out the important truths that enterprises tell me they’re not finding.
Obviously, “artificial intelligence” is the science of using software to do things that biological organisms would do with their brains. We tend to restrict the term to the process of mimicking human thought processes, but most AI experts I’ve chatted with agree that what we’re actually doing today is replicating simpler forms of thinking, and that the real goal is to approach human behavior.
The most meaningful way to start looking at AI is to take a broad, top-down, mission-oriented view. We can classify AI systems as being “narrow/weak” or “broad/strong”. All of the current AI systems in use fit in the “narrow” category, because the broad form of AI is designed to equal or surpass the power of the human brain, making it (perhaps, subject to public policies) the goal of AI development rather than the state of the art.
While we often see the abbreviation “AI/ML”, which stands for “artificial intelligence/machine learning”, the truth is that almost all the AI we see these days relies on a form of machine learning. Early ML development was based on subject-matter experts who worked with knowledge engineers to “teach” the system, which usually meant programming a neural network. Deep learning systems allow the AI software to digest information in structured or unstructured form and reach “conclusions” on its own, without specific human oversight. This is the current state of the art for AI, including generative AI.
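To make that distinction concrete, here’s a minimal sketch of the machine-learning idea, using scikit-learn (my choice of tooling, not something the discussion above references): a small neural network infers a decision rule from labeled examples rather than having an expert program the rule in.

```python
# Minimal sketch: a tiny neural network infers a decision rule from examples,
# rather than having a subject-matter expert program the rule explicitly.
from sklearn.neural_network import MLPClassifier

# Toy training data: feature pairs plus the label a human reviewer assigned.
# (The hidden pattern: the label is 1 whenever the two features sum to 1 or more.)
X = [[0, 0], [0.3, 0.2], [0.1, 0.4], [0, 1], [1, 0], [1, 1], [0.2, 0.9], [0.9, 0.3]]
y = [0, 0, 0, 1, 1, 1, 1, 1]

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=1)
model.fit(X, y)                                  # the network "digests" the examples itself
print(model.predict([[0.1, 0.1], [0.8, 0.8]]))   # reaches "conclusions" on new inputs
```

The point is that no one wrote the rule; the software derived it from the data, which is what deep learning does at vastly larger scale.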
Generative AI is a deep learning application set that developed around 2010 and was first applied to things like image and speech generation. With generative AI, software is “trained” on a specific set of data that (going back to our high-level top-down view) could be narrow or broad. The image and speech applications are examples of a narrow form of generative AI, while things like ChatGPT and other chatbot models are trained on a broader set of data and applied more broadly, to tasks like generating documents or answering a wide range of questions.
What all generative AI tools do is “learn” a set of information relationships to create what we could call a “model” of the information space, and then apply that model to new data to create a new information element in the same space. You might create a “van Gogh” model, for example, and then ask the software to mimic his style in painting your own portrait, by supplying a photo of yourself or letting the software capture your image in real time. What separates generative AI from other deep learning applications is that it creates examples from a model rather than classifying data with a model. Generative AI tools are usually built on a combination of two technologies.
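To see the classify-versus-create distinction in the simplest possible terms, here’s a toy sketch (plain numpy, with a one-dimensional stand-in for a “style”; entirely my own illustration, not how any real product works) where the same learned model is used both ways.

```python
# Toy sketch: the same learned "model" of an information space can either
# judge new data (classification) or create new data (generation).
import numpy as np

rng = np.random.default_rng(0)
training_data = rng.normal(loc=5.0, scale=1.0, size=1000)  # stand-in for a learned "style"

# "Learn" the information space: here just its mean and spread.
mu, sigma = training_data.mean(), training_data.std()

# Discriminative use: classify whether a new element fits the learned space.
new_item = 5.3
fits_style = abs(new_item - mu) < 2 * sigma
print("fits the learned style:", fits_style)

# Generative use: create a brand-new element drawn from the same space.
generated = rng.normal(mu, sigma)
print("newly generated element:", generated)
```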
The first technology is called “Generative Adversarial Networks” (GANs), which combine a “generator” and a “discriminator” neural network. The generator model is seeded from the query/input and it generates an “exemplar” element set, which is then passed to the discriminator model. There, it’s compared with real-world examples from the training process (using a knowledge base), and the discriminator’s verdict is fed back so the generator learns to produce exemplars that pass the test; what passes becomes the output, and each round of feedback improves the results.
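A minimal sketch of that generator/discriminator loop, written in PyTorch (the framework and all the details here are my assumptions, not anything named above), might look like this:

```python
# Minimal GAN sketch: a generator learns to produce samples the discriminator
# cannot tell apart from real training data; the discriminator's verdict is
# the feedback that improves the generator on each round.
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) + 4.0   # "real" examples, clustered near 4.0
noise     = lambda n: torch.randn(n, 8)         # seed/input for the generator

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
loss = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Train the discriminator to separate real examples from generated ones.
    real, fake = real_data(64), G(noise(64)).detach()
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator to produce samples the discriminator accepts as real.
    fake = G(noise(64))
    g_loss = loss(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(noise(5)))  # generated "exemplars" should now cluster near 4.0
```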
The second technology is called “transformers”. Transformers are neural networks that can learn context and meaning, and so are well-suited to textual material. In a sense, transformers can create text by building on the seed material but using structures and context learned from the training. An encoder element pulls information elements from the seed (the query, usually) and extracts contextual chunks that are then fed into a decoder element. These are assembled to create the output. Transformers are the heart of the most recent evolution of generative AI.
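As a rough illustration of that contextual step, here’s a small numpy sketch of the attention calculation at the core of a transformer (a simplified piece of the real mechanism, with toy numbers of my own rather than anything learned from data):

```python
# Minimal sketch of the attention step inside a transformer: each token's
# representation becomes a context-weighted blend of the other tokens.
import numpy as np

def attention(Q, K, V):
    """Each output row mixes the value rows according to contextual relevance."""
    scores = Q @ K.T / np.sqrt(K.shape[1])                                  # token-to-token relevance
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)    # softmax
    return weights @ V                                                      # blend by relevance

# Three "tokens" represented as 4-dimensional embeddings (toy values).
tokens = np.array([[1.0, 0.0, 0.0, 0.5],
                   [0.0, 1.0, 0.2, 0.0],
                   [0.9, 0.1, 0.0, 0.4]])

# In a trained transformer, Q, K, and V come from learned projections of the tokens.
contextualized = attention(tokens, tokens, tokens)
print(contextualized)  # each row now reflects information from related tokens
```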
There are a variety of generative models, also called “large language models” or LLMs, the most popular of which are GPT (“Generative Pre-trained Transformer”), LaMDA (Language Model for Dialog Applications), and (for images and art) DALL-E. Most generative AI is “pre-trained”, meaning that the model has been trained with a specific knowledge set and the user doesn’t have to train it. We’re seeing a lot of “somethingGPT” AI applications that use GPT modeling software but apply it to a different knowledge set for training, and so offer results based on the various training data options.
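To illustrate what “pre-trained” means in practice, here’s a hedged sketch using the Hugging Face transformers library and the public GPT-2 checkpoint (my choice of tooling; the models named above are reached through their own interfaces). The user supplies only a prompt, never any training.

```python
# Using a pre-trained model as-is: the training already happened elsewhere,
# so the user only supplies a prompt and receives generated text.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # downloads a pre-trained model
result = generator("Enterprises evaluating generative AI should",
                   max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```

A “somethingGPT” product is essentially the same idea, with the underlying model trained or fine-tuned on a different knowledge set.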
There are two take-aways from all of this, regarding the use of generative AI. The first is that the process is mechanical, which means that the result is created by analyzing seed elements using a contextual model that predicts results based on the training data. That raises the second take-away, which is that generative AI is highly dependent on its training data. There’s an old adage in software design, called “GIGO”, which means “garbage in, garbage out”. If you input wrong information, you get bad results. That’s true in generative AI, but the problem there is that it may not be obvious that you’ve input wrong data. The Internet is a vast collection of information, some of which is true and real and some of which is totally false. Most of it is subjective, meaning whether it’s true or false depends on interpretation. Since the generative AI tools we see today aren’t “supervised” by humans, they do the interpreting themselves.
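Here’s a toy illustration of how mechanical that dependence is: a tiny bigram “language model” (a deliberately crude stand-in for real generative AI, and entirely my own example) simply predicts the next word from whatever its training text contained, so a false statement in the training data comes straight back out.

```python
# Garbage in, garbage out: a tiny bigram model can only echo the statistics
# of its training text, including any falsehoods it contains.
from collections import defaultdict
import random

training_text = "the earth is flat . the earth is round . the earth is flat ."
words = training_text.split()

# Count which word follows which in the training data.
next_words = defaultdict(list)
for a, b in zip(words, words[1:]):
    next_words[a].append(b)

def generate(seed, length=4):
    out = [seed]
    for _ in range(length):
        out.append(random.choice(next_words[out[-1]]))
    return " ".join(out)

print(generate("earth"))  # often "earth is flat ." -- the majority view in its training data
```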
The publicized generative AI failures are almost surely due to this interpretive point. Ask a generative tool to produce your own biography and most will say they don’t have enough information. If you ask about a well-known figure, you will get a result that’s probably accurate, because there are many sources of information on that person in the training data. In between, you’ll get what might be good or bad results, because mentions of the individual may not include many specific biographical references, and so life activities may be inferred, sometimes from casual comments. There may also be variations in how the person is described.
Most enterprises have played with generative AI, but the majority of the applications have been in basic document development of some sort, including advertising copy, product manuals, press releases, and in some cases chat and email response generation. One of the largest “professional” uses of generative AI by enterprises is in software development, where it’s used in code generation and code review. In the majority of these applications, enterprises treat the generative AI system as a kind of junior worker who produces a draft that’s then reviewed and modified by someone with greater skills. In this sort of application, generative AI gets consistently good reviews.
Only a few enterprises say they’ve tried to use generative AI on private knowledge sets, like company databases or computer or network logs. The results here have been consistently more problematic, with the majority of users saying that generative AI produced too many errors to be relied upon, and that the mechanism by which the results were generated could not be audited to assess the risk of an error. IBM’s AI tools and consulting services are the only ones that were consistently rated highly for private knowledge set applications by the enterprises I’ve chatted with.
This raises another important point, perhaps one of the most important for enterprises who want to use AI. As you can see from this blog, generative AI is an application of neural networks, and there are many proven AI products out there based on neural networks but not on GANs or transformers, and not generally trained on data like the Internet. For private knowledge set applications, these may be just as good as or better than generative AI.
Generative AI is getting better, too. There are a growing number of knowledge sets being used with one of the primary models (like GPT) to train an AI product, and these specialized offerings can do a much better job in the specific areas they’re designed to support, which range from image production to financial analysis. There are also new neural networks and models evolving that would improve the way generative AI works, including its accuracy rate. Still, a technique developed in a Harvard study was found (see here) to have improved the truthfulness of generative AI only from 32.5% to 65.1%. Even that “improved” number is a long way short of what enterprises tell me they need to see in order to trust generative AI for things like operations support in IT or networking.
How much of a problem are generative AI errors or, as the AI community calls them, “hallucinations”? Here’s what Google’s Bard says: “It is also important to be aware of the potential for hallucinations when using generative AI models. If you are not sure whether a piece of text or an image was generated by a generative AI model, it is always best to err on the side of caution and assume that it may be a hallucination.” That’s self-confirmation of user concerns, and it’s important for everyone who uses or considers generative AI.