I got an interesting question from a telco planner recently, on the ever-popular topic of generative AI. What they were interested in was the infrastructure requirements associated with deploying a generative AI service, and also what they might want to consider as the basis for a “private AI” offering. I think it’s a worthwhile question to answer on a blog, so here goes.
The first point I want to make is that enterprises, as I’ve noted in a prior blog, are divided on the risks of generative AI. Over half of the enterprises who contacted me on the subject said the public versions of the technology posed a security and compliance risk, and just under a third said that even public cloud services hosting private generative AI models are a threat. Self-hosting seems to be favored by the same enterprises who favor self-hosting business-critical applications and their associated databases; they’re worried about the risk of hacking. However, enterprises seem more concerned about cloud-hosted AI services than cloud-hosted applications because they’re uncertain whether the software might be using their data for training, and thus might expose something sensitive.
The problem enterprises are facing is that they like the potential of the big-name public generative AI tools like ChatGPT, but quickly realize that there’s no hope they could justify the resources needed to run a private copy, and especially to train it. The self-hosting approach, then, is limited to more modest AI tools, but even there the enterprise AI advocates admit that there’s a lot more to it than meets the eye. One commented that they had used open-source database and other tools for a decade and, yes, it took effort to build and maintain the code, but their early exposure to generative AI hosting was “a whole new world”.
Once you get an enterprise to say that they’d like to self-host generative AI, you tend to hit the Great Uncertainty. The majority of enterprises don’t know where to start; only 22% said they’d even looked into the full range of options for software to host. Of that group, less than half could describe the “typical” requirements for hosting, even to the extent of drawing out an architecture model with labeled boxes and connections. If you press all enterprises on the question of the software, well over two-thirds say they would prefer open source, but less than 10% could name a single open-source model.
Generative AI models tend to fall into two primary categories: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). GANs have been most popular in image analysis and generation, and VAEs in most other applications, but crossovers are possible. A lot of the open-source offerings also rely on an open machine-learning library; TensorFlow from Google and PyTorch from Facebook seem the most popular with enterprises, or at least the most recognized.
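To make the GAN/VAE distinction concrete, here’s a minimal sketch in PyTorch. All layer sizes here are hypothetical, chosen only to show the shape of each approach: a GAN pits a generator against a discriminator, while a VAE encodes data into a latent distribution, samples from it, and decodes.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
LATENT, DATA = 64, 784

class Generator(nn.Module):
    """GAN half 1: maps random noise to a synthetic sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT, 256), nn.ReLU(),
            nn.Linear(256, DATA), nn.Tanh())
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """GAN half 2: scores whether a sample looks real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

class VAE(nn.Module):
    """VAE: encode to a latent distribution, sample, then decode."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(DATA, 2 * LATENT)  # outputs mean and log-variance
        self.dec = nn.Sequential(nn.Linear(LATENT, DATA), nn.Sigmoid())
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

G, D, vae = Generator(), Discriminator(), VAE()
fake = G(torch.randn(8, LATENT))  # 8 synthetic samples from noise
print(D(fake).shape, vae(torch.rand(8, DATA))[0].shape)
```

In training, the GAN’s two halves are optimized against each other, while the VAE is trained on reconstruction loss plus a divergence penalty on the latent; that difference is much of why GANs have dominated sharp image generation while VAEs show up in a broader range of applications.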
What are the popular open-source AI models? Enterprises (the few) who have actually looked at the question tend to name frameworks first: TensorFlow/Keras (from Google) and PyTorch (Facebook) seem the most straightforward, though strictly speaking those are libraries rather than models. Among actual model architectures, DCGAN, most often encountered through its TensorFlow tutorial implementation, is said to be a bit more complex, and since it’s a GAN it’s probably better for image work. The same is true of NVIDIA’s StyleGAN. There’s also growing interest in VQ-VAE (which originated at DeepMind, though OpenAI has built on it), whose implementations typically use the PyTorch library.
Running any generative AI model in-house is going to be a major challenge, requiring the installation of a whole battery of GPUs. One enterprise that actually did it said the process was a lot like crypto mining or blockchain processing at scale. They also said that finding people who knew how to train and apply any generative AI tool was a major challenge; expect to pay perhaps 50% over the rate for a typical software developer to hire and retain them.
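A quick back-of-envelope calculation shows why the GPU battery is unavoidable. This is a sketch using widely cited rules of thumb rather than any vendor’s figures, and the 7-billion-parameter model size is just an illustrative assumption:

```python
# Back-of-envelope GPU memory estimate for a generative model.
# Rules of thumb only; real usage varies with framework and batch size.

params_billion = 7   # hypothetical model size (7B parameters)
bytes_fp16 = 2       # fp16/bf16 weights: 2 bytes per parameter

weights_gb = params_billion * 1e9 * bytes_fp16 / 1e9
print(f"Inference weights alone: ~{weights_gb:.0f} GB")       # ~14 GB

# Mixed-precision training with the common Adam optimizer is often
# estimated at ~16 bytes/parameter: fp16 weights and gradients, plus
# fp32 master weights and two fp32 optimizer moments.
train_gb = params_billion * 1e9 * 16 / 1e9
print(f"Training state (rule of thumb): ~{train_gb:.0f} GB")  # ~112 GB
print("...before activations, which grow with batch and sequence size.")
```

Even this modest-by-today’s-standards model won’t train on a single commodity GPU, which is why the comparison to crypto mining at scale rings true.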
Most enterprises don’t get this far, not surprisingly. Those I’ve gotten information from tell me that after looking at the complexity of self-hosting, they elected to “start with a cloud hosting” option. But even those who have looked at the cloud option admit that they’re starting to wonder whether “generative AI” is even the right answer. As one put it, “We’ve come to accept that generative AI is a subset of deep learning, and that other deep learning tools might be better for our mission.”
Only 8% of enterprises even make the distinction between generative AI and deep learning, but all of those that do tell me that one big benefit of deep learning is that you don’t need to build a hyperscale data center to run it. In fact, you may not even need GPUs. Some vendors, including HPE, are positioning AI as an application that runs at the edge, making it a sort of adjunct to IoT, and obviously you can’t build a large GPU complex there. If you dig into the way enterprise equipment and software vendors are handling AI, you find that most are blurring the whole picture into “AI/ML” rather than pushing generative AI specifically.
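That claim is easy to demonstrate: a small discriminative model runs fine on a CPU. Here’s a minimal sketch (the model and its sizes are hypothetical, standing in for something that scores IoT sensor readings) of deep-learning inference with no GPU in sight:

```python
import torch
import torch.nn as nn

# A small classifier of the sort that might score IoT sensor readings.
# Sizes are hypothetical; the point is that nothing here needs a GPU.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),  # 16 sensor features in
    nn.Linear(32, 3))              # 3 condition classes out
model.eval()

with torch.no_grad():              # inference only: no gradients, less memory
    reading = torch.randn(1, 16)   # stand-in for one sensor sample
    scores = model(reading)
    print("predicted class:", scores.argmax(dim=-1).item())
```

Models like this run comfortably on the kind of CPU-only box you’d deploy at an edge site, which is exactly the niche the “AI/ML at the edge” vendors are targeting.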
That might actually be the right approach for those vendors, and for their customers. It’s hard to get accurate information on generative AI hosting options and who’s taking them, because the number of enterprises who’ve given me responses is just too small. What I do have suggests that enterprises who dig into self-hosting generative AI have largely determined that it’s not justifiable, at least not now. Those who have looked into cloud hosting via a service from someone like Google or Microsoft (the two named most often) are still on the fence regarding whether the business case will pay off.
I wonder if even this group might be overlooking the deep learning superset option. There’s been so much hype around generative AI that it’s possible that there’s an artificial management push for it rather than a reasoned decision to find an optimum AI strategy. Two enterprises told me that their IT organization was told to “evaluate generative AI” and that they took the assignment literally. In both cases, their evaluation is still ongoing. The remainder of those who tried generative AI used public tools.
All of this is confusing to me (and likely to those reading this), and to enterprises themselves. The only firm conclusion I have is that self-hosting generative AI is far from being a policy, though it’s still an interest. That interest seems to spring from security/compliance concerns with the use of a public or cloud-hosted model, and those concerns have arisen in the evaluation phase, before any real commitment has been made. The problem I see is that the business case for generative AI beyond public-service use is still being assessed, so it’s difficult to get any accurate information on what self-hosting options would be acceptable.
Generative AI seems to have broadened enterprises’ overall scope of confusion on AI. The real AI/ML taxonomy is complicated enough; add the hype and AI-washing to the picture and it’s no wonder users are still confused. What I am seeing is a growing belief that the biggest benefit of kicking AI tires via cloud services is avoiding investment in a field that’s the mother of all moving targets. Building out resources to do the wrong thing is never a good idea, and it could easily happen with the pace of AI development as fast as it is.