Enterprises have been telling me that they’re finding the cloud to be a lot more expensive than they’d expected. That has a lot of potential consequences, not the least of which is the risk to cloud provider revenues and to the careers of a lot of developers. What’s really interesting to me is that the great majority of these enterprises are only now embarking on the fact-finding piece of their story. Why is the cloud more expensive? Just bad expectations, bad application design, poor feature choices, or what? A bit of all of the above.
One of the big problems is the expectation that “moving everything to the cloud” was the stand-tall-at-your-professional-association-meetings champion strategy. Most enterprises admit that they believed that theory, at least to a degree. Since moving everything to the cloud would be stupid if it were more expensive, the theory begat the notion that the cloud was automatically cheaper. That limited the pushback on cloud projects, and a lot (as much as a full third of all projects, according to enterprises) of the stuff that got done should never have been approved.
Obviously, at some point, enterprises started to get smart on this. Moving everything to the cloud gradually evolved into creating cloud-hosted front-ends to the “everythings” that were apparently destined to run in the data center after all. The goal was to gain as much of the agility and resiliency benefits of the cloud as possible, without pushing stuff into the cloud that was simply not economical to run there. This front-end role had the effect of improving cloud economics for those who adopted it, but it also meant that the “front-end” cloud piece of an application was developed independently. Cloud software was new software, and this realization in turn gave rise to one of the cloud’s many fuzzy concepts, that of “cloud-native”.
InfoWorld did a piece on this recently, which includes the traditionally aggressive analyst forecast that “by 2025 more than 95% of application workloads will exist on cloud-native platforms”, up from 30% in 2021. Even leaving aside the question of what a cloud-native platform is, this seems a bit unrealistic to me, but let’s face it, even what an “application workload” might be is a question in itself. The article identifies risks for cloud-native development, one of which is cost overruns.
The value of risk-and-benefit pieces on cloud-native hinges on how you define it. InfoWorld provides a definition of “cloud-native development”, which is “…the process of designing, building, and running applications in a cloud computing environment. It leverages the benefits, architectural patterns, and capabilities of the cloud to optimize performance, scalability, and cost efficiency. In other words, we deploy everything to provide cloud-like capabilities, no matter where it runs, cloud and not.”
This adds up to the declaration that “cloud-native” design is about more than the cloud, that it’s about some agile application model that permits but doesn’t mandate cloud deployment. Can we somehow derive a vision of that model? Can we tell how it would impact cloud spending? We have to somehow link this model to the front-end/back-end approach of today, both technically and financially.
My own modeling of the cloud opportunity says that the broad opportunity for front-end missions is not 95% of workloads in the cloud, but a maximum of perhaps 55%, and more likely something like 45%. But we still don’t have a clear technical picture of what that front-end model looks like, and it turns out that’s the biggest issue in cost overruns for the cloud.
It’s often helpful to look at the extreme ends of a range of things to see what’s going on overall. The “cloudiest” end of cloud-native is functional computing, where applications are divided into microservices that are loaded and run as needed. Since application features are totally decomposed, you can make changes to one without impacting the others, and since individual microservices are small pieces of code, testing is pretty easy. The other end of our cloud-native spectrum is containerized applications, which are much more monolithic-looking. They may be scalable and resilient, but only because they were authored that way, not because it’s an intrinsic property of the architecture as it is with microservices and functional computing. Let’s look at this scale to understand how it impacts cloud costs.
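To make the two ends of that scale concrete, here’s a minimal sketch. The “functional” version is a single handler the platform loads and bills per invocation (the handler signature is generic, not any particular provider’s), while the “container” version bundles the same logic into a long-running service you keep resident and scale yourself. All the names are illustrative.

```python
# Two ends of the cloud-native scale, same business logic.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def price_quote(payload: dict) -> dict:
    """The actual business logic, shared by both packaging styles."""
    return {"quote": payload.get("qty", 1) * 9.99}

# Functional end: the platform invokes this per event; nothing stays resident,
# and you're billed per invocation. (Generic signature, not a specific FaaS API.)
def handler(event: dict, context=None) -> dict:
    return price_quote(event)

# Container end: the same logic lives inside a persistent server process that
# you deploy, keep running, and scale as a unit.
class QuoteServer(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(json.dumps(price_quote(body)).encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), QuoteServer).serve_forever()
```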
If applications slide toward the functional end, they are almost always more expensive to run. True functions are loaded when needed, meaning that they’re usage-priced, and the per-usage costs add up quickly. Even microservices that stay resident are more expensive, because cloud instance costs are applied per component, and those costs add up too. Functional computing and microservices are a great strategy for stuff you’re not going to push a million messages through, but not so good when that might happen. The InfoWorld article’s comments on cloud cost overruns focus on this particular model, IMHO.
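A rough back-of-the-envelope sketch shows why. The rates below are made up purely for illustration (real cloud pricing varies widely), but the shape of the result holds: usage-priced functions are cheap at low volume and blow past a flat-rate instance as traffic grows.

```python
# Illustrative only: compares per-request (functional) pricing against a flat
# per-hour instance, using made-up rates to show how usage-based cost scales
# with message volume.

def functional_cost(requests_per_month, price_per_million=0.20,
                    gb_seconds_per_request=0.25, price_per_gb_second=0.0000167):
    """Hypothetical FaaS bill: per-invocation fee plus compute duration."""
    invocations = requests_per_month / 1_000_000 * price_per_million
    compute = requests_per_month * gb_seconds_per_request * price_per_gb_second
    return invocations + compute

def instance_cost(hours_per_month=730, price_per_hour=0.05):
    """Hypothetical always-on container/VM instance, flat rate."""
    return hours_per_month * price_per_hour

for volume in (100_000, 10_000_000, 1_000_000_000):
    print(f"{volume:>13,} req/mo   functional ~ ${functional_cost(volume):>10,.2f}"
          f"   instance ~ ${instance_cost():,.2f}")
```

At a hundred thousand requests a month the functional bill is pennies; at a billion it’s thousands of dollars against a few tens of dollars for the resident instance.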
As you slide toward the center of our range, you’re creating larger software components, so perhaps we should think of this part of the range as “featurized” computing. You model software as a set of features, which might in turn be made up of functions/microservices but are pre-assembled into a load unit. This means that your code is still agile, but it also means that you have to scale and replace entire features rather than little microservices. It also means that it’s going to be cheaper.
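Here’s a minimal sketch of what that pre-assembly might look like: the individual functions stay small and independently testable, but they’re packaged into one feature-level load unit that is deployed, scaled, and replaced as a whole. All the names are hypothetical, not any particular framework’s API.

```python
# "Featurized" packaging: small functions assembled into one deployable feature.

def validate_order(order: dict) -> dict:
    if not order.get("items"):
        raise ValueError("order has no items")
    return order

def price_order(order: dict) -> dict:
    order["total"] = sum(i["qty"] * i["unit_price"] for i in order["items"])
    return order

def reserve_inventory(order: dict) -> dict:
    order["reserved"] = True   # placeholder for a real inventory call
    return order

class OrderFeature:
    """The load unit: one feature built from the functions above, deployed,
    scaled, and replaced as a single piece."""
    steps = (validate_order, price_order, reserve_inventory)

    def handle(self, order: dict) -> dict:
        for step in self.steps:
            order = step(order)
        return order
```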
Keep sliding, now toward the container end of the scale, and you start to see a model that looks very much like the way you’d build applications in the data center. Containerized applications often look very much like “regular” data center applications, so what we’re seeing is a further collectivizing of software functions into a bigger load unit. However, you can still scale and repair/replace because the software is designed to allow that. Some developers tell me that the bigger “features” can even be made as replaceable as microservices if you use back-end database state management rather than holding data within the load units. Obviously this would be the cheapest of all to run.
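The “state in the back end” idea is easy to sketch: the load unit holds no session data of its own, so any replica can take any request, and a failed copy can be replaced without losing work. The store below is an in-memory stand-in for an external database or cache, and the interface is hypothetical.

```python
# Stateless load unit: all state lives in a shared back-end store.

class StateStore:
    """Stand-in for an external database/cache shared by all replicas."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def put(self, key, value):
        self._data[key] = value

def handle_add_to_cart(store: StateStore, user_id: str, item: str) -> list:
    # Every piece of state goes through the store, never the process memory,
    # so this handler can run in any replica or be restarted freely.
    cart = store.get(user_id, [])
    cart.append(item)
    store.put(user_id, cart)
    return cart

store = StateStore()
handle_add_to_cart(store, "user-42", "widget")
print(handle_add_to_cart(store, "user-42", "gadget"))  # ['widget', 'gadget']
```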
The problem I see in all of this is that we’ve conflated two things that are actually very different. One is the modular structure of the application. Anyone who’s ever done modern programming already knows that you don’t write huge monolithic programs; you write classes or functions or subroutines or whatever you want to call small functional elements. You then collect them into what I’ve been calling a “load unit”. The key to true cloud-native is to think and write in terms of microservices. The key to optimum cloud costs is to load in terms of collections of the components you’ve written, selected and designed to be as scalable and resilient as possible while maintaining the ease of change inherent in a logic hierarchy rather than a monolith.
The real problem with cloud cost management at the software level is that enterprises aren’t being encouraged to think this way. They think development units equal deployment units, that you can’t develop microservices and deploy persistent containers. That forces a trade-off between optimal development and optimal execution, and the trade-off isn’t necessary. You can write software one way and deploy it anywhere along my scale of options. Depending on where you expect to run the software, you assemble the load units differently.
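A minimal sketch of that separation, with a made-up manifest format: the same developed components get grouped into fine-grained load units toward the functional end of the scale, or into one coarse unit toward the container end, purely as a packaging decision.

```python
# Development units vs. deployment units: same components, different grouping.

COMPONENTS = ["validate_order", "price_order", "reserve_inventory",
              "notify_customer"]

# Functional end: one load unit per component, each scaled and priced on its own.
fine_grained = {name: [name] for name in COMPONENTS}

# Featurized/containerized end: everything rides in a single load unit.
coarse_grained = {"order_service": list(COMPONENTS)}

def deployment_manifest(grouping: dict) -> dict:
    """Turn a grouping decision into a (hypothetical) deployment manifest."""
    return {unit: {"modules": modules, "replicas": 2}
            for unit, modules in grouping.items()}

print(deployment_manifest(fine_grained))
print(deployment_manifest(coarse_grained))
```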
So does this allow you to “move everything to the cloud”? No, of course not. Somewhere along the natural flow of application logic (input, process, output; remember?), you reach a point where you can’t really scale or replace things easily and data center techniques work best, or where database costs, or access costs, make cloud hosting way too costly. But with the separation of development and deployment units, you can restructure your application for wherever that point happens to be. In my model, that’s what gets you from 45% in the cloud to 55% in the cloud.
Applications designed this way can be assembled and tested in various groupings, and the statistics associated with the tests can be used to get a pretty good handle on what the structure will cost when it’s deployed in the cloud. In any event, it can be deployed and tested there. You should always go with the most concentrated structure of load units that will meet business goals, because every iota of agility and resiliency you try to add beyond that point will raise costs without raising benefits.
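As a sketch of how those test statistics could feed the cost estimate: measure per-component call rates and run times during testing, then project what a given grouping would cost at the volumes you expect in production. Every number and rate below is a placeholder you’d replace with your own measurements.

```python
# Projecting cloud cost for a grouping from test-run statistics (all figures
# are illustrative placeholders, not real pricing).

TEST_STATS = {
    "validate_order":    {"calls_per_month": 2_000_000, "ms_per_call": 5},
    "price_order":       {"calls_per_month": 2_000_000, "ms_per_call": 12},
    "reserve_inventory": {"calls_per_month": 1_500_000, "ms_per_call": 30},
}

PRICE_PER_MILLION_CALLS = 0.20   # hypothetical per-invocation rate
PRICE_PER_CPU_HOUR = 0.05        # hypothetical instance rate

def functional_estimate(stats):
    """Each component deployed as its own usage-priced function."""
    return sum(s["calls_per_month"] / 1_000_000 * PRICE_PER_MILLION_CALLS +
               s["calls_per_month"] * s["ms_per_call"] / 3_600_000 * PRICE_PER_CPU_HOUR
               for s in stats.values())

def single_unit_estimate(replicas=2, hours=730):
    """All components packaged into one always-on load unit."""
    return replicas * hours * PRICE_PER_CPU_HOUR

print(f"functional grouping  ~ ${functional_estimate(TEST_STATS):.2f}/month")
print(f"single load unit     ~ ${single_unit_estimate():.2f}/month")
```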
This isn’t how we look at cloud applications today, but it’s how we should be looking at them.