Why We Need to Rethink “Cloud-Native”

I’m an avowed opponent of industry terms that have no stable definition, particularly when that lack perpetuates myths and hype. And, yes, you can argue that “AI” or “artificial intelligence” is such a term, but there’s an earlier one that I think is particularly destructive to the telecom world. It’s “cloud-native”.

The accepted definition of cloud-native is something like “a software approach that fully exploits the hosting model that cloud-computing offers, maximizing its benefits.” OK, that sounds good, but it raises a number of questions. Does it have to exploit all the features of the cloud, most of them, some of them, or what? How do you define “fully”? But the biggest question, I think, is how you actually do it, in a software-architecture sense.

Many of you know I worked on an open-source project called “ExperiaSphere”, which I launched to create an implementation of what I believed “cloud-native” should be. It’s not the only possible one, for sure, but by framing the goals in software it would offer both a proof that it could be done and a way of answering the questions I just posed. I won’t talk about the project here, but rather about its principles, which were designed specifically to drive the evolution of services and service management in a telco world.

Let’s start with basics. A service of any sort is a collection of resource commitments that create the feature set being sold. The service-in-waiting, then, is a recipe, a set of instructions about how the resources are committed. The resources are a pool of cooperative elements: devices, servers, connections, and so forth. These assert features, and those features are the pantry that a service sale draws on to follow the recipe that defines the desired outcome.
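To make the recipe-and-pantry picture concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ResourcePool` class, the feature names, and the idea that a recipe is just a count of features needed. It is not how ExperiaSphere or any real system represents these things.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """The 'pantry': free units of each feature the resources assert (hypothetical)."""
    capacity: dict[str, int] = field(default_factory=dict)

    def commit(self, recipe: dict[str, int]) -> bool:
        """Commit everything the recipe asks for, or nothing at all."""
        if any(self.capacity.get(feature, 0) < count
               for feature, count in recipe.items()):
            return False
        for feature, count in recipe.items():
            self.capacity[feature] -= count
        return True

    def release(self, recipe: dict[str, int]) -> None:
        """Return a recipe's resources to the pool when the service term ends."""
        for feature, count in recipe.items():
            self.capacity[feature] = self.capacity.get(feature, 0) + count


# The service-in-waiting: a recipe naming the features a sale must draw on.
VPN_RECIPE = {"access_port": 2, "vpn_tunnel": 1}

pool = ResourcePool({"access_port": 3, "vpn_tunnel": 1})
first = pool.commit(VPN_RECIPE)    # enough in the pantry, so this commits
second = pool.commit(VPN_RECIPE)   # not enough left, so nothing is taken
pool.release(VPN_RECIPE)           # the first service expires
```

The all-or-nothing check in `commit` is a design choice of the sketch: a service is either a complete set of resource commitments or it isn't a service.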

Service sale? So there’s also a process, the process of selling a service, billing for it, tracking the accounts, building new recipes, committing resources to a sale, releasing them when the term expires, restoring operation if a resource fails and has to be replaced…a bunch of processes, then. In the old-line monolith world, these processes would all be applications, with a queue of inputs and a bunch of outputs. That’s what the original OSS/BSS systems were, in fact. That’s not cloud-native.

How do we make this cloud-native? We have to stop thinking of commercial paper like orders and bills as drivers, and instead think of them as byproducts. What drives these processes, all of them in fact, is events. Things that happen, signals that request an outcome. The challenge in this whole service ecosystem lies in handling these events within the context of the business: financial constraints, legal constraints, resource constraints, and even the constraints set by those service-driven commitments of resources. All these constraints encourage us to think of the service ecosystem as a set of models, and that includes both the resources (financial and otherwise) and the processes themselves. An event is processed based on the state of the things that constrain it, whatever they are, which makes the structure of cloud-native applications one of state-event systems. Send an event to a process model, and the model handles it based on its current state.
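A state-event system in its barest form is just a lookup: the current state plus the incoming event select what happens next. The sketch below assumes a made-up service lifecycle; the state and event names are mine, chosen only to illustrate the idea.

```python
from enum import Enum, auto


class State(Enum):
    ORDERED = auto()
    ACTIVE = auto()
    FAILED = auto()
    TERMINATED = auto()


class Event(Enum):
    ACTIVATE = auto()
    FAULT = auto()
    REPAIRED = auto()
    TERMINATE = auto()


# Hypothetical lifecycle: each (state, event) pair names the next state.
TRANSITIONS = {
    (State.ORDERED, Event.ACTIVATE): State.ACTIVE,
    (State.ACTIVE, Event.FAULT): State.FAILED,
    (State.FAILED, Event.REPAIRED): State.ACTIVE,
    (State.ACTIVE, Event.TERMINATE): State.TERMINATED,
    (State.FAILED, Event.TERMINATE): State.TERMINATED,
}


def handle(state: State, event: Event) -> State:
    """Return the next state; an event that makes no sense here is ignored."""
    return TRANSITIONS.get((state, event), state)
```

Note that the same event gets different treatment depending on state: a `FAULT` against a service that is only `ORDERED` changes nothing here, while the same event against an `ACTIVE` service moves it to `FAILED`.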

A sale is an event, so it goes to a sales-event-handler, which draws a recipe from a file and dispatches it to the resource process as an event. If the state of resources and resource policies permit, the result is a commitment, which is the instantiation of the sale/model on the resource set. Now, that model is a state/event process too. If it gets a fault event from a resource, it acts on it by either replacing the failed element or reporting a failure upward to a billing process, which also got an event notifying it that a service had been instantiated. You get the picture.
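Here is one way that flow could be sketched, with handlers that talk to each other only by posting events. The event types, the `RECIPES` catalog, the `spares` pool, and the single in-memory queue are all hypothetical simplifications; a real deployment would presumably use a message bus and persistent models.

```python
from collections import deque

RECIPES = {"vpn": {"router": 1}}   # hypothetical recipe file
spares = {"router": 2}             # hypothetical free-resource pool
log = []                           # what the billing and repair steps record
queue = deque()                    # stand-in for an event bus


def on_sale(ev):
    # Draw the recipe and hand it to the resource process as a new event.
    queue.append({"type": "commit", "service": ev["service"],
                  "recipe": RECIPES[ev["product"]]})


def on_commit(ev):
    # Commit resources if the pool permits, then report the outcome as an event.
    if all(spares.get(kind, 0) >= n for kind, n in ev["recipe"].items()):
        for kind, n in ev["recipe"].items():
            spares[kind] -= n
        queue.append({"type": "instantiated", "service": ev["service"]})
    else:
        queue.append({"type": "failed", "service": ev["service"]})


def on_fault(ev):
    # Replace the failed element if a spare exists, else report upward.
    kind = ev["resource"]
    if spares.get(kind, 0) > 0:
        spares[kind] -= 1
        log.append(f"{ev['service']}: replaced {kind}")
    else:
        queue.append({"type": "failed", "service": ev["service"]})


def on_billing(ev):
    log.append(f"{ev['service']}: billing saw {ev['type']}")


HANDLERS = {"sale": on_sale, "commit": on_commit, "fault": on_fault,
            "instantiated": on_billing, "failed": on_billing}


def run(events):
    """Drain the queue, letting each handler post follow-on events."""
    queue.extend(events)
    while queue:
        ev = queue.popleft()
        HANDLERS[ev["type"]](ev)


run([{"type": "sale", "service": "svc-1", "product": "vpn"}])
run([{"type": "fault", "service": "svc-1", "resource": "router"}])
run([{"type": "fault", "service": "svc-1", "resource": "router"}])
```

In this run the sale commits one router and billing is notified, the first fault is absorbed by the remaining spare, and the second fault finds the pool empty and is reported upward to billing. The order or the bill never drives anything; each is a byproduct of an event being handled.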

Essentially, cloud-native stuff is model-driven state/event stuff. The processes are responses to events, and each is linked (in a table or graph) to a state/event combination in a model. The models contain everything needed to process an event, so the processes at all these intersections are microservices; you spin them up when and where you need them. Thus, there is no “OSS” or “BSS” or “NMS” in the traditional sense; all these are really just a bunch of microservices floating in state-event-model hyperspace. What connects them is the state/event relationships, driven by events that weave through them to reflect stuff going on in the real world. An application of old, then, is really just a structured event flow and the processes it connects.
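The point about the model containing everything can be sketched as follows. The process functions keep nothing between calls; the model is stored as JSON (my assumption, standing in for whatever repository holds the models), and the table links each state/event pair to a process. All names here are illustrative.

```python
import json


# Each process is a pure function of (model, event); it holds no state of its own.
def start_service(model, event):
    return {**model, "state": "active"}


def report_fault(model, event):
    return {**model, "state": "failed", "last_fault": event["detail"]}


def repair(model, event):
    return {**model, "state": "active", "repairs": model["repairs"] + 1}


# The table that links state/event intersections to processes (hypothetical).
PROCESS_TABLE = {
    ("ordered", "activate"): start_service,
    ("active", "fault"): report_fault,
    ("failed", "repaired"): repair,
}


def dispatch(stored_model: str, event: dict) -> str:
    """Load the model, run the process at its state/event intersection, store it back."""
    model = json.loads(stored_model)
    process = PROCESS_TABLE.get((model["state"], event["type"]))
    if process is None:
        return stored_model          # no process at this intersection
    return json.dumps(process(model, event))


stored = json.dumps({"service": "svc-1", "state": "ordered", "repairs": 0})
for ev in [{"type": "activate"},
           {"type": "fault", "detail": "link down"},
           {"type": "repaired"}]:
    stored = dispatch(stored, ev)    # each call could run on a different instance
final = json.loads(stored)
```

Because `dispatch` takes the stored model in and hands the stored model back, nothing ties the next event to the instance that handled the last one. That, under the assumptions of this sketch, is what lets you spin a process up wherever the event happens to land.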

For convenience, though, you can still talk about things like service management, resource management, accounting management, the whole FCAPS thing and the whole TMF thing. The policies that constrain all the ways that events are processed in various states of various models collectively define an “application” set. My own view is that there is a “business model” that frames the business flow, and under that there’s a service model set and its processes, a resource management set and its processes, and the business-level models and processes that reflect customer management, accounting, purchasing, inventory, personnel and payroll, and the rest. But a customer, a router, an invoice, and so forth are all models or events in their own right. Accounting and sales manage customer models or supplier models.

The properties that we looked for in that basic cloud-native definition fall out of this. You can spin up as many instances of a given state/event microservice as you need, where and when you need them. The commitments in resources are elastic both for the services and for their management. Same for resources. Same for software. It’s not a bunch of machines chugging along, it’s almost an organism, something that expands and contracts almost like breathing.
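A small sketch of why elasticity falls out of this structure: if the handler depends only on the model and the event it is given, the number of instances is a deployment decision rather than a design constraint. Threads stand in for microservice instances here, and the handler and workload are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(item):
    """A stateless state/event process: the outcome depends only on its inputs."""
    model, event = item
    if model["state"] == "active" and event == "fault":
        return {**model, "state": "failed"}
    return model


# Hypothetical workload: 100 independent service models, half receiving a fault.
work = [({"service": f"svc-{i}", "state": "active"},
         "fault" if i % 2 else "ping")
        for i in range(100)]


def process(items, instances):
    """Run the same handler on however many instances the load calls for."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(handle, items))


one = process(work, 1)
many = process(work, 8)
```

The results are the same whether one instance or eight do the work, which is the property a monolith split into pieces doesn't automatically have: the pieces have to be free of hidden state for the instance count to be a free variable.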

You might wonder about this, of course. Why does Tom think he knows how to do this? First, I’ve done a lot of event-driven software, as a programmer, as an architect, and in running teams. Second, I’ve followed the processes of people who have done it wrong, like the ONAP work. Telco types think in monolithic terms. You can convert a monolithic design into microservices, though, and if that’s how you define cloud-native, then you’re there. It’s not the right way, and it won’t really meet the goals, because architecture constrains implementation. Think back to the old days of IT; read a record, process, and write. Then turn that into read a queue of requests…and so forth. Now, divide that into microservices and tell me it’s cloud-native, optimally scalable, resilient, and elastic. Just dividing stuff up doesn’t make it elastic or agile, and until the people in telco-land get this, cloud-native is forever beyond their reach.
