A Second Look at “Cloud Native” for Telcos and Enterprises

There’s been a lot of telco interest in “cloud-native” architectures for features and functions. The NFV ISG has reworked its model (to an extent) to accommodate that goal (see this excellent summary), and it’s also a goal of the Nephio project, which targets the use of Kubernetes to deploy telco service elements. The problem is that defining what “cloud native” means is slippery, and none of the telcos I talk with seem to have a consistent definition. That could threaten the movement toward cloud-native by obscuring both the direction to take and the progress made.

Oracle says “As defined by the Cloud Native Computing Foundation (CNCF), Cloud native technologies empower organizations to build and run scalable applications in public, private, and hybrid clouds. Features such as containers, service meshes, microservices, immutable infrastructure, and declarative application programming interfaces (APIs) best illustrate this approach.” Amazon is even more general: “Cloud native is the software approach of building, deploying, and managing modern applications in cloud computing environments. Modern companies want to build highly scalable, flexible, and resilient applications that they can update quickly to meet customer demands. To do so, they use modern tools and techniques that inherently support application development on cloud infrastructure. These cloud-native technologies support fast and frequent changes to applications without impacting service delivery, providing adopters with an innovative, competitive advantage.”

I asked some telco friends to pick terms they associated with a “cloud-native” implementation. The top one was “containers,” picked by 70%, followed by “component-based” (68%), “agile” (61%), “microservices” (49%), and four other terms that together couldn’t collect 30%. Interestingly, if you ask cloud developers and architects, they pick “microservices” (88%), “stateless” (78%), “service mesh” (67%), and “functional” or “serverless” (54%). Not surprisingly, there’s not a lot of congruence here, and my personal view is that the differences are largely due to “cloud-native-washing” among telco vendors. For what it’s worth, the top view of enterprise executives, held by 59%, is that “cloud-native means it never ran anywhere but the cloud.”

If you ask telco planners why cloud-native is important, their top response by far is that it’s essential to ensure reliability and availability, and the second choice is that it supports higher performance. Two-thirds of telco planners think that mobile infrastructure could and should be built based on cloud-native, hosted, virtual functions. The other third thinks the majority is nuts, so I guess we have even less convergence on this point.

What’s the right answer? Well, you’ll be disappointed to hear that you can make a case for all the choices, for none of them, and for everything in between. The problem here, and one that may even explain the lack of a consensus view among telco types, is that both the value of the cloud in hosting service features and the optimum architecture for doing so depend on what the service feature is.

Let’s start with some realism. There is no way that a server and software can perform a data-plane operation as well as a specialized device like a router. The difference may not be compelling for missions that don’t involve a lot of traffic and that can accept a greater latency burden, but the difference is there nevertheless. You could, perhaps, argue that virtual functions could host RAN data plane activity, but based on what real telcos have told me, that view is not widely held by the telco community. Five of nine telco CTOs I’ve heard from say the network data plane may be a white-box candidate, but not a virtual function candidate. I also believe that even CPE/CLE per-customer devices are better supported by white boxes or appliances than by hosted functions, but this view of mine has support from only three of the nine CTOs. Sorry, CTO majority, but I disagree respectfully.

Where hosting makes sense is in the control plane. Control packets in networks exist at multiple levels, but in mobile networks the “control plane” means a level of mobility management and registration above all the components of the IP stack. You can think of IP control packets as representing “events,” and of the mobile control plane’s packets as being more transactional. Servers handle transactions all the time, and even IP control packets might be candidates for server processing if they weren’t tightly coupled to the chips and devices that push packets in the data plane. Think of IoT events as a comparable case.

What about a role like “firewall”? I’m of two minds here. On one level, we run personal firewalls on desktop and laptop computers and even on phones, so the mission isn’t totally out of range for generalized compute chips. On the other hand, site-wide firewalls are traditionally appliances, and while NFV targeted these applications, they mostly ended up hosted on “universal CPE” or uCPE, which is another name for “appliance.” I think that many specialized chips, both FPGAs and switching chips, could be pressed into firewall service and provide performance and latency benefits.

I raise these two points to explain why I think there’s a different perspective among telco types on what “cloud-native” means. Some people in the office of the CTO have worked on standards like NFV, and they’ve taken a broader view (“containers”) because that body accepted a wide set of missions for server-hosted virtual functions. Others in the same office have worked on the mobile side and see the mobile control plane as the function target. That group thinks in terms of microservices.

They may both be wrong.

What’s the difference between a “microservice” and a “container”? A purist will tell you that containers host things and microservices are things, which is pretty fundamental, but in popular usage a container is typically a bit of a monolith and a microservice is typically a set of stateless components. “Stateless” means that a component’s outputs are a function only of its inputs: a microservice doesn’t store information between executions or fetch it from somewhere else, so in a truly stateless process the same input always produces the same output. Transaction processing is usually not a stateless function, which is why I said “more transactional” in my description of control-plane activity. Many control-plane functions are actually event-like, but some are rather transaction-like, in that you send something and expect a response, and the response depends on the status of an element outside the component that’s handling the event.
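To make the stateless versus transaction-like distinction concrete, here is a minimal Python sketch. The function names, event strings, and the shared session registry are hypothetical illustrations and are not drawn from any real control-plane implementation. The first function is stateless. The second returns a reply that depends on the status of an element outside the function, so the same request can produce different responses at different times.

```python
from typing import Dict

# Stateless: the output is a function only of the input, so the same
# event string always produces the same classification.
def classify_event(event: str) -> str:
    return "mobility" if event.startswith("handover") else "registration"

# Transaction-like: the reply depends on the status of an element outside
# this function (a hypothetical shared session registry), so an identical
# request can get a different answer the second time.
def attach(registry: Dict[str, str], device_id: str) -> str:
    if registry.get(device_id) == "attached":
        return "already-attached"
    registry[device_id] = "attached"
    return "attached"

if __name__ == "__main__":
    registry: Dict[str, str] = {}
    print(classify_event("handover-request"))  # mobility
    print(attach(registry, "ue-1"))  # attached
    print(attach(registry, "ue-1"))  # already-attached
```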

Containers can also benefit from being “stateless,” in the sense that they don’t hold data from one execution to the next, because that lets you instantiate one as needed to replace a failed element or to scale. However, such containers are not “functional,” because they do rely on external information, supplied by a database, for example. Is that bad? Not necessarily, but it means there is a container model that fits somewhere between the “monolith” in the traditional sense and the “microservice” in a similarly traditional sense. A traditional monolith can’t be called cloud-native, because it isn’t agile or elastic, but a semi-stateless container could be. That means we can’t simply assign “cloud-native” status to anything that’s containerized. If we want to understand which properties we should value, we have to look at what kinds of applications are typically implemented using “stateless” techniques and what they need in the way of hosting.
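Here is a minimal Python sketch of that in-between, “semi-stateless” model. It makes a few assumptions. The worker keeps nothing between calls and reads and writes session state through an external store. `InMemoryStore` is a hypothetical stand-in for a shared database. The session states and event names are invented for illustration. Because the state lives outside the worker, a freshly instantiated replica can pick up where a failed one left off.

```python
from typing import Dict, Protocol

class StateStore(Protocol):
    """Interface to external state, e.g. a shared database (assumed)."""
    def get(self, key: str, default: str) -> str: ...
    def put(self, key: str, value: str) -> None: ...

class InMemoryStore:
    """Hypothetical stand-in for an external database, for illustration only."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str, default: str) -> str:
        return self._data.get(key, default)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

# Hypothetical session transitions; unknown events leave the state unchanged.
TRANSITIONS: Dict[str, str] = {"start": "active", "stop": "idle"}

class SessionWorker:
    """Holds no session data between calls; every call reads and writes the store."""
    def __init__(self, store: StateStore) -> None:
        self.store = store

    def handle(self, session_id: str, event: str) -> str:
        current = self.store.get(session_id, "idle")
        new_state = TRANSITIONS.get(event, current)
        self.store.put(session_id, new_state)
        return new_state

if __name__ == "__main__":
    shared = InMemoryStore()
    first = SessionWorker(shared)
    print(first.handle("s1", "start"))  # active
    # Suppose "first" fails; a new replica sees the same state via the store.
    replacement = SessionWorker(shared)
    print(replacement.handle("s1", "ping"))  # active
```

Note that `handle` is not functional in the strict sense. The same `(session_id, event)` pair can return different results depending on what is in the store, which is exactly the trade-off described above.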

I’ve done a fair amount of protocol software development, and these applications are almost always based on what’s called “state/event” logic. The system the software implements has a set of specific operating states, and events are interpreted differently in each state. In my software implementations, the junction of a “state” and an “event” (think of the two as rows and columns in a matrix) is a pointer to a software element that is often stateless, meaning a microservice. However, my implementations typically bundled these “microservices” with the state/event matrix and the logic that queried that matrix to trigger the processing of an event. You could implement that bundle in a container, and of course you could also implement the matrix and its own processing software as a container and each of the components the matrix pointed to as real, independent microservices.
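The state/event structure described above can be sketched in Python as follows. This is a minimal illustration, not the author’s code. The three states, three events, and handler names are hypothetical, and treating a missing cell in the matrix as “ignore the event” is an assumed policy. The process elements are stateless functions, and the dispatcher bundles the matrix with the logic that holds the current state and queries the matrix.

```python
from enum import Enum
from typing import Callable, Dict, Tuple

class State(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    CONNECTED = "connected"

class Event(Enum):
    CONNECT_REQ = "connect_req"
    CONNECT_ACK = "connect_ack"
    DISCONNECT = "disconnect"

# A process element is stateless: given the event payload it returns the
# next state and an output, and keeps nothing between calls.
ProcessElement = Callable[[str], Tuple[State, str]]

def send_request(payload: str) -> Tuple[State, str]:
    return State.CONNECTING, f"sent request for {payload}"

def complete_connect(payload: str) -> Tuple[State, str]:
    return State.CONNECTED, f"connected {payload}"

def tear_down(payload: str) -> Tuple[State, str]:
    return State.IDLE, f"released {payload}"

# Rows are states, columns are events; each populated cell points to a
# process element. Empty cells mean the event is not valid in that state.
MATRIX: Dict[Tuple[State, Event], ProcessElement] = {
    (State.IDLE, Event.CONNECT_REQ): send_request,
    (State.CONNECTING, Event.CONNECT_ACK): complete_connect,
    (State.CONNECTING, Event.DISCONNECT): tear_down,
    (State.CONNECTED, Event.DISCONNECT): tear_down,
}

class StateEventMachine:
    """Bundles the current state with the logic that queries the matrix."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def dispatch(self, event: Event, payload: str) -> str:
        element = MATRIX.get((self.state, event))
        if element is None:
            # Assumed policy: ignore events that have no cell for this state.
            return f"ignored {event.value} in {self.state.value}"
        self.state, output = element(payload)
        return output

if __name__ == "__main__":
    m = StateEventMachine()
    print(m.dispatch(Event.CONNECT_ACK, "ue-1"))  # ignored connect_ack in idle
    print(m.dispatch(Event.CONNECT_REQ, "ue-1"))  # sent request for ue-1
    print(m.dispatch(Event.CONNECT_ACK, "ue-1"))  # connected ue-1
    print(m.state.value)  # connected
```

Packaging this whole module as one unit corresponds to the bundled case. Hosting each process element separately would leave only the matrix and dispatcher together.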

OK, this is interesting. Earlier, I said that a container was something an element ran in and a microservice was one of the elements. We could in fact say that a containerized network function was “cloud-native” in every respect if it was a bundle of the state/event matrix and the process elements it referenced. We could also distribute those microservices, host them as “functions” or “serverless” elements, and say the result was cloud-native. And we could also say that either of those structures could have been created well before there was any such thing as “the cloud.” (I created both myself in development projects thirty or more years ago.) Thus, we may be making too much of this whole cloud-native thing, or perhaps making the wrong thing of it. Cloud-native is really a modern repackaging of an old design concept used in event-driven applications. The sort of thing the cloud is really good for includes, and may in fact be dominated by, applications that are event-driven. Yes, we use the cloud for transaction front-ends, but the front-end piece isn’t really doing transaction processing; it’s doing transaction prep by handling GUI events.
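To illustrate that the same state/event design can be either bundled or distributed, here is a hedged Python sketch. The matrix holds only the names of process elements, and an injected invoker decides how to call them. Everything here is hypothetical. `DEPLOYED` stands in for separately hosted functions. `fake_transport` stands in for whatever network or serverless invocation mechanism a real deployment would use, and it simply round-trips JSON locally.

```python
import json
from typing import Callable, Dict, Tuple

# The matrix maps (state, event) to the *name* of a process element.
MATRIX: Dict[Tuple[str, str], str] = {
    ("idle", "connect_req"): "send-request",
    ("connecting", "connect_ack"): "complete-connect",
    ("connected", "disconnect"): "tear-down",
}

# Hypothetical stateless process elements; in a distributed deployment each
# would be hosted independently (for example as a serverless function).
DEPLOYED: Dict[str, Callable[[str], Tuple[str, str]]] = {
    "send-request": lambda p: ("connecting", f"sent request for {p}"),
    "complete-connect": lambda p: ("connected", f"connected {p}"),
    "tear-down": lambda p: ("idle", f"released {p}"),
}

Invoker = Callable[[str, str], Tuple[str, str]]

def local_invoke(name: str, payload: str) -> Tuple[str, str]:
    """Bundled model: call the process element in the same process."""
    return DEPLOYED[name](payload)

def fake_transport(request: str) -> str:
    """Stand-in for a network hop: decode JSON, run the element, encode the reply."""
    req = json.loads(request)
    next_state, output = DEPLOYED[req["function"]](req["payload"])
    return json.dumps({"next_state": next_state, "output": output})

def make_remote_invoker(transport: Callable[[str], str]) -> Invoker:
    """Distributed model: only serialized data crosses the (simulated) boundary."""
    def invoke(name: str, payload: str) -> Tuple[str, str]:
        reply = json.loads(transport(json.dumps({"function": name, "payload": payload})))
        return reply["next_state"], reply["output"]
    return invoke

class Dispatcher:
    """Holds the current state and queries the matrix; unknown cells are ignored."""

    def __init__(self, invoke: Invoker) -> None:
        self.state = "idle"
        self.invoke = invoke

    def dispatch(self, event: str, payload: str) -> str:
        name = MATRIX.get((self.state, event))
        if name is None:
            return f"ignored {event} in {self.state}"
        self.state, output = self.invoke(name, payload)
        return output

if __name__ == "__main__":
    bundled = Dispatcher(local_invoke)
    distributed = Dispatcher(make_remote_invoker(fake_transport))
    for ev in ["connect_req", "connect_ack", "disconnect"]:
        # Both deployments give identical results for the same event sequence.
        assert bundled.dispatch(ev, "ue-1") == distributed.dispatch(ev, "ue-1")
    print(bundled.state, distributed.state)  # idle idle
```

The design choice being illustrated is that the state/event logic never changes. Only the invoker does, which is why both structures can reasonably be called cloud-native and why both predate the cloud.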

Time for a reality check. The application defines the optimum functional architecture. Hosting platforms define how that architecture maps to resources, and application software architectures optimize both, trading off cost, performance, agility, and other factors. Is cloud-native just a (hyphenated) word? Maybe so, or maybe it’s the broad term for the application software architectures that best exploit the features of the public cloud. What are they? Whatever they need to be, and that’s going to vary considerably across the spectrum of applications. One size, and one meaning, won’t fit all.
