A recent Light Reading story on the “cloud-native” movement among telcos includes some interesting comments from the telcos themselves. One was “It’s a long journey, and we have very important milestones in front of us.” But is it a long journey, what are the critical milestones, who’s responsible for meeting them, and is the journey even necessary? Let’s see.
One obvious problem with the whole notion of a cloud-native telco is the ambiguity of the term “cloud native” itself. The simple definition is “an application that’s built and run to leverage cloud computing services or architecture.” More detailed definitions include a requirement that the application be built on microservices, but just what is a microservice? That takes still more definitions.
For decades, programming languages and practices have encouraged “modularization” of applications. Even back in the early mainframe days (the 1960s), programmer trainees were taught to think of their programs as having three “modules”, one each for input, processing, and output. These might then be divided into “subroutines”. The goal was to avoid “spaghetti code”, program logic that twisted around itself and that today would be called a “monolith”. Modern languages like Java, Rust, C, and C++ provide small feature/function elements in libraries expressly for reuse. The truth is that with a modern language it would be difficult to write anything other than modular software, in the programming sense. What, then, is needed beyond that for cloud-native?
The extreme view would be that all the modules-now-microservices have to be independent elements, hosted separately and linked via network-connected APIs. This approach works for some human-interactive classes of applications, but I’ve developed network software for decades and I can tell you it’s not practical for implementing complex network services. There’s too much overhead and latency. Even a cursory look at the simple input-process-output separation of the past shows that if each of the three were truly separated in hosting and then network-connected, you’d add network latency to every I/O operation. Imagine reading a packet from a trunk, then sending it over another trunk to another site just for route determination!
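To make the latency point concrete, here’s a minimal Go sketch, entirely hypothetical (every name in it is mine, not from any standard), that times the same trivial “determine-route” step as an in-process function call and as a loopback TCP round trip. Even on loopback, with no real inter-site distance involved, the networked version is typically orders of magnitude slower per packet.

```go
// latency_sketch.go: compares an in-process call to the same step
// performed across a (loopback) network hop. Illustrative only.
package main

import (
	"fmt"
	"net"
	"time"
)

// determineRoute stands in for the route-determination step.
func determineRoute(pkt byte) byte { return pkt ^ 0xFF }

func main() {
	const n = 10000

	// In-process: plain function calls, accumulated so the compiler
	// can't discard the work.
	var acc byte
	start := time.Now()
	for i := 0; i < n; i++ {
		acc ^= determineRoute(byte(i))
	}
	inProc := time.Since(start)

	// "Microservice": the same step behind a loopback TCP round trip.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		buf := make([]byte, 1)
		for {
			if _, err := conn.Read(buf); err != nil {
				return
			}
			buf[0] = determineRoute(buf[0])
			if _, err := conn.Write(buf); err != nil {
				return
			}
		}
	}()
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 1)
	start = time.Now()
	for i := 0; i < n; i++ {
		buf[0] = byte(i)
		conn.Write(buf)
		conn.Read(buf)
	}
	overNet := time.Since(start)

	fmt.Printf("in-process: %v, loopback round trips: %v (acc=%d)\n",
		inProc, overNet, acc)
}
```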
There’s nothing wrong with saying that a routing process consists of “get-packet”, “determine-route”, and “push-packet-along”, but requiring these to be separate services, separately hosted, is silly and impractical. What has to be done is to separate gross functionality into elements that can, and perhaps should, be separated, and then divide those gross functions into features that are modularly developed but bound together for efficient operation.
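Here’s what that looks like as a toy Go sketch (all names hypothetical, and no claim about how any real router is built): the three steps are separate, individually testable modules, but they’re bound together as plain function calls in one process.

```go
// pipeline_sketch.go: modular development, bound execution.
package main

import "fmt"

// Packet stands in for a real packet structure.
type Packet struct {
	Dest    string
	Payload []byte
}

// getPacket models the input stage; here it just drains a queue.
func getPacket(queue *[]Packet) (Packet, bool) {
	if len(*queue) == 0 {
		return Packet{}, false
	}
	p := (*queue)[0]
	*queue = (*queue)[1:]
	return p, true
}

// determineRoute models the processing stage: destination -> next hop.
func determineRoute(p Packet, table map[string]string) string {
	if hop, ok := table[p.Dest]; ok {
		return hop
	}
	return "default-gateway"
}

// pushPacket models the output stage.
func pushPacket(p Packet, nextHop string) {
	fmt.Printf("forwarding %q for %s via %s\n", p.Payload, p.Dest, nextHop)
}

func main() {
	table := map[string]string{"10.0.0.0/8": "trunk-1"}
	queue := []Packet{
		{Dest: "10.0.0.0/8", Payload: []byte("hello")},
		{Dest: "192.168.0.0/16", Payload: []byte("world")},
	}
	// Three modules, one efficient loop: no network hop between them.
	for {
		p, ok := getPacket(&queue)
		if !ok {
			break
		}
		pushPacket(p, determineRoute(p, table))
	}
}
```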
The problem for telcos is that the first gross-functionality division, a “functional diagram” or architecture, is usually defined by standards. Those standards in the telco world define interfaces that support movement of information, and traditionally those are physical interfaces. In addition, a functional description of something is necessarily at least somewhat implementation-specific. Describe a protocol handler, for example, and you might describe the actions to be taken for each message type, which in some programming languages becomes a “CASE” statement that lists message types and their actions. But almost all protocol handlers are actually implemented as a state/event table or graph, and a message-by-message functional description would lead to a different, and almost universally rejected, model.
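For the record, here’s the shape I mean, as a minimal Go sketch with made-up states and events (no real protocol implied): the handler is a table indexed by current state and incoming event, and each cell supplies the action and the next state.

```go
// state_table_sketch.go: a state/event-table protocol handler.
package main

import "fmt"

type State int
type Event int

const (
	Idle State = iota
	Connecting
	Established
)

const (
	EvConnect Event = iota
	EvAck
	EvClose
)

// action is what a table cell holds: do something, return the next state.
type action func() State

// table is indexed by current state, then by incoming event.
var table = map[State]map[Event]action{
	Idle: {
		EvConnect: func() State { fmt.Println("send connect request"); return Connecting },
	},
	Connecting: {
		EvAck: func() State { fmt.Println("handshake complete"); return Established },
	},
	Established: {
		EvClose: func() State { fmt.Println("tear down"); return Idle },
	},
}

func handle(s State, e Event) State {
	if act, ok := table[s][e]; ok {
		return act()
	}
	fmt.Printf("event %d ignored in state %d\n", e, s)
	return s
}

func main() {
	s := Idle
	for _, e := range []Event{EvConnect, EvAck, EvClose} {
		s = handle(s, e)
	}
}
```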
Software really has to start with the notion of a design pattern, a framework for a specific function/feature. If we defined a packet-processor as “get-packet”, “make a state/packet-type determination of the action”, and “run-action”, we’d be OK. The middle process could map, in theory, to a nested-CASE, state/event-table, or graph implementation, all of which have recognized design patterns. But if you try to dig deeper into that middle term, you are almost surely going to constrain the implementation, and that should never be done. The problem is that standards-writers typically aren’t software architects and can’t immediately recognize when a functional model is at risk of becoming an implementation model. Standards are the real problem with “cloud-native” telecom. Without the right standards formulation, a document like the NGMN’s Cloud Native Manifesto: An Operator View can’t be applied properly, and that document doesn’t offer any design-pattern vision for how a common functional division of a telco function should be made.
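One way a design pattern can avoid that trap, sketched below in Go with hypothetical names, is to pin down only the contract of the decision step. A nested CASE, a state/event table, or a graph walker can then sit behind the same interface, and nothing upstream or downstream needs to know which.

```go
// dispatch_sketch.go: an implementation-neutral "decide the action" step.
package main

import "fmt"

type State int
type PacketType int

// Action is whatever "run-action" executes; here, just a name.
type Action string

// Dispatcher is the only thing the functional model should fix.
type Dispatcher interface {
	Decide(s State, t PacketType) Action
}

// switchDispatcher: the nested-CASE realization of the contract.
type switchDispatcher struct{}

func (switchDispatcher) Decide(s State, t PacketType) Action {
	switch s {
	case 0:
		if t == 0 {
			return "open"
		}
	case 1:
		if t == 1 {
			return "forward"
		}
	}
	return "drop"
}

// tableDispatcher: the state/event-table realization of the same contract.
type tableDispatcher map[State]map[PacketType]Action

func (d tableDispatcher) Decide(s State, t PacketType) Action {
	if a, ok := d[s][t]; ok {
		return a
	}
	return "drop"
}

// process neither knows nor cares which realization it was handed.
func process(d Dispatcher, s State, t PacketType) {
	fmt.Println("action:", d.Decide(s, t))
}

func main() {
	process(switchDispatcher{}, 1, 1)
	process(tableDispatcher{1: {1: "forward"}}, 1, 1)
}
```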
Let’s take a mobile network as an example. Standards divide it, reasonably, into a RAN, an “edge” or metro function, and a core. We can assume that the network has to establish a persistent reference label or address for the two entities engaged in an information exchange, but it also has to accommodate the situation where one or both of the entities are moving between cells in the RAN. A software type, looking at this, would likely see a general need for location-based/location-independent routing, meaning that the persistent reference label is really a kind of session ID, and the current location of the parties in that session has to be maintained for proper handling. The standards instead define a very specific implementation, one that is not as generalized. To me, that’s bad.
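A sketch of the generalized view (purely illustrative; this is not what the 3GPP standards specify) is almost embarrassingly simple: the persistent session ID maps to a current attachment point, and a handover rewrites the mapping, never the identity.

```go
// session_registry_sketch.go: location-independent identity over
// location-dependent attachment.
package main

import (
	"fmt"
	"sync"
)

// Attachment is wherever the endpoint currently sits (cell, gateway, ...).
type Attachment struct {
	Cell string
	Addr string
}

// Registry maps persistent session IDs to current locations.
type Registry struct {
	mu   sync.RWMutex
	locs map[string]Attachment
}

func NewRegistry() *Registry {
	return &Registry{locs: make(map[string]Attachment)}
}

// Handover updates only the location; the session ID never changes.
func (r *Registry) Handover(session string, to Attachment) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.locs[session] = to
}

// Resolve answers: where do I deliver traffic for this session right now?
func (r *Registry) Resolve(session string) (Attachment, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	a, ok := r.locs[session]
	return a, ok
}

func main() {
	r := NewRegistry()
	r.Handover("sess-42", Attachment{Cell: "cell-A", Addr: "10.1.1.7"})
	fmt.Println(r.Resolve("sess-42"))
	// The device moves; the session identity is untouched.
	r.Handover("sess-42", Attachment{Cell: "cell-B", Addr: "10.2.9.3"})
	fmt.Println(r.Resolve("sess-42"))
}
```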
I think we’re going to hit the wall with this issue in 6G, one reason being the desire to support wireless/wireline/satellite convergence. If a smartphone flips from one of these connection models to another, the path its packets take will change, and so will its own “address” at the network level. How do you sustain connections, then? In traditional mobile networking terms, a packet sourced via a convergence-alternate network option would arrive with a different address, perhaps from a different PGW, than the original.
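The transport world has already faced a version of this: QUIC survives address changes by identifying connections with connection IDs rather than address/port tuples. A sketch of the same principle at the session level (hypothetical, not a 6G proposal) looks like this: demultiplex on a session ID carried in the packet, not on the source address, and let the peer’s address follow wherever the packets now come from.

```go
// migration_sketch.go: a session survives an underlay address change.
package main

import "fmt"

// Packet carries its overlay session ID explicitly.
type Packet struct {
	SessionID string
	SrcAddr   string // whatever the current access network assigned
	Data      string
}

// Session tracks the peer's last-seen underlay address.
type Session struct {
	ID       string
	PeerAddr string
}

type Endpoint struct {
	sessions map[string]*Session
}

// Receive looks the session up by ID; if the peer has moved to a new
// address (wireless to satellite, say), the session simply follows.
func (e *Endpoint) Receive(p Packet) {
	s, ok := e.sessions[p.SessionID]
	if !ok {
		s = &Session{ID: p.SessionID}
		e.sessions[p.SessionID] = s
	}
	if s.PeerAddr != "" && s.PeerAddr != p.SrcAddr {
		fmt.Printf("session %s migrated: %s -> %s\n", s.ID, s.PeerAddr, p.SrcAddr)
	}
	s.PeerAddr = p.SrcAddr
	fmt.Printf("session %s received %q\n", s.ID, p.Data)
}

func main() {
	e := &Endpoint{sessions: make(map[string]*Session)}
	e.Receive(Packet{SessionID: "sess-42", SrcAddr: "5g:10.0.0.9", Data: "hello"})
	// Same session, new access network, new address; the connection survives.
	e.Receive(Packet{SessionID: "sess-42", SrcAddr: "sat:198.51.100.4", Data: "still here"})
}
```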
None of the comments made by operators in the story relate to these points, and yet the past telco initiatives (IMS, LTE, 5G, and NFV, for example) have all fallen prey to them. And, since backward compatibility is always a goal in telecom standards and a software-only approach is pushed by operators for 6G, there’s a strong tendency to carry past burdens into the future.
The biggest factor that I think the telco world, eager to embrace cloud-native (we’ll get to whether that’s true in a minute), has to address is the need to define connection services as overlays on IP connectivity and on routers, meaning that having devices that purport to be user-plane handlers is a no-no. Such a device is a router, so you either define router features to address things like mobility or you rely on endpoint (smartphone, whatever) hosting of those things. This whole area needs to be ceded to the IETF, which is unlikely given the pressure from the mobile infrastructure vendors.
So let’s get to that last, deferred question. Do telcos need to embrace cloud-native? No. They need to embrace service feature agility and then the hosting practices that support it. Cloud-native behavior would fall out of such a course of action. Without that course of action, cloud-native would likely be impossible to attain, and it certainly wouldn’t help create better telecom infrastructure, services, and profits.