If you look at the things network operators say have helped them control costs the most, opex reductions top that list for well over 90% of operators. In contrast, only 67% say that capex reductions have been a help. The problem is that almost all of the operators who say that opex reductions have helped them the most say that they’re running out of opex savings. Of that group, 59% say that they will have taken all their forecast opex reduction moves by the end of 2024, and 89% say by the end of 2025. Are we really out of gas on reducing opex, and if so, what might it mean?
As I’ve noted before, the opex savings that have dominated operator cost management over the last 20 years have focused on reducing outside craft activity and customer support contacts. Remember when you could dial “Information” and when you could get a real human by calling for support? Few areas still offer those conveniences these days. The thing is, you can only cut this sort of thing so far because you can’t reduce headcounts in any given area below zero, and in many areas you reach a point where further cuts threaten to increase churn, which is even worse than opex in its impacts.
There’s always been another kind of opex out there, what I’ve called “process opex”, that relates to the cost of maintaining the network infrastructure deeper than customer access. There’s a vision of process opex as being centered on the network operations center (NOC), and that’s true in that the NOC is where process opex is coordinated, but there’s also a host of people distributed around the service area to handle the essential tasks of diddling with real gear and real connections. The total opex associated with this is comparable to capex for many operators, and while it’s been cut by about a quarter in the last 20 years, it’s still a major cost component and one that 73% of operators say could be a viable candidate for cost reduction.
Why only “could be”, given that operators are eager to find opex to cut in order to avoid cutting capex and potentially under-investing in infrastructure? The problem is that process opex seems to be a function of the overall network and network management models, and it’s thus tied to the infrastructure itself. You can tweak the edges, but to make fundamental changes you’d have to rethink how you build networks, which is something operators are really, really uneasy about doing. Could there be a way, or ways, that these changes could be made?
The biggest problem in process opex is process complexity. The number of operations interventions needed in network infrastructure tends to be proportional to the number of element connections, which grows geometrically rather than linearly with the number of actual elements. Operators say, for example, that the number of element connections in a network of twenty elements is over 150. They also say that things like 5G, which introduce a lot of functional components, make infrastructure more complex. So do other things we depend on, like content delivery networks (CDNs). If the number of network components that actually push traffic grows by 5%, the number of components that coordinate network and service behavior will grow by 10% to 15%, say operators.
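The arithmetic behind that complexity growth is worth making concrete. If every element can potentially connect to every other, the connection count follows the familiar n(n-1)/2 pairing formula, so it pulls away from the element count quickly. A minimal sketch (the full-mesh assumption is mine; real topologies sit somewhere below it, but the quadratic trend is the point):

```python
def full_mesh_connections(n: int) -> int:
    """Pairwise connections among n elements if all can interconnect: n*(n-1)/2."""
    return n * (n - 1) // 2

# Elements grow linearly; potential connections grow quadratically.
for n in (10, 20, 40):
    print(f"{n} elements -> {full_mesh_connections(n)} potential connections")
# 20 elements yields 190 pairwise connections, consistent with the
# "over 150" figure operators cite for a twenty-element network.
```

Doubling the element count from 20 to 40 roughly quadruples the connection count, which is why a 5% growth in traffic-pushing components can plausibly drive a 10% to 15% growth in coordinating components.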
One of the big drivers behind the “virtual function” initiatives we’ve seen is the theory that controlling complexity is facilitated if coordinating functions are hosted rather than supported through discrete appliances. I can host a hundred functions on a dozen servers, reducing the number of devices to operationalize. There are still more element relationships because virtual functions are still connected functions, but less hardware to manage. However, the number of function connections actually turns out to be higher, not because they’re virtual but because service control is becoming more and more complex as we try to improve QoE and add features. That’s largely erased any opex benefit to virtual function adoption, according to operators in general and mobile operators in particular.
Autonomy, adaptive networking, or whatever you’d like to call forms of self-management and remediation, is another hopeful concept that hasn’t quite done as much as operators hoped it would. For decades, IP networks have been built to manage path control through discovery processes that detect failures or congestion and move traffic around to optimize conditions. One thing this does is to reduce the number of times when a network problem requires immediate human intervention, and that in turn reduces the number of operations professionals that need to be deployed across infrastructure scope to handle the issues. If you have more time to fix something, you can pull a tech from a greater distance. That’s still true, but with more complex “relationships” between the functions/features that make up a service, you need to create more and more adaptive behavior sets to keep things running. What backs up the data path doesn’t fix the failure of a control-plane element.
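The adaptive path-control behavior described above can be sketched in miniature: discovery notices a dead link, the topology is updated, and shortest-path computation quietly finds another route with no human in the loop. The four-node topology and link names here are hypothetical, and this stands in for what real routing protocols do with far more machinery:

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over an adjacency dict {node: {neighbor: cost}}; None if unreachable."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

# Hypothetical topology: two disjoint paths from A to D, unit link costs.
topo = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
        "C": {"A": 1, "D": 1}, "D": {"B": 1, "C": 1}}
print(shortest_path(topo, "A", "D"))   # ['A', 'B', 'D']

# Discovery detects the A-B link failure; traffic reroutes via C automatically.
del topo["A"]["B"]; del topo["B"]["A"]
print(shortest_path(topo, "A", "D"))   # ['A', 'C', 'D']
```

Note what this sketch also illustrates by omission: the reroute logic knows nothing about a failed control-plane element, which is exactly the gap the paragraph points out.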
We’ve seen a gradual shift in thinking from all these hopeful strategies, a shift that’s led toward the idea that virtualization and abstraction could be used both to simplify infrastructure and to facilitate autonomous management. The idea is to collect multiple related elements into a virtual super-element, and to combine it with self-management tools to allow it to repair itself. Networks built from this approach, called “intent modeling”, could presumably gain all the benefits in operations efficiency without having to rebuild the actual infrastructure.
An intent model is a black box that advertises properties to connected elements, and that is responsible, within itself, for remediating problems that would cause it to violate its SLA. Only if there is no remedy possible does it report a fault, and since the whole of the network is a nest of intent models, that report is made to the next level of the intent hierarchy, and it’s then the responsibility of that model to remediate. Each of the infrastructure elements is an autonomous system that’s managed to its own SLA, and that’s responsible not only for the real infrastructure within it, but for any subordinate model elements on whose SLAs its own SLA depends.
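That fault-escalation hierarchy can be sketched in a few lines. The element names are hypothetical, and the "can it self-heal" flag stands in for whatever real remediation logic a model would run; the structure to notice is that a fault only travels up one level when local remediation fails:

```python
class IntentElement:
    """A black-box intent-modeled element: it tries to meet its own SLA
    and escalates to its parent only when no local remedy exists."""
    def __init__(self, name, parent=None, can_self_heal=True):
        self.name = name
        self.parent = parent
        self.can_self_heal = can_self_heal
        self.log = []

    def remediate(self) -> bool:
        # Placeholder: a real model would run actual repair actions here.
        return self.can_self_heal

    def report_fault(self, origin):
        self.log.append(f"fault in {origin} handled at {self.name}")
        if not self.remediate():
            if self.parent is None:
                self.log.append("service-level SLA violation")
            else:
                # No local remedy: escalate one level up the intent hierarchy.
                self.parent.report_fault(origin)

# Hypothetical two-level hierarchy: a service model over a metro region.
service = IntentElement("service")
region = IntentElement("metro-region", parent=service, can_self_heal=False)
region.report_fault("aggregation-router-7")
print(region.log)    # the region tried and failed...
print(service.log)   # ...so the service-level model took over
```

The service level never sees faults the region can fix itself, which is the opex-reduction mechanism: most interventions stay inside a black box instead of surfacing as NOC events.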
Abstraction of pieces of infrastructure, whether they’re divided geographically or functionally (or both), can reduce complexity because at the service level the network is simpler (consisting of those abstract pieces) and because each abstract piece is self-remediating. Operators tell me that they believe this approach would work for them, and could reduce process opex, but the problem is that network infrastructure modeling based on abstract intent-modeled elements isn’t what vendors are preaching or what operators are used to. Among operators, knowledge of and interest in this approach is confined largely to the office of the CTO, while responsibility for operations is (of course) part of the operations group, an organization with no common report with the CTO other than the CEO of the company.
There are actually more complications, at least potentially, than operators see. I believe that to make the intent-model approach work, you’d need to have a form of service modeling that can handle the necessary abstractions. TOSCA is pretty good at that, so there’s at least a candidate. I also think that you have to assume that the model for each intent-based abstract element would have to provide state/event logic in some form, in order to handle the event-driven nature of management processes overall. TOSCA can be made to do that, but it’s not a perfect solution. Finally, I think that it’s impractical to assume that each element would be based on an independent management solution, even if it presented both these required features, because maintaining it and integrating it would be difficult. We are likely a way off from that happy situation, or even the potential for it.
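The state/event logic I’m arguing each intent-based element needs is essentially a small finite-state machine attached to the model. A minimal sketch, with hypothetical states and events of my own choosing (a real deployment would derive these from a TOSCA-like service description rather than a hand-written table):

```python
# Hypothetical lifecycle for one intent element: it remediates locally
# on an SLA violation, and escalates only if local repair fails.
TRANSITIONS = {
    ("active", "sla_violation"): "remediating",
    ("remediating", "repair_ok"): "active",
    ("remediating", "repair_failed"): "escalated",
    ("escalated", "parent_repair_ok"): "active",
}

def handle_event(state: str, event: str) -> str:
    """Return the next state; unrecognized (state, event) pairs are ignored."""
    return TRANSITIONS.get((state, event), state)

# Walk one fault through local failure, escalation, and parent-level repair.
state = "active"
for event in ("sla_violation", "repair_failed", "parent_repair_ok"):
    state = handle_event(state, event)
print(state)  # "active": the element recovered via the level above it
```

The point of keeping this table-driven is the integration concern raised above: if every element carried its own hand-built management solution, each table would be bespoke; generating them from a common model is what makes maintenance tractable.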
In the end, operators are almost surely going to want, and need, a form of process opex optimization. There is little hope that current services will result in greater revenue, so higher profits mean lower costs. It’s very difficult to achieve that through capex reduction, even if you assume open-model networking, because of the problem of writing down current gear to achieve any harmony of infrastructure under the new model. At the least, a strategy for process opex would offer operators some time to develop other strategies on both the revenue and cost sides. One such strategy, which we’ll take a look at later this week, is re-architecting the actual network to make it both more capex-effective and opex-effective. Is opex-optimized infrastructure possible? We’ll see.