It seems like we can never get past the question of the Network Functions Virtualization (NFV) ISG and the value and meaning of its approach to feature/function hosting. For those who’ve read my blogs regularly, this may seem repetitive, but not everyone has, and there have been some LinkedIn comments suggesting that some who read the current material don’t have the context that earlier blogs provide. I don’t propose to provide that here, at least not in the form of reprising old blogs. Anyone who likes can use the “Search” function on this site (look up NFV or ISG) and find them. Instead, let me cover what NFV should have done, based on my experience and on ongoing interactions with operators and enterprise users.
NFV kicked off with a Call for Action paper in the fall of 2012 and a meeting in the Valley in the spring of 2013. I believed from the first that the notion was good but that there were specific issues that had to be addressed. First and foremost, a “network service” was always going to be created based on an evolving infrastructure in which legacy devices would dominate for some time. Second, the services would largely be based not on per-user virtual elements but on “platform services” that were collective in user terms, as the Internet or IP infrastructure is today. Finally, hosted or virtual service elements were going to have to be developed and deployed based on the standards of cloud computing as they evolved, since that would be the toolkit that would dominate development initiatives.
My view, based on an early telco initiative (IPsphere) and my own open-source project (ExperiaSphere), was that the implementation of NFV should be model-driven. The model would include all the variables needed to support the service lifecycle and a state/event structure that would describe how the service would respond to management events, including deployment, faults, and takedown. I used XML as the modeling language, but others (TOSCA in particular) could also be suitable. The model-based approach, in my testing of ExperiaSphere, allowed the elements of service management processing to be instantiated on demand, replaced on demand, reused by multiple services, and so forth.
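To make that concrete, here’s a rough sketch of what such a model element might carry. This is illustrative only; I’ve written it as a Python data structure rather than the XML I actually used, and the names (state_table, parameters, and so on) are made up for the example, not drawn from ExperiaSphere or TOSCA.

```python
# Illustrative only: a model element carrying its own parameters and a
# state/event table that names the lifecycle process to run for each
# (current state, incoming event) combination.
firewall_element = {
    "name": "vFirewall",
    "parameters": {"throughput_mbps": 500, "ha": True},
    "state": "orderable",
    "state_table": {
        ("orderable", "activate"): "deploy_and_configure",
        ("active",    "fault"):    "redeploy_or_escalate",
        ("active",    "takedown"): "release_resources",
    },
}

def handle_event(element, event):
    """Look up the lifecycle process for this state/event pair."""
    handler = element["state_table"].get((element["state"], event))
    if handler is None:
        return None   # event isn't meaningful in this state; ignore it
    return handler    # a real system would dispatch this process on demand

print(handle_event(firewall_element, "activate"))  # -> deploy_and_configure
```

Because the lifecycle processes are named in the model rather than wired into software, they can be instantiated, replaced, or shared across services without touching the code that interprets the model.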
A model-based approach also supports three different ways of obtaining a feature element of a service. First, you can deploy it, as a software instance in the cloud. Second, you can coerce it from a shared set of resources designed to support multiple users (like a VPN), through an established management interface. Finally, you can parameterize it, by sending parameters to a management interface on a real device like a router. This means that infrastructure can evolve from a state where it’s almost all real devices to a state where much of it is cloud-hosted features, without making major changes to software. Only the model has to change, and even that can be accommodated through the use of a “proxy” element that maps the model feature elements to one of multiple implementation options (Nephio does something like this).
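Here’s a sketch of that proxy idea, again in Python with hypothetical names and stub helpers; in a real system the stubs would call a cloud orchestrator, a shared-service management API, or a device’s management interface, and the binding would come from the model or an inventory system rather than being hard-coded.

```python
# Hypothetical proxy that maps an abstract feature element in the model to
# one of three fulfillment paths: deploy, coerce, or parameterize.

def deploy_instance(feature, binding):
    # would ask a cloud orchestrator to host a software copy of the feature
    return f"deployed {feature['name']} from image {binding['image']}"

def coerce_shared_service(feature, binding):
    # would request capacity from a shared, multi-user facility (e.g. a VPN)
    return f"coerced {feature['name']} via {binding['management_api']}"

def parameterize_device(feature, binding):
    # would push the feature's parameters to a real device's management port
    return f"parameterized {binding['device']} for {feature['name']}"

FULFILLMENT = {
    "deploy": deploy_instance,
    "coerce": coerce_shared_service,
    "parameterize": parameterize_device,
}

def fulfill(feature, binding):
    """The proxy: the model names the feature, the binding picks the path."""
    return FULFILLMENT[binding["kind"]](feature, binding)

# The same abstract feature, fulfilled two different ways:
fw = {"name": "vFirewall", "parameters": {"throughput_mbps": 500}}
print(fulfill(fw, {"kind": "deploy", "image": "fw-image-1.2"}))
print(fulfill(fw, {"kind": "parameterize", "device": "edge-router-7"}))
```

The point is that nothing above the proxy changes as infrastructure migrates from real devices toward cloud-hosted features; only the bindings do.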
Perhaps the greatest benefit of a model-driven approach is that it doesn’t presuppose any particular infrastructure, but it also doesn’t presuppose any particular set of feature relationships or the way features map to services. NFV’s biggest problem IMHO was that it expected virtual functions to map to physical devices. There wasn’t enough granularity in the way you built services as a result. You could have a router or a virtual router, but not a set of features that added up to a virtual router, or even a router that was part “real” and part “virtual”.
My point here is that yes, NFV should have been more general than demanding VM hosting for VNFs. Containers, serverless/functional, and even bare metal should also have been accommodated, and so should white boxes and existing devices. A “virtual function” should be an instance of a general class of that function, a class that can map to any implementation that satisfies the general requirement set. A function is an intent model, and the implementation is opaque. Therefore, a given function should map to a variety of implementations, with the most suitable chosen when it deploys. If knowledge of implementation flows out of the “black box” of the function, then it’s not really going to exploit the cloud fully.
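One way to picture this, purely as an illustration and not any actual NFV or ExperiaSphere artifact: the function is an abstract class (the intent model), concrete implementations sit opaquely behind it, and the choice among them is made only at deployment, with a placeholder selection policy standing in for the real one.

```python
from abc import ABC, abstractmethod

# Illustrative intent-model sketch: "Firewall" is the abstract function;
# the implementations behind it are opaque and interchangeable.
class Firewall(ABC):
    @abstractmethod
    def deploy(self, params: dict) -> str: ...

class VMFirewall(Firewall):
    def deploy(self, params): return "firewall hosted in a VM"

class ContainerFirewall(Firewall):
    def deploy(self, params): return "firewall hosted in a container"

class WhiteBoxFirewall(Firewall):
    def deploy(self, params): return "firewall loaded on a white-box device"

def choose_implementation(candidates, params):
    """Pick the most suitable implementation at deploy time. Taking the
    first candidate is a placeholder; real logic would weigh cost,
    location, capacity, and the requirement set in params."""
    for impl in candidates:
        return impl()
    raise RuntimeError("no implementation satisfies the requirement set")

fw = choose_implementation(
    [ContainerFirewall, VMFirewall, WhiteBoxFirewall],
    {"throughput_mbps": 500},
)
print(fw.deploy({"throughput_mbps": 500}))
```

Nothing that consumes the function ever learns which class was picked, which is exactly the property that lets the implementation exploit whatever the cloud offers.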
Another issue that needed to be considered, and that was only given a nod, was that “coercive” model I mentioned. There are a lot of services that are offered to multiple users, even in multiple forms, from a common infrastructure. You can’t really “deploy” a 5G or IMS element, or at least we have to consider the commitment of the element to a single service/user to be non-exclusive and something that really happens only the first time the element is used. How do you address this sort of service without losing the generality of your model?
The answer, I think, is that you define a set of states and events that are presumed to be supported across all virtual functions. The NFV approach to management, which started with the E2E architecture paper of late spring 2013, created interfaces and a stateful management process. I never liked that approach. My own ExperiaSphere project used state/event logic, and defined a specific set of states and events that represented the service lifecycle progressions. Every service object, which represented a service instance, consisted of a set of feature objects, and every object (service or feature) had a state/event lifecycle process set. If you needed to “activate” a service that actually required one or more elements (virtual functions in NFV terms) to be deployed, then the “activate” event would cause that to happen if the service or VNF wasn’t in the “active” state. If it was, you could simply ignore that event.
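A compressed sketch of that logic follows; the state names, the user count, and the deploy stub are placeholders for the purposes of illustration, not the actual ExperiaSphere definitions. The first “activate” against a shared element commits it; later “activate” events find it already active and are effectively ignored.

```python
# Illustrative state/event lifecycle for a feature object.
class FeatureObject:
    def __init__(self, name):
        self.name = name
        self.state = "orderable"
        self.users = 0            # shared ("coerced") elements count users

    def _deploy(self):
        # stand-in for actually deploying or coercing the element
        self.state = "active"

    def handle(self, event):
        if event == "activate":
            if self.state != "active":
                self._deploy()    # first user commits the element
            self.users += 1       # later users just attach; event is benign
        elif event == "takedown":
            self.users = max(0, self.users - 1)
            if self.users == 0:
                self.state = "orderable"  # release only when no one is left
        # any other state/event pair is simply ignored

ims = FeatureObject("IMS")
ims.handle("activate")            # deploys the element
ims.handle("activate")            # already active: second service shares it
print(ims.state, ims.users)       # -> active 2
```

A service object would simply be a composition of these feature objects, each running the same state/event discipline, which is what keeps the model general across deployed, coerced, and parameterized elements alike.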
All this should have been reflected in the design of NFV from the first, but the problem was that the architecture was essentially fixed before any implementation options were presented in a form in which they could be tested. The first NFV proof-of-concept, which I submitted, tried to address all these things in a prototype implementation, but by the time it was approved the overall architecture had been fixed.
I think the final lesson here is simple but profoundly important. You don’t define an optimum cloud application by simply hosting it in a container or as a serverless function. You have to design it to be cloud-optimum, cloud-native. It’s nearly impossible to do that without some specific software architecture experience and a familiarity with cloud computing and the trends in cloud technology. NFV was launched and sustained largely by standards people who knew devices, not by software or cloud people. The basic approach shows that, and attempts to modernize NFV have still been driven the same way. The same thought processes also gave us ONAP, and it has similar problems. Neither of the two is likely to be fixed at this point, but hopefully we’ll learn something from their issues and start doing software-based features and service management the right way.
We need to do that. The future of network services is increasingly tied to hosted features and functions, and to insightful management automation. That’s going to require not only software to implement those things but also a software architecture to build them on. There’s no substitute for a good architecture, and we need to recognize that…now.