We are again seeing stories and comments around “what’s wrong with NFV”. That’s a good thing in that it at least shows awareness that NFV has not met the expectations of most who eagerly supported it four years ago. It’s a bad thing because most of the suggested ills are misdiagnoses, and so the explicit or implied remedies they lead to are wrong as well.
Before I get into this I want to note something I’ve said before when talking about NFV. I got involved with the ETSI activity in March of 2013 and I’ve remained active in monitoring (and occasionally commenting on) the work since then. I have a lot of respect for the people who have been involved with the effort, but I’ve been honest from the first in my disagreement with elements of the process, and therefore with some of the results. I have to be just as honest with those who read this blog, and so I will be.
The first thing that’s wrong is less with NFV than with expectations. We cover technology in such a way as to almost guarantee escalation of claims. If you reviewed the first white papers or attended the early meetings, you saw that NFV’s intended scope was never revolutionary, and could never have supported the transformational aspirations of most of its supporters. NFV was, from the first, focused on network appliances that operated above Level 2/3, meaning that it wasn’t intended to replace traditional switching and routing. Much of the specialized equipment associated with mobile services, higher-layer services, and content delivery was a prime target. The reason this targeting is important is that these devices collectively amount to only about 17% of capex overall. NFV in its original conception could never have been a revolution.
The second thing that’s wrong is that NFV’s scope (in no small part because of its appliance focus) didn’t include operations integration. Nobody should even think about questioning the basic truth that a virtual function set, hosted on cloud infrastructure in data centers and chained together with service tunnels, is more complicated than an equivalent physical function in a box, yet the E2E diagrams of NFV propose that we manage virtual functions with the same general approach we use for physical ones. There has been from the first a very explicit dependence of NFV on the operations and management model associated with virtual function lifecycles, but the details were kept out of scope. Given that “process opex”, meaning the operations costs directly related to service fulfillment, already accounts for 50% more cost than capex, and that unbridled issues with virtual function complexity could make things even worse, that decision is very hard to justify, or overcome.
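To make that complexity gap concrete, here’s a minimal sketch (in Python, with entirely hypothetical names; nothing here comes from the ETSI specs) that simply counts the managed elements behind a single physical appliance versus the same function virtualized into a chain. Every additional element is something the operations model has to deploy, monitor, and heal.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical models -- illustrative only, not drawn from any ETSI NFV spec.

@dataclass
class PhysicalAppliance:
    """One box: one managed element, one failure domain."""
    name: str

    def managed_elements(self) -> List[str]:
        return [self.name]

@dataclass
class VirtualFunctionChain:
    """The same function virtualized: the VNF software components, the VMs
    that host them, the servers under the VMs, and the chaining tunnels."""
    name: str
    vnf_components: List[str] = field(default_factory=list)
    host_vms: List[str] = field(default_factory=list)
    servers: List[str] = field(default_factory=list)
    tunnels: List[str] = field(default_factory=list)

    def managed_elements(self) -> List[str]:
        return (self.vnf_components + self.host_vms
                + self.servers + self.tunnels)

# A firewall appliance versus a modest virtual equivalent.
box = PhysicalAppliance("edge-firewall")
chain = VirtualFunctionChain(
    name="edge-firewall-vnf",
    vnf_components=["fw-vnfc-1", "fw-vnfc-2"],
    host_vms=["vm-101", "vm-102"],
    servers=["server-a", "server-b"],
    tunnels=["tunnel-in", "tunnel-inter", "tunnel-out"],
)

print(len(box.managed_elements()))    # 1 element to operate
print(len(chain.managed_elements()))  # 9 elements, each a possible fault source
```

Even this toy example turns one managed element into nine, and that multiplication is exactly the opex exposure the out-of-scope decision left unaddressed.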
The third issue with NFV is that it was about identifying standards and not setting them. On the surface this is completely sensible; the last thing the industry needs is yet another redundant and potentially contradictory standards process. The problem it caused for NFV is that identification of standards demands a clear holistic vision of the entire service process; otherwise you have no mechanism with which to make your selection from the overall standards inventory. What’s a good candidate standard, other than the one best suited to achieving the overall business goal? But what, exactly, is that goal? How do standards get molded into an ecosystem to achieve it? If you had to write standards, the scope of what you did and the omissions or failures would be fairly obvious. If you’re only picking things, it’s harder to know whether the process is on track or not.
So what fixes this? Not “servers capable of replacing switches and routers”, first because a broader role for NFV tends to exacerbate the other issues I’ve pointed out, and second because you don’t really need NFV to deploy static multi-tenant network elements like infrastructure components. You don’t even really need cloud computing. “Standards” or “interoperability” or “onboarding” are all reasonable requirements, but we’ve had them all along and have somehow failed to exploit them. What, then?
First you have to decide what “fixing” means. If you’re happy with the original goals of the papers, the above-the-network missions in virtual CPE and so forth, then you need to envelop NFV in a management/operations model, something the ETSI ISG declared out of scope. There’s nothing wrong with the declaration, as long as you recognize that declaring it out of scope doesn’t mean it isn’t critically necessary. If you do want service and infrastructure revolution, it’s even easier. Forget NFV except as a technical alternative to physical devices and focus entirely on automating service lifecycle management. That can’t be done within the scope of the ETSI work, at least not at this point.
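To illustrate what “automating service lifecycle management” means in software terms, here’s a minimal sketch of a table-driven state/event engine, the kind of structure such automation might use. The states, events, and handler names are my own illustrative assumptions, not anything taken from ETSI or a specific project.

```python
from enum import Enum, auto
from typing import Callable, Dict, Tuple

# Hypothetical lifecycle states and events -- illustrative only.

class State(Enum):
    ORDERED = auto()
    DEPLOYING = auto()
    ACTIVE = auto()
    DEGRADED = auto()
    RETIRED = auto()

class Event(Enum):
    ACTIVATE = auto()
    DEPLOY_DONE = auto()
    FAULT = auto()
    REPAIR_DONE = auto()
    DECOMMISSION = auto()

Handler = Callable[[str], None]

def process(action: str) -> Handler:
    """Stand-in for a real operations process; here it just logs."""
    def run(service: str) -> None:
        print(f"{service}: {action}")
    return run

# The heart of the approach: every lifecycle step is a table entry,
# (current state, event) -> (operations process, next state).
TRANSITIONS: Dict[Tuple[State, Event], Tuple[Handler, State]] = {
    (State.ORDERED,   Event.ACTIVATE):     (process("deploy resources"),   State.DEPLOYING),
    (State.DEPLOYING, Event.DEPLOY_DONE):  (process("start billing"),      State.ACTIVE),
    (State.ACTIVE,    Event.FAULT):        (process("redeploy component"), State.DEGRADED),
    (State.DEGRADED,  Event.REPAIR_DONE):  (process("resume billing"),     State.ACTIVE),
    (State.ACTIVE,    Event.DECOMMISSION): (process("release resources"),  State.RETIRED),
}

def handle(service: str, state: State, event: Event) -> State:
    """Drive one lifecycle step from the data model, not from custom code."""
    handler, next_state = TRANSITIONS[(state, event)]
    handler(service)
    return next_state

# A service moving through its lifecycle, event by event.
s = State.ORDERED
for ev in (Event.ACTIVATE, Event.DEPLOY_DONE, Event.FAULT, Event.REPAIR_DONE):
    s = handle("vpn-service-42", s, ev)
print(s)  # State.ACTIVE
```

The design point is that each lifecycle step is a data-model entry rather than custom code, so extending or correcting the automated behavior is a table change, not a software rewrite.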
This is where open-source comes in. In fact, there are two ways that open source could go here. One is to follow the NFV specifications, in which case it will inherit all of the ills of the current process and perhaps add in some new ones associated with the way that open-source projects work. The other is to essentially blow a kiss or two at the ETSI specs and proceed to do the right thing regardless of what the specs say. Both these approaches are represented in the world of NFV today.
The specs as they exist will not describe an NFV that can make a business case. The specs as they exist today are incomplete in describing how software components could be combined to build NFV-based service lifecycle management, or how NFV software could scale and operate in production networks. That is my view, true, but I am absolutely certain it is accurate. This is not to say that the issues couldn’t be resolved, and in many cases resolved easily. It’s not to say that the ETSI effort was wasted, because the original functional model is valid as far as it goes, and it illustrates what the correct model would be even if it doesn’t describe it explicitly. What it does say is that these issues have to be resolved, and if open source jumps off into the Great NFV Void and does the work again, it can get things right or get them wrong. If the latter, it can repeat the same mistakes, or make new ones.
The automation of a service lifecycle is a software task, so it has to be done as a software project to be done right. The NFV specifications were not developed with software projects in mind, and for that reason they are not going to be optimal in guiding one. The best channel for progress is open source, because it’s the channel that has the best chance of overcoming the lack of scope and systemic vision that came about (quite accidentally, I think) in the ETSI NFV efforts. The AT&T ECOMP project, now merged with Open-O into the ONAP project, offers what I think is the best chance for success because it does have the necessary scope, and it also has operator support.
Some people are upset because we have multiple projects that seem to compete. I’m not, because we need a bit of natural selection here. If we had full, adequate, systemic specifications for the whole service lifecycle management process, we could insist on a unified and compatible approach. We don’t have those specs, so we are essentially creating competitive strategies to find the best path forward. That’s not bad; it’s critically necessary if we’re to go forward at all.
The big problem we have with open-source-dominated NFV isn’t lack of consistency, it’s lack of relevance. If open source solves the problems of service lifecycle automation, and if it has the scope to support legacy and cloud infrastructure, single operators and federations alike, then it will succeed and NFV will succeed along with it. But NFV was never the solution to service lifecycle automation; it declared most of the issues out of scope. That means that for NFV, “success” won’t mean dominating transformation, it will simply mean playing its admittedly limited role.
Most network technology will never be function-hosted, but most operator profits will increasingly depend on new higher-layer cloud elements. Right now, NFV isn’t even needed there. If I were promoting NFV, and I wanted it to be more dominant, I’d look to the cloud side. There’s plenty of opportunity there for everyone, and the cloud shows us that there’s nothing wrong with open-source, or with multiple parallel projects. It’s fulfilling the mission that counts, as it should always be.