One of the fundamental principles of NFV, deeply engrained but little recognized, is that NFV is about replacing devices with virtual devices. This was done largely to fit NFV into the scope of prevailing network and service management practices, which of course limits the impact that NFV would have on broader operations and management systems. That’s arguably a good thing, but the same decision has negative impacts too. Does the good justify the bad, and is there a way of magnifying the former and reducing the latter to help NFV cope with modern reality? That’s an important question to address if we’re to hope for any significant positive impact from all the work that’s gone into NFV.
The easiest way to start this discussion is with the concept of the “model”, or more explicitly the notion of abstractions and intent models. A device is a physical box that contains logical features and functions, a box that’s deployed as a unit and managed as a unit. In modern thinking, we could visualize this device as an intent model whose external interfaces and visible features were the representations of the functions and features inside. The device is then an implementation of an abstraction, and a software instance (a Virtual Network Function or VNF in NFV terms) is simply another implementation of the same abstraction.
The nice thing about this approach is that it shows why it’s useful to model VNFs after devices; you could switch one for the other and neither functionality nor management would be affected. However, this benefit requires that the device and the VNF implement the same intent-modeled abstraction. To make this happen, you’d have to define your abstraction explicitly and then map every device and every VNF that purports to implement it to that same abstraction. If you don’t define an abstraction, or if you define one for each vendor/product combination, you have VNFs that don’t have a common model at all, which means you have to integrate VNFs into a network on an almost case-by-case basis. That is what we’re doing, badly, with VNF on-boarding.
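To make the “same abstraction, different implementations” point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `FirewallIntent` interface, the two implementing classes, and the `block_telnet` management action are hypothetical names, not part of any NFV specification.

```python
from abc import ABC, abstractmethod


class FirewallIntent(ABC):
    """Hypothetical intent model: what a firewall exposes, not how it is built."""

    @abstractmethod
    def oper_status(self) -> str: ...

    @abstractmethod
    def set_rule(self, name: str, action: str) -> None: ...

    @abstractmethod
    def rules(self) -> dict: ...


class ApplianceFirewall(FirewallIntent):
    """A physical box: nothing inside is visible to management."""

    def __init__(self) -> None:
        self._rules = {}

    def oper_status(self) -> str:
        return "up"

    def set_rule(self, name: str, action: str) -> None:
        self._rules[name] = action

    def rules(self) -> dict:
        return dict(self._rules)


class VnfFirewall(FirewallIntent):
    """A software instance: its status also depends on hosting and connectivity."""

    def __init__(self, host_ok: bool = True, link_ok: bool = True) -> None:
        self._rules = {}
        self._host_ok = host_ok
        self._link_ok = link_ok

    def oper_status(self) -> str:
        return "up" if self._host_ok and self._link_ok else "down"

    def set_rule(self, name: str, action: str) -> None:
        self._rules[name] = action

    def rules(self) -> dict:
        return dict(self._rules)


def block_telnet(fw: FirewallIntent) -> dict:
    """A management action written once, against the abstraction only."""
    fw.set_rule("telnet", "deny")
    return {"status": fw.oper_status(), "rules": fw.rules()}


appliance_view = block_telnet(ApplianceFirewall())
vnf_view = block_telnet(VnfFirewall())
```

Because `block_telnet` touches only the abstraction, the appliance and the VNF are interchangeable from its point of view. Note that the VNF’s host and link state are hidden behind `oper_status`, which is exactly the visibility gap that causes trouble later.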
This is only one of the problems with the device/virtual-device framing of NFV. Another is that many network functions and features are likely not represented by devices at all, and that there may be reasons to deploy a feature or function that’s already available as a device-hosted element in a different context, either alone or as part of a composed virtual device. This would involve shifting the management view of a function, either before deploying it or potentially even afterward, as part of evolution.
An example of this is vCPE. What is “inside” vCPE could vary over time depending on what a customer ordered. Does the customer want to manage what they see as a uCPE-hosted set of VNFs as though they were separate devices? Probably not; they’d want to “see” the vCPE instance and see the features thereof as sub-elements in the overall management view.
A third problem is that at least some implementations of a given device abstraction bring management tasks of their own. If you manage a box, there are only limited things you can see about what’s inside it, and probably fewer things you can do about it. It’s hardware. If you manage a VNF, you have to manage not only the software instance but the collection of hosting and connection resources that are associated with it. How do you manage that when the management abstraction you’ve assigned to the VNF doesn’t offer implementation visibility? There are no hosts inside an appliance, no virtual connections.
What this sets up is a two-level management process, where one level manages the equivalent of appliances (which can be instantiated in either real devices or in virtual form) and the other manages the implementation of the hosting and connectivity inside the virtual instances of these appliances. This second management task has to be related to the first, but today’s EMS/NMS/SMS processes don’t know anything about the implementation being managed (hiding it was your goal in the first place), and so they don’t offer any way to manage the second part or relate to it. You end up having to create a separate management process for the virtualized elements, and that tends to reduce the advantages gained by leveraging EMS tools to manage the VNFs themselves. The more complex the virtual configuration is, the less overall benefit you achieve from leveraging EMS tools.
The final problem is perhaps the worst, and one that was developing even before NFV came along. Any time services are created from a pool of shared resources, there is a risk that the management of those resources will introduce instability in itself. You can’t have shared resources managed collectively by the users sharing them. Imagine a hundred users, each allocating the same resources or manipulating the parameters that controlled resource operations, to their own ends. Too many chefs, in this case, might end up not only spoiling the broth, but creating something quite un-broth-like.
In fact, even the attempt by multiple users to access a common resource management interface could bombard the resources with management commands, creating in effect a denial of service attack on the management APIs. That’s particularly true when it’s possible that some users would attempt to obtain resource state frequently, a situation that prompted a proposal to the IETF to create an intermediation layer between management systems and the things they managed.
Called “i2aex” for “infrastructure to application exposure”, the proposal established a database that would collect and store, via a series of agent processes, the data from resource/device MIBs. Management applications or other elements that needed resource status would then obtain it with a database query. Updates would be filtered through a process that regulated what could be changed and coordinated the changes to prevent collisions.
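The collect-then-query pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not the i2aex design itself: the table layout, the function names (`open_repository`, `agent_collect`, `query_status`), and the sample MIB variables are all hypothetical, and an in-memory SQLite database stands in for the repository.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """Create the (hypothetical) repository that holds collected MIB data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mib (resource TEXT, variable TEXT, value TEXT, "
        "PRIMARY KEY (resource, variable))"
    )
    return db


def agent_collect(db: sqlite3.Connection, resource: str, read_mib) -> None:
    """Agent process: poll one resource once and store what it reported."""
    for variable, value in read_mib().items():
        db.execute(
            "INSERT OR REPLACE INTO mib VALUES (?, ?, ?)",
            (resource, variable, str(value)),
        )
    db.commit()


def query_status(db: sqlite3.Connection, resource: str) -> dict:
    """Management application: read status from the repository, not the device."""
    rows = db.execute(
        "SELECT variable, value FROM mib WHERE resource = ?", (resource,)
    ).fetchall()
    return dict(rows)


# Stand-in for a real device MIB read; counts how often the device is touched.
polls = {"count": 0}


def read_host_mib() -> dict:
    polls["count"] += 1
    return {"ifOperStatus": "up", "cpuLoad": 37}


db = open_repository()
agent_collect(db, "host-1", read_host_mib)
views = [query_status(db, "host-1") for _ in range(100)]
```

A hundred status requests are answered here from a single poll of the resource, which is the property that keeps management traffic from turning into an accidental denial of service.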
I incorporated the i2aex concept into my thinking early in 2013, calling the result “derived operations”. I presented it to the NFV ISG and also included it in my ExperiaSphere project. While the IETF never developed i2aex, I think something like it is essential in resolving the inherent management conflicts in NFV.
With derived operations, the MIBs accessed by management systems and applications are constructed by “query agents” that include both the query needed to gather the data for the MIB from the repository and filter logic to control what can be seen or changed. These query agents can be spun up as needed because they don’t have to store information, only access the repository. Because a query agent can combine data from the VNF and data from what the VNF deploys on/with, it can create a meaningful status to project at the traditional EMS level. At the same time, it can (with proper privileges) dig into the lower-level details of hosting. A new MIB can be constructed by building the associated query agent, and this MIB can represent either a totally non-device-associated capability, or a composite capability presented by a uCPE host.
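A query agent of this kind might look something like the following Python sketch. The repository contents, the key naming, the `QueryAgent` class, and the rule that the derived status is “up” only when every contributing record is “up” are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical repository contents, keyed by resource name.
REPO = {
    "vnf/fw-1": {"operStatus": "up", "sessions": "120"},
    "host/h-7": {"operStatus": "degraded", "cpuLoad": "93"},
    "vlink/l-3": {"operStatus": "up"},
}


@dataclass(frozen=True)
class QueryAgent:
    """Holds only a query (which records to combine); stores no data itself."""

    function: str    # repository key of the VNF being presented
    resources: tuple  # repository keys of its hosting and connection resources

    def mib(self, repo: dict, privileged: bool = False) -> dict:
        records = [repo[self.function]] + [repo[r] for r in self.resources]
        all_up = all(rec["operStatus"] == "up" for rec in records)
        # Start from the VNF's own variables, then derive a composite status.
        view = dict(repo[self.function])
        view["operStatus"] = "up" if all_up else "degraded"
        # Filter logic: hosting details are exposed only with proper privileges.
        if privileged:
            view["resources"] = {r: dict(repo[r]) for r in self.resources}
        return view


agent = QueryAgent("vnf/fw-1", ("host/h-7", "vlink/l-3"))
ems_view = agent.mib(REPO)
ops_view = agent.mib(REPO, privileged=True)
```

The EMS-level view reports the firewall as degraded even though the VNF software itself is up, because the agent folds in the state of the host. The privileged view adds the lower-level records that explain why.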
Query agents, as the intermediary in all management actions, are also a good place to insert journaling to record management activity and to mediate changes that might collide when multiple users share resources. This contributes to stability and governance, and also helps resolve finger-pointing problems. Because a query agent can reference another query agent, a service feature hosted in another administration can “export” a MIB view that reflects the relationship between the retail and wholesale contributors to the service, keeping the wholesale partners’ infrastructure secret except where it’s revealed to support an SLA.
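One simple way to journal and mediate changes is a version check on each write, sketched below in Python. This is only one possible mediation policy, picked for brevity; the `MediatedStore` class, its fields, and the parameter name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MediatedStore:
    """Hypothetical change mediator: versioned writes plus a journal."""

    values: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)

    def read(self, key: str) -> tuple:
        """Return the current value and the version the caller has seen."""
        return self.values.get(key), self.versions.get(key, 0)

    def write(self, user: str, key: str, value: str, seen_version: int) -> bool:
        """Apply the change only if nobody else changed the key in between."""
        accepted = seen_version == self.versions.get(key, 0)
        if accepted:
            self.values[key] = value
            self.versions[key] = seen_version + 1
        # Every attempt is journaled, accepted or not, for later audit.
        self.journal.append((user, key, value, accepted))
        return accepted


store = MediatedStore()
_, version_seen_by_a = store.read("queue-depth")
_, version_seen_by_b = store.read("queue-depth")
a_ok = store.write("user-a", "queue-depth", "64", version_seen_by_a)
b_ok = store.write("user-b", "queue-depth", "512", version_seen_by_b)
```

Both users read the same version, so the second write is rejected rather than silently overwriting the first, and the journal shows who attempted what.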
The principle of derived operations would allow for the composition of management views, both for virtual functions/devices and for services. It would let operators use existing EMS systems where they’re valuable, and construct new models where the old ones won’t work. This wouldn’t solve all the problems of NFV, but it would solve one of the major ones, and so it’s something that the NFV ISG should develop. The IETF, where the original i2aex proposal was introduced, should also take action to resurrect the approach and offer it broadly in networking. Finally, cloud management practices should consider the approach as a solution to the problem of application views of shared resources. This was a good idea that somehow got sidetracked, and getting it back on track would benefit the industry.