How would you build an optimum service layer and PaaS for Nephio? Since I think the project needs to face those tasks, I should be prepared to comment on how they could be done. The starting point, as always, is at the top of the process: the notion of a model from which a service could be assembled.
Back in about 2007, when I was asked by some operators to do a quick PoC to demonstrate that the model of creating services from objects (the TMF Service Delivery Framework, or SDF, activity) would work, I put together a Java application set to do the job. The overall model I used said there were a variety of “Service Factories”, each publishing “Service Order Templates” that it could fill. These templates were elements of Java code (because building decomposition software to process a data model would have taken too long). Any SOT could be sent to a Service Factory for processing. Once the order was processed, an instance was created as a Java application, and that application could be bound to any compatible Service Factory when an event in the service lifecycle needed to be handled.
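A minimal sketch of that factory/template pattern might look like the Python below. Every name here (ServiceFactory, ServiceOrderTemplate, and so on) is my own illustrative invention, not anything from the old PoC or from Nephio:

```python
# Sketch of the "Service Factory" / "Service Order Template" pattern.
# All class and field names are hypothetical, for discussion only.

class ServiceOrderTemplate:
    """A fillable order form published by a factory."""
    def __init__(self, service_type, parameters):
        self.service_type = service_type
        self.parameters = dict(parameters)   # blank fields the orderer fills in

class ServiceFactory:
    """Publishes templates and turns filled-in orders into service instances."""
    def __init__(self, service_type):
        self.service_type = service_type

    def publish_template(self):
        # The factory advertises what it can build.
        return ServiceOrderTemplate(self.service_type,
                                    {"endpoint_a": None, "endpoint_b": None})

    def process_order(self, template):
        # Any compatible factory (same service type) can process the order,
        # or later handle lifecycle events for the instance it creates.
        assert template.service_type == self.service_type
        return {"service_type": self.service_type,
                "bound": dict(template.parameters)}

factory = ServiceFactory("vpn")
order = factory.publish_template()
order.parameters.update(endpoint_a="NYC", endpoint_b="LON")
instance = factory.process_order(order)
```

The key property is the loose binding: an order instance isn't owned by the factory that created it, so any compatible factory can pick up subsequent lifecycle events.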
This is still what I think service order (and intent-modeled) processing should look like. Every modeled element is represented by a data model element that steers events to distributed processes; because all their inputs and outputs flow into and out of the data model, those processes can be stateless microservices. The task of steering events through a data-model-resident state/event table of some sort could also be instantiated on demand in whatever location was best, so the entire lifecycle management process would be model-driven, stateless, and microservice-compatible.
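To make that concrete, here's a toy Python sketch of a data-model-resident state/event table driving stateless handlers. Everything a handler needs comes in through the model element and goes back out the same way, so the handlers (and the dispatcher itself) could run anywhere, spun up on demand; the names are illustrative assumptions, not a real API:

```python
# Stateless lifecycle handlers: all inputs/outputs are the model element.
def on_deploy(element, event):
    element["resources"] = "allocated"
    return "active"            # next state is written back to the model

def on_fault(element, event):
    element["resources"] = "reallocating"
    return "recovering"

# The state/event table lives *in* the data model element itself.
element = {
    "state": "ordered",
    "resources": None,
    "state_event_table": {
        ("ordered", "deploy"): on_deploy,
        ("active", "fault"): on_fault,
    },
}

def dispatch(element, event):
    """Steering logic: stateless, because it reads everything from the model."""
    handler = element["state_event_table"][(element["state"], event)]
    element["state"] = handler(element, event)

dispatch(element, "deploy")
```

Because neither `dispatch` nor the handlers hold state between calls, any instance of them, in any location, can process the next event.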
It’s important to note that this data model is, in effect, a digital twin of the actual VNFs and cloud resources, and not the actual VNFs and resources. Thus, there is no constraint that the VNFs be stateless or that the cloud provide containers or VMs or bare metal or even white-box switches. The service modeling process is thus a kind of metaverse-of-things.
The approach I’m describing has the advantage of being fully distributed and scalable, which separates it from the monolithic approaches of ETSI NFV and ONAP. Even if you componentize something that’s designed to be a serial A-then-B-then-C process, you still have a serial process. It would be possible to componentize ONAP to make it scalable, but I think the interface specifications would tend to limit how much you could actually do. The same is true with NFV; when you have to harmonize down to a single interface, you’ve created a scalability and resilience issue.
The PaaS layer I’ve talked about would define both the way that the state/event processes were built, and the way the VNFs themselves were built. Thus, it would have a foot in both of the “twins”, the real thing and the virtual thing. Each instance of a state/event process referenced in the service data model would use one piece of the PaaS, and each instance of a VNF would use the other. There may also be linkages between the two, since it’s very possible that processes referenced in service data models might also be referenced by the implementation of VNFs. That’s why I don’t define two separate PaaS layers; they’re likely overlapping.
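One way to picture a single overlapping PaaS rather than two separate layers is a catalog whose entries are tagged by which “twin” consumes them. This is purely an illustrative sketch; the catalog entries and names are hypothetical:

```python
# Sketch: one PaaS catalog, entries tagged by consumer, rather than two
# disjoint PaaS layers. All entry names are hypothetical examples.

PAAS_CATALOG = {
    "state_event_dispatch":      {"used_by": {"service_model"}},
    "interface_plug":            {"used_by": {"vnf"}},
    "derived_operations_query":  {"used_by": {"service_model", "vnf"}},  # shared
}

def services_for(consumer):
    """List the PaaS services visible to one side of the twin pair."""
    return sorted(name for name, entry in PAAS_CATALOG.items()
                  if consumer in entry["used_by"])

# The overlap is exactly the set of processes referenced by service data
# models that are also referenced by VNF implementations.
shared = set(services_for("service_model")) & set(services_for("vnf"))
```

The point of the single catalog is that a shared process appears once, with two tags, instead of being duplicated across two PaaS definitions.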
The requirements for the PaaS elements related to service modeling arise from the description of the way that works, and I think we could define candidates for discussion and refinement without too much time and effort being expended. For the VNF PaaS piece, it’s going to be a bit more complicated.
There are things we could identify in a VNF PaaS with fair confidence. We need “interface plugs” that connect with physical interfaces on servers or white boxes, where there has to be an actual connection to an external element. We need “API plugs” that define connectivity between VNFs that doesn’t require a discrete physical interface. We need an interface to my “derived operations” databases, either to present data via a daemon process or to answer queries that build a MIB. We need the plugs and APIs to connect to “infrastructure drivers” that provide access to the hardware in a generic way, so the same software can run on bare metal, in VMs or containers, and in white boxes, and use whatever AI or switching chips might be appropriate to the VNF’s functionality. And since Nephio defines Custom Resource Definitions (CRDs), we need components to interpret and handle CRD processing, along with the APIs that link to the K8S Operator function plugins.
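Those candidate elements could be captured as a set of abstract interfaces to anchor the discussion. The sketch below does that in Python; every class and method name is a hypothetical placeholder, not a proposed Nephio API:

```python
# The VNF PaaS surface as abstract interfaces. All names are placeholders
# for discussion, not real Nephio or Kubernetes APIs.
from abc import ABC, abstractmethod

class InterfacePlug(ABC):
    """Binds a VNF port to a physical interface on a server or white box."""
    @abstractmethod
    def attach(self, physical_port: str) -> None: ...

class ApiPlug(ABC):
    """Connects VNF to VNF with no discrete physical interface in between."""
    @abstractmethod
    def connect(self, peer_vnf: str) -> None: ...

class DerivedOperations(ABC):
    """Presents telemetry via a daemon, or answers queries that build a MIB."""
    @abstractmethod
    def query(self, metric: str): ...

class InfrastructureDriver(ABC):
    """Generic hardware access: bare metal, VM, container, switching/AI chips."""
    @abstractmethod
    def capabilities(self) -> set: ...

class CrdHandler(ABC):
    """Interprets CRDs and links to the K8S Operator function plugins."""
    @abstractmethod
    def reconcile(self, resource: dict) -> None: ...
```

Writing the surface down this way forces each element to declare what it abstracts, which is exactly the top-down discipline argued for below.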
Defining other VNF PaaS functions could be facilitated by addressing the way that Nephio links with existing VNFs. For example, the Nephio diagram shows Netconf used to control CNFs, which suggests that we might want our K8S Operator function for VNF control to provide the features of Netconf. Should we simply assume Netconf, though, or should we look at those features and develop a full capability set that we can map to it? I think the latter approach would be best, since Netconf is a device control protocol, and what a VNF needs from control may go beyond what a device protocol conveys.
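The “capability set first, protocol second” idea can be sketched as a protocol-neutral control interface with a Netconf adapter behind it. The class and method names are my own illustrative assumptions; only the Netconf operation names (get-config, edit-config, create-subscription) come from the actual protocol:

```python
# Sketch: define an abstract VNF-control capability set, then map it onto
# Netconf (or anything else) via an adapter. Names are illustrative.

class VnfControl:
    """Protocol-neutral capability set for controlling a VNF."""
    def get_config(self, vnf_id): raise NotImplementedError
    def set_config(self, vnf_id, config): raise NotImplementedError
    def subscribe_events(self, vnf_id, event): raise NotImplementedError

class NetconfAdapter(VnfControl):
    """One possible mapping of the capability set onto Netconf operations."""
    def get_config(self, vnf_id):
        return ("get-config", vnf_id)                    # RFC 6241 operation
    def set_config(self, vnf_id, config):
        return ("edit-config", vnf_id, config)           # RFC 6241 operation
    def subscribe_events(self, vnf_id, event):
        return ("create-subscription", vnf_id, event)    # RFC 5277 operation

adapter = NetconfAdapter()
op = adapter.get_config("vnf-42")
```

If a capability turns out to have no clean Netconf mapping, that gap itself is the useful finding: it identifies where VNF control needs more than a device protocol offers.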
The same is true for the APIs used to bind one VNF to another. We could presume nothing more than a socket relationship, but is there a need to have coordination between adjacent VNFs, creating what the NFV ISG called a “service chain”? VNF security suggests there might be, and the ISG didn’t really take up the question of what I’ve called “Packages”, which are sets of VNFs that are analogous to the way Kubernetes deploys applications. Is there a “Package coordination” task? I think there could well be, since cooperative elements within a virtual device need some mechanism to cooperate, and run-time coordination shouldn’t be passed through an external service model for efficiency/latency reasons.
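A Package coordination task could be as simple as a direct intra-package channel among sibling VNFs, so run-time signals never detour through the service model. The sketch below illustrates the idea; all names are hypothetical:

```python
# Sketch: a "Package" groups cooperating VNFs and gives them a direct
# coordination channel that bypasses the external service model.

class Package:
    """A set of VNFs deployed together, analogous to a K8s application."""
    def __init__(self, name):
        self.name = name
        self.vnfs = {}          # vnf_id -> message handler

    def add_vnf(self, vnf_id, handler):
        self.vnfs[vnf_id] = handler

    def coordinate(self, sender, message):
        # Deliver the signal directly to every sibling VNF in the package,
        # never passing through the service-model lifecycle path.
        return {vnf_id: handler(sender, message)
                for vnf_id, handler in self.vnfs.items()
                if vnf_id != sender}

pkg = Package("virtual-router")
pkg.add_vnf("firewall", lambda s, m: f"firewall saw {m} from {s}")
pkg.add_vnf("nat", lambda s, m: f"nat saw {m} from {s}")
replies = pkg.coordinate("firewall", "link-down")
```

The design choice worth noting is the boundary: anything that changes the service's modeled state still goes through the data model, but purely cooperative signaling between siblings stays inside the package.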
Considering the PaaS layer is important not only to simplify development and operations, but also to ensure that we actually address everything that’s needed. From the top, with an architecture like the one I’ve described, we can pick out the implementation elements using fairly standard software design processes. From the bottom, starting with the details, we are unlikely to create a truly optimum architecture. We can see the evidence of that in the NFV ISG and ONAP work, IMHO.
That raises the biggest risk Nephio faces: that in an effort to leverage work already done, it repeats the mistakes that work made. The initial NFV white paper was valuable in laying out the objectives, and the work done already on VNF hosting can teach us lessons, both positive and negative, without dictating that we follow the same path, the path I believe was wrong. We need to learn the lessons and avoid the traps, and with some care I think that’s possible. That’s why I hold so much hope for Nephio.