I post my blogs on LinkedIn to provide a forum for discussion, and on my blog about ONAP and its issues, Paul-André Raymond posted an interesting insight: “There is something more easily understandable about Monolithic architecture. It takes an effort for most people to appreciate a distributed solution.” I agree, and that made me wonder whether I could explain the issues of distributed lifecycle automation better.
We tend to personify things, meaning that we take automated processes and talk about them as though they were human, as though they were reflections of us as individuals. We are monoliths, and so we think of things like lifecycle automation in a monolithic way. We describe functions, and we assume those functions are assembled into a grand instance: us. We’re not distributed systems, so we don’t relate naturally to how they’d accomplish the same task.
The interesting thing is that we work as a distributed system most of the time. Imagine an army of workers building a skyscraper. The workers are divided into teams, and the teams are grouped into activities that might be related to the floor being worked on or the craft involved. There are then engineers and architects who organize things at the higher levels.
In a highly organized workforce, there is a strict hierarchy. People have individual assignments, their “local” team has a collective assignment and a leader, and so forth up the organization. If a worker has a problem or question, it’s kicked to the local leader, and if two teams have to coordinate something, the joint leader above does the coordinating. This is also how the military works, in most cases.
Can these organizational lessons be applied to services, applications, and other stuff that has to be managed automatically? I think so, but let’s frame out a mechanism to prove the point. We’ll start from the bottom, but first let’s establish a unifying principle. We have to represent the people in our workforce, and we’ll do that by presuming that each is an “object”, an intent model. These models hide what’s within, but assert specific and defined interfaces and properties. It’s those interfaces and properties that distinguish one from another, as well as their position in the hierarchy we’ll develop.
A worker is an atomic and assignable resource, and so we’ll say that at the lowest level, our worker-like intent model will represent a discrete and assignable unit of functionality. In a network or application, the logical boundary of this lowest level would be the boundaries of local and autonomous behavior. If we have a set of devices that make up an autonomous system, one that’s collectively functional and is assigned to a task as a unit, we’d build an intent model around it.
In a workforce, a worker does the assigned work, dealing with any issues that arise that fall within the worker’s assignment and skill level. If something falls outside that range, the worker kicks the problem upstairs. They were given the job because of what they can do, and they do it or they report a failure. So it is with our service/application element: it has properties that are relied upon to assign it a mission, and it’s expected to perform that mission or report a service-level agreement (SLA) violation.
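To make this concrete, here’s a minimal sketch in Python of what a worker-level intent model might look like. Everything here is hypothetical (the IntentElement and SlaViolation names, the shape of the interfaces); it’s a sketch, not a definitive implementation. The point is simply that the internals stay hidden while the properties and the upward reporting channel are exposed:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical event type: the only thing a worker sends upstairs.
@dataclass
class SlaViolation:
    source: str   # which element is reporting
    detail: str   # what went wrong

class IntentElement:
    """A worker-level intent model: internals hidden, interface exposed.

    Only `properties` (what it can do) and the report channel (how it
    signals failure) are visible to its superior; the rest is private.
    """
    def __init__(self, name: str, properties: dict,
                 report: Callable[[SlaViolation], None]):
        self.name = name
        self.properties = properties   # advertised capabilities
        self._report = report          # upward channel to the superior

    def assign(self, mission: dict) -> None:
        # Do the work if it falls within our advertised properties;
        # otherwise kick the problem upstairs as an SLA violation.
        if not self._within_skills(mission):
            self._report(SlaViolation(self.name, "mission outside properties"))
            return
        self._do_work(mission)

    def _within_skills(self, mission: dict) -> bool:
        return all(self.properties.get(k) == v for k, v in mission.items())

    def _do_work(self, mission: dict) -> None:
        ...  # autonomous, hidden behavior; opaque to the superior
```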
The next level of our hierarchy is the “team leader” level. The team leader is responsible for the team’s work. The leader monitors the quality, addresses problems, and if necessary, kicks those problems upstairs again. Translating this to a service/application hierarchy, a “team-leader” element monitors the state of the subordinate “worker” elements by watching for SLA violations. If one is reported, then the “team-leader” element can “assign another worker”, or, in technology terms, tear down the failed element and rebuild it.
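Continuing the hypothetical sketch above, a team-leader element might look like this: it reacts only to SLA violations from its own subordinates, and responds by tearing the failed element down and rebuilding it with the same advertised properties:

```python
class TeamLeader:
    """Superior element: watches its subordinates, remediates locally."""
    def __init__(self, name: str, subordinates: list):
        self.name = name
        self.subordinates = {s.name: s for s in subordinates}

    def on_event(self, event: SlaViolation) -> None:
        failed = self.subordinates.get(event.source)
        if failed is None:
            return  # not one of ours; the chain of command says ignore it
        # "Assign another worker": tear down the failed element and
        # rebuild it with the same advertised properties.
        self.subordinates[event.source] = IntentElement(
            failed.name, failed.properties, self.on_event)
```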
I’ve mentioned the notion of “kicking” a problem or issue several times, and I’ve also mentioned “assignments”. In technology terms, these would be communicated via an event. Events are the instructional coupling between hierarchical elements in our structure, just as communication forms the coupling in a cooperative workforce. And just as in a workforce, you have to obey the chain of command. A given element can receive “issue events” only from its own subordinates, and can make “reports” only to its direct superiors. This prevents the collisions of action that would result from multiple superiors giving conflicting instructions to a subordinate element.
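Enforcing that chain of command could be as simple as a guard in whatever dispatches the events. Here’s another hypothetical sketch: each element records exactly one superior, and an event from anyone other than a direct subordinate is refused:

```python
class ChainOfCommand:
    """Enforces who may send events to whom in the hierarchy."""
    def __init__(self):
        self.superior_of: dict[str, str] = {}  # element name -> its one superior

    def enroll(self, child: str, parent: str) -> None:
        self.superior_of[child] = parent       # exactly one superior each

    def deliver(self, sender: str, receiver: str, event: SlaViolation) -> None:
        # Issue events flow strictly upward, one level at a time.
        if self.superior_of.get(sender) != receiver:
            raise PermissionError(
                f"{sender} may report only to {self.superior_of.get(sender)}")
        ...  # hand the event to the receiver's state/event handler
```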
At every level in our hierarchy, this pattern is followed. Events from below signal issues to be handled, and the handler will attempt to deal with them within its jurisdiction, meaning that any changes it makes remain confined to its own subordinate elements. For example, if a “virtual firewall” element reported a failure, the superior handler element could tear it down and rebuild it. If a “virtual-IP” network element failed, not only could that element be replaced, but the interfaces to other IP elements (representing network gateways between them) could be rebuilt. But if something had to be changed in an adjacent access-network element as a result, that requirement would have to be kicked up to the lowest common handler element.
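Finding that lowest common handler is just a lowest-common-ancestor walk over the model hierarchy. A sketch, reusing the hypothetical superior_of map from the ChainOfCommand above:

```python
def lowest_common_handler(superior_of: dict[str, str],
                          a: str, b: str) -> str | None:
    """Return the lowest element with both a and b beneath it, i.e. the
    element whose jurisdiction covers a change that spans them both."""
    ancestors = set()
    node = a
    while node is not None:            # walk a's chain of command to the root
        ancestors.add(node)
        node = superior_of.get(node)
    node = b
    while node is not None:            # first shared superior wins
        if node in ancestors:
            return node
        node = superior_of.get(node)
    return None                        # no common superior at all
```

So if the “virtual-IP” element and the adjacent access-network element both sit under a “service” element, that is where the coordinated change would be handled, and nowhere lower.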
Each of our elements is an intent model, and since each has events to manage, each would also have specific defined states and event/state/process relationships. When our virtual firewall reported a fault, for example, the superior element would see that fault in the “operating” state, and would perform a function to handle an operating fault in that element.
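The classic way to express those event/state/process relationships is a dispatch table keyed on (state, event). Here’s a hypothetical fragment for the virtual-firewall example; the states, events, and process names are illustrative only:

```python
# Hypothetical processes bound to state/event combinations.
def rebuild_firewall(ctx: dict) -> None:
    ...  # tear down and redeploy the failed element

def ignore(ctx: dict) -> None:
    pass  # stale or irrelevant event; no action

STATE_EVENT_TABLE = {
    # (current state, event)   -> process to run
    ("operating", "fault"):        rebuild_firewall,
    ("deploying", "deploy-done"):  lambda ctx: ctx.update(state="operating"),
    ("operating", "deploy-done"):  ignore,
}

def handle(ctx: dict, event: str) -> None:
    process = STATE_EVENT_TABLE.get((ctx["state"], event))
    if process is None:
        raise LookupError(f"no process for {event!r} in state {ctx['state']!r}")
    process(ctx)
```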
The relationship between superior and subordinate elements opens an interesting question about modeling. It appears to me that, in most cases, it would be advisable for a superior model to maintain a state/event table for each of its subordinate relationships, plus a master one for itself, since in theory those relationships would be operating asynchronously. “Nested” state/event tables would be a solution. It’s also possible that there would be a benefit to modeling specific interfaces between subordinate elements, network-to-network interfaces or NNIs in the terminology of early packet networks.
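Nesting might look like the following sketch, which reuses the hypothetical handle dispatcher above: one state machine per subordinate relationship, each advancing asynchronously, plus a master machine for the element itself:

```python
class SuperiorModel:
    """Superior element with nested state/event tables."""
    def __init__(self, subordinate_names: list[str]):
        # Master state machine for the element itself...
        self.own = {"state": "deploying"}
        # ...and one independent machine per subordinate relationship,
        # since each relationship runs asynchronously.
        self.per_sub = {name: {"state": "deploying"}
                        for name in subordinate_names}

    def on_sub_event(self, sub: str, event: str) -> None:
        handle(self.per_sub[sub], event)
        # Roll the subordinate states up into the master machine: this
        # element is "operating" only when every subordinate is.
        if all(s["state"] == "operating" for s in self.per_sub.values()):
            self.own["state"] = "operating"
```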
A final point here: It’s obvious from this description that our service model is modeling element relationships and not topology or functionality. The models are hierarchies that show the flow of service events and responsibilities, not the flow of traffic. It is very difficult to get a traffic topology model to function as a service model in lifecycle terms, which is why the TMF came up with the whole idea of “responsibility modeling” over a decade ago.