Enterprises have always managed their networks, but just how that's done has taken its own twists and turns. The common thinking is captured by the FCAPS acronym (fault, configuration, accounting, performance, and security), and this is what we could call the "prescriptive" thinking. But enterprises themselves seem to recognize some higher-level issues, mostly relating to the relationship among QoS, QoE, and "fault".
The emerging debate has, according to 84 of the 389 enterprises I've gotten input from, created three loose camps, each representing a primary approach to ensuring proper network operations: the QoE camp, the preemptive camp, and the prescriptive camp. How the jousting among these informal groups shapes up may end up determining not only how we manage networks but how we build them.
The most vocal of the groups, the QoE camp, has 32 enterprises that explicitly claim membership. This group believes that network management processes should be treated as fault-isolation processes, invoked not because some fault is detected by the hardware but because a user complains. Networks, they reason, are about delivering quality of experience, so it's experiences that should be the focus of management activity. A tree falling in the woods, to this group, is nobody's concern, so don't bother listening for it. If somebody notices, and reports a negative consequence, you deal with that.
I think the reason this group is vocal is that the approach is favored by the community of technologists who believe in greater "network populism": more focus on line departments and less on technology. It certainly gets more CxO attention; of 183 comments on network management I've gotten from CxOs, 104 emphasized focusing on whether the network was fulfilling its mission over focusing on technical metrics. One CIO said, "I don't care what the latency or packet loss rate is, I care how many complaints I get." The same CIO hastened to add that this doesn't mean you ignore signs of trouble, only that you don't step in with remediation that might go wrong until the signs start to impact the mission.
What seems to drive this view is the growing role that human error plays in network problems. All 32 of the enterprises that champion this position say human error is a larger source of faults than hardware, software, or service problems. They reason that if netops teams are out there tweaking some obscure network parameter to hit an FCAPS goal, they're likely to break something that users felt had been working fine. Of the 183 CxO comments, 128 held this view, which is probably even more telling. From what I hear, this is the dominant camp.
The competing camp? That's the preemptive camp. This group, numbering 24 among the enterprises claiming a preferred approach, might be considered a variant on the QoE approach. They agree that, yes, the user experience alone should be the target netops aims to hit, and so yes, management should be complaint-driven. However, smart management says that not having a complaint is better than addressing one. Thus, you build a network not to be optimally cost-effective but to be optimally complaint-proof. Think overcapacity on links and redundancy in devices, and you'll save money in opex and make users (and management) happier.
One adherent to this view describes a "three-two-zero" approach. You always have three paths available onward from any device to its source of applications or data. You never have more than two devices transited in the information's path, and the failure of any device or any pathway has zero chance of creating a user complaint. This means you look after systemic health through preventive planning and maintenance, the "P" in FCAPS, perhaps. A fault in the network won't hurt QoE; it will just turn some lights on or off, and you recognize a failure because it's inescapable. You fix it by swapping in a replacement for whatever's lights went into a bad state. No need for diagnosis; it's staring you in the face, and since users aren't impacted, there's little pressure to act in haste and mess up.
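To make the three-two-zero idea concrete, here's a minimal sketch that checks a topology against the first two rules: at least three paths from a device onward to its data source, with no more than two transit devices on any counted path. The topology, names, and the simplification of counting distinct simple paths (rather than strictly disjoint ones) are my own illustrative assumptions, not the adherent's actual design.

```python
def paths_within_hops(adj, start, goal, max_transit=2):
    """Return all simple paths from start to goal that traverse at most
    max_transit intermediate (transit) devices."""
    found = []

    def dfs(node, path):
        if node == goal:
            found.append(path[:])
            return
        if len(path) - 1 > max_transit:  # transit budget exhausted at a non-goal node
            return
        for nxt in adj.get(node, ()):
            if nxt not in path:          # keep paths simple (no loops)
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    return found

def meets_three_two_zero(adj, device, source):
    """True if the device has at least three qualifying paths to its source."""
    return len(paths_within_hops(adj, device, source)) >= 3

# Hypothetical topology: an edge device with three switch paths to a data source.
topology = {
    "edge1": ["sw-a", "sw-b", "sw-c"],
    "sw-a": ["src"],
    "sw-b": ["src"],
    "sw-c": ["src"],
}

print(meets_three_two_zero(topology, "edge1", "src"))  # → True
```

The "zero complaints" part of the rule is a service outcome rather than a topological property, so it isn't checkable this way; a planning tool would pair a check like this with capacity headroom figures.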
The remaining 28 are probably just explicit advocates of what the rest of the 389 who commented would say they practice: prescriptive management, business as usual, based on traditional rules like FCAPS. The approach here is traditional: you have management telemetry that provides insight into network conditions, with a normal range of expected readings for each variable. If something pushes outside that range, you take steps to bring it back. Simple.
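The prescriptive loop is simple enough to sketch directly: each monitored variable has a normal band, and readings outside it flag remediation. The variable names and ranges below are invented for illustration, not taken from any real telemetry system.

```python
# Hypothetical normal ranges for a few monitored variables (illustrative only).
NORMAL_RANGES = {
    "latency_ms": (0.0, 20.0),
    "packet_loss_pct": (0.0, 0.5),
    "cpu_util_pct": (0.0, 85.0),
}

def out_of_range(readings):
    """Return the variables whose readings fall outside their normal band."""
    alerts = {}
    for name, value in readings.items():
        lo, hi = NORMAL_RANGES[name]
        if not (lo <= value <= hi):
            alerts[name] = value
    return alerts

print(out_of_range({"latency_ms": 35.2,
                    "packet_loss_pct": 0.1,
                    "cpu_util_pct": 92.0}))
# → {'latency_ms': 35.2, 'cpu_util_pct': 92.0}
```

The camps' complaint about this model is visible even in the sketch: every threshold is a parameter somebody set, and every flagged variable invites a manual tweak that can itself go wrong.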
The abstract organizational-political driver of greater line influence over tech is an external, perhaps decisive, force pushing to displace this approach, but netops groups themselves are also aware of its issues. The problem is the same one users and CxOs cite: human error. Networks are very complex and getting more complex daily. One network professional said that over the fifteen years of their career, the number of monitored variables grew from dozens to many hundreds, and the number of settable parameters rose from "around a hundred" to "probably two or three thousand." The interdependence of parameters has also grown, though nobody was comfortable quantifying by how much. The point they raise instead is that moving "A" is much more likely to cause a flood of changes to "B" and beyond than it was in the past. Errors, then, are not just more likely; they're almost inevitable.
Pilots call an understanding of the state of their aircraft overall, and its place in the real world, “situational awareness”. Netops people agree that good netops practices demand you have it, but that network reality in 2025 makes it hard to achieve. They’d hope that this is an issue AI could help with, but stress that autonomous action is a threat to the staff’s situational awareness. “Tell me what’s going on, tell me what seems the best steps to take and what their likely impact will be, and let me decide which and when. Then go forward stepwise, giving me a chance to override,” one netops pro suggests as an AI paradigm that would work. Seems logical to me.
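The paradigm that netops pro describes can be sketched as a human-gated loop: the AI proposes a diagnosis and an ordered remediation plan, and the operator approves or overrides each step before anything runs. Everything below is an illustrative stub of that workflow; the planner, step names, and risk labels are invented, and a real system would call a model rather than return canned steps.

```python
def propose_plan(symptom):
    """Stub planner: map a symptom to an ordered list of
    (remediation step, likely impact) pairs. Hypothetical output."""
    return [
        ("reroute traffic to backup link", "low risk, brief jitter"),
        ("restart the affected line card", "medium risk, 30s outage"),
    ]

def run_plan(symptom, approve, execute):
    """Walk the plan stepwise; the operator decides which steps run and when."""
    done = []
    for step, impact in propose_plan(symptom):
        if approve(step, impact):   # human gate: tell me the step and its impact
            execute(step)
            done.append(step)
        else:
            break                   # operator override halts the sequence
    return done

# Example: an operator who accepts only low-risk steps.
executed = run_plan(
    "user complaints: high latency on site A",
    approve=lambda step, impact: impact.startswith("low"),
    execute=lambda step: None,      # stand-in for actual remediation
)
print(executed)  # → ['reroute traffic to backup link']
```

The key property, matching the quote, is that autonomy stops at the gate: the loop surfaces the proposed step and its likely impact, and the operator's situational awareness is preserved because nothing proceeds without an explicit decision.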