Customer care is important everywhere, but probably nowhere as much as with any form of “service”, meaning the delivery of a capability, application, or experience through a network connection. The changing nature of the “services” we’re talking about has created major challenges for those who want to manage the customer experience, challenges that extend the general customer-care issues I covered in my last blog. We’ll go over them now.
Service providers’ own list of issues in customer experience management includes two specific items that are variations on the broader list. The first is the question of who the customer is, and the second is the persistent problem of inadequate fault identification. The two are closely related.
Services are a combination of distributed elements and features, often involving multiple providers. In some cases, the process of combination is undertaken by one provider that acts on behalf of all the players involved, and in others the combination is extemporaneous. Internet streaming video, for example, is a partnership between the streaming provider (an independent player that might or might not be bundled with the network connection) and a broadband Internet provider. The variety of relationships among the service players means that in some cases one provider may be a customer of the other, and in other cases the service consumer may be a customer of both.
This customer relationship question is important because the consumer of the service may be totally oblivious to who’s doing what, and who’s responsible overall. Streaming video providers tell me that somewhere between a fifth and a third of all their support requests arise from issues with the broadband connection and not their service. Broadband Internet providers say the same thing. The underlying problem here is that the service customer tends to be sensitive only to service failures, which is anything that results in their not getting what they expect. I can’t watch my streaming TV shows, so the Internet is down. Or Amazon or Hulu is down. The user sees the service through a single pane of glass, when it’s actually often a complicated relationship.
Interestingly, this is very similar to an issue enterprises report with their own applications. Companies increasingly rely on the Internet to deliver application access to prospects, customers, partners, and even employees. When that doesn’t work, whose fault is it? Who do the users who’ve attempted access blame when they don’t get what they expect? Even deciding what’s a fault can be complicated. Service consumers are unanimous in their view: a fault is the failure of something to meet my expectations.
Service providers have three different definitions. The first, rarely held, is that a fault is any condition that results in any service user experiencing something less than the explicit or implied SLA. The second, fairly common, is that a fault is anything a user complains about, and the third and most common is that a fault is any failure of an element of infrastructure.
The first definition is impractical in application, according to most providers. There’s simply no value to be had in spending money to detect something the user presumably isn’t seeing, and the difficulties associated with proactive SLA monitoring even for business services are profound. We’ll see those difficulties in the discussion of the remaining two definitions, and if technology advances aimed at fault determination were sufficient, we might see that first fault definition applied, at least to business services.
The difference between the last two definitions is really one of priority. Most service providers tell me that they rely on network management/monitoring to detect faults, because this approach directs them to something that can be fixed. However, most acknowledge that there are situations where nothing reports being broken, and yet something (or in some cases, everything) is actually broken. Configuration problems are the most-often-cited source of this problem, and they’re usually detected through complaints. Thus, operations people have to be stimulated by alarms and also by problem reports from customer care.
Customer complaints, of course, get us back to the challenges I opened with, which means that you have to move quickly to decide what element of a service is actually causing the problem. Problem isolation, relating to alarm events, has been a challenge for decades, primarily because alarms are often generated as a cascade arising from a common failure. Network operations centers may get a “storm” or “flood” of alarms because of a single problem, and so root-cause analysis is important to guide remediation to the thing that’s actually broken and to suppress action on what are really derivative failures rather than primary ones. With complex services this is particularly important because the relationship between service elements is often loose, so one element has little understanding of the condition of connected ones, other than whether they are working or not. A simple network issue could then lead to complex analysis of content distribution elements, when nothing there is wrong at all.
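To make that concrete, here’s a minimal sketch of how a dependency graph can suppress derivative alarms so the NOC acts only on the primary failure. This is my own illustration, not any vendor’s tooling, and the element names are hypothetical.

```python
# Minimal sketch of alarm-storm suppression via a dependency graph.
# All element names are illustrative, not from any specific OSS/NMS product.

from collections import defaultdict

class RootCauseFilter:
    def __init__(self):
        # depends_on[x] = set of elements x relies on to function
        self.depends_on = defaultdict(set)

    def add_dependency(self, element, prerequisite):
        self.depends_on[element].add(prerequisite)

    def root_causes(self, alarming):
        """Return only the alarms not explained by a failed prerequisite."""
        alarming = set(alarming)
        roots = set()
        for element in alarming:
            # If anything this element depends on (directly or transitively)
            # is also alarming, treat this alarm as derivative and suppress it.
            if not self._has_failed_prerequisite(element, alarming):
                roots.add(element)
        return roots

    def _has_failed_prerequisite(self, element, alarming, seen=None):
        seen = seen or set()
        for prereq in self.depends_on[element]:
            if prereq in seen:
                continue
            seen.add(prereq)
            if prereq in alarming or self._has_failed_prerequisite(prereq, alarming, seen):
                return True
        return False

# Usage: a single link failure ("fiber-7") floods the NOC with derivative
# alarms from the router and CDN cache that sit above it.
rca = RootCauseFilter()
rca.add_dependency("router-a", "fiber-7")
rca.add_dependency("cdn-cache-3", "router-a")
print(rca.root_causes({"fiber-7", "router-a", "cdn-cache-3"}))  # {'fiber-7'}
```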
According to service providers, there are two major issues associated with fault isolation. One is that the relationship between feature elements in complex services is often implicit rather than explicit, which means that the complexity of interoperating features may not even be visible. Obviously that makes examining it difficult. The other issue is that while there may be explicit interconnection of features, there may not be even an implicit SLA or a means of monitoring it at the point of interface.
This is something that’s come up in relation to the concept of intent modeling. Service providers say that an intent model of a feature, which is then used to construct higher-level intent models, eventually reaching the service level, would include both an interface description that contains a management capability, and an SLA that defines what the management interface should assert in the way of state information. This sort of information would facilitate the derivation of service state from the state of the constituent elements, and also provide a way to dive down from service state to identify elements whose state was not as promised, because the service/element relationship would be provided in the model.
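Here’s a rough sketch of what that might look like, assuming each intent-modeled feature asserts metrics at its management interface that its SLA constrains. The class and field names are my own illustration, not anything drawn from a standards body.

```python
# A minimal sketch of hierarchical intent models: each element carries an SLA
# and reports state at its management interface; service state is derived by
# rolling up the children, and SLA violations can be traced back down.

class IntentModel:
    def __init__(self, name, sla=None, children=None):
        self.name = name
        self.sla = sla or {}          # e.g. {"availability": 99.9}
        self.children = children or []
        self.metrics = {}             # last values asserted at the management interface

    def report_metrics(self, **metrics):
        self.metrics.update(metrics)

    def in_spec(self):
        """True if this element's own reported state meets its SLA."""
        return all(self.metrics.get(k, 0) >= v for k, v in self.sla.items())

    def violations(self):
        """Dive down from service state to the elements not meeting their SLA."""
        bad = [] if self.in_spec() else [self.name]
        for child in self.children:
            bad.extend(child.violations())
        return bad

# A streaming service composed from a CDN feature and a broadband access feature.
access = IntentModel("broadband-access", sla={"availability": 99.5})
cdn = IntentModel("cdn-delivery", sla={"availability": 99.9})
service = IntentModel("streaming-service", sla={"availability": 99.0},
                      children=[access, cdn])

access.report_metrics(availability=98.2)   # below its own SLA
cdn.report_metrics(availability=99.95)
service.report_metrics(availability=99.1)

print(service.violations())   # ['broadband-access']
```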
I’ve fiddled with this myself in a couple of initiatives, so it’s obvious that I like the intent-model approach. One reason I’ve been watching the Nephio project is that it aims to facilitate the use of Kubernetes (the software that’s typically used to deploy hosted features) as a way of managing actual device elements. It seems to me that it could be an on-ramp to creating a unified intent-model approach to creating services from a combination of devices and hosted features.
As good as intent modeling is for services, it’s not sufficient for non-service (meaning product) customer experience management, and it would likely benefit from the local agent technology that I noted in Monday’s blog (referenced above). The problem is that if services are the top of the model hierarchy, the responsibility for mediating the interface and SLA upward can only fall on the user, and for many (most, likely) services the user isn’t really technically capable or qualified to accept that responsibility. If we added a “virtual user” in the form of the local agent process, then I think we could say that customer experience management for any and all services could be handled by intent-modeling of the services themselves.
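As an illustration of what that “virtual user” might do, here’s a minimal sketch of a local agent that compares what the user actually experiences against what the service’s top-level model promises, and raises a structured report rather than a vague complaint. The probe, the SLA value, and the report format are all assumptions on my part.

```python
# A minimal sketch of the "virtual user" local agent idea: a customer-side
# process that mediates the top of the intent-model hierarchy by measuring
# the experience and comparing it to the service's asserted SLA.

import time
import urllib.request

def measure_startup_seconds(url):
    """Crude probe: time to fetch the first bytes of a stream manifest."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read(1024)
    return time.monotonic() - start

def local_agent_check(service_sla, manifest_url):
    observed = measure_startup_seconds(manifest_url)
    if observed > service_sla["max_startup_seconds"]:
        # Instead of "the Internet is down", produce a structured report the
        # provider's fault-isolation process can correlate with element state.
        return {"fault": "startup-delay", "observed": observed,
                "expected": service_sla["max_startup_seconds"]}
    return None

# Hypothetical usage with an illustrative SLA and manifest URL.
report = local_agent_check({"max_startup_seconds": 3.0},
                           "https://example.com/stream/manifest.m3u8")
```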