Changes in how we build data centers and networks, and in how we deploy applications and connect them, are really hard to deal with in the abstract. Sometimes a model can help, something to visualize the complexity. I propose the whirlpool.
Imagine for the moment a whirlpool, swirling about in some major tidal flow. If you froze it in an instant, you would have a shape that is wide at the top, narrowing slowly at first and then more quickly, until the slope flattens out again at the bottom. This is a decent illustration of the relationship between distributed IT and its users, and it can help us appreciate the fundamental challenges we face with the cloud and networking.
In our whirlpool, users are strung along the very edge, where there's still no appreciable slope. They're grouped by physical location, meaning those who are actually co-located are adjacent on the edge, and the farther they are from each other physically, the farther apart they sit on that whirlpool edge.
Compute resources are distributed further down the sides. Local resources, close to the users, are high up on the whirlpool near that user edge, and resources that are remote from workers are further down. The bottom is the data center complex, generally equidistant from users but at the bottom of a deep well that represents the networking cost and delay associated with getting something delivered.
When you deploy an application to support users, you have to create a connection between where it’s hosted and where it’s used, meaning you have to do a dive into the whirlpool. If the application is distributed, you have multiple places where components can live, and if those places aren’t adjacent you have to traverse between the hosting points. Where multiple users are supported, every user has to be linked this way.
This illustrates, in a simple but at least logical way, why we tend to do distributed computing in a workflow-centric way. We have computing local to workers for stuff that has relevance within a facility and requires short response times. We may then create a geographic hierarchy to reflect metro or regional specialization—all driven by the fact that workers in some areas create flows that logically converge at some point above the bottom of the whirlpool. But the repository of data and processing for the company still tends to be a place everyone can get to, and where economies of scale and physical security can be controlled.
Now we can introduce some new issues. Suppose we move toward a mobile-empowerment model or IoT or something that requires event-handling with very fast response times. We have to push processing closer to the worker, to any worker. Since in this example the worker is presumably not in a company facility at all, caching processing and data in the local office may not be an option. The cost efficiency is lost and resources are likely to be under-utilized. Also, a worker supported on a public broadband network may be physically close to a branch office, but the network connection may be circuitous. In any event, one reason the cloud is much easier to justify when you presume point-of-activity empowerment is that fast response-time requirements can't be met unless you can share hosting resources that are better placed to do the job.
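To make the response-time point concrete, here is a minimal sketch with purely hypothetical figures (the hosting options, delays, 50 ms target, and 20 ms processing time are all assumptions, not measurements): a tight event-handling budget simply cannot be met from the deeper hosting points.

```python
# Minimal sketch, with assumed numbers, of the response-time argument above:
# a tight event-handling budget can only be met from hosting points that sit
# "high up the whirlpool," close to the worker.

def can_meet_target(network_rtt_ms: float, processing_ms: float, target_ms: float) -> bool:
    """True if round-trip network delay plus processing time fits the response budget."""
    return network_rtt_ms + processing_ms <= target_ms

# Assumed round-trip delays from the worker to each hosting option.
hosting_options_ms = {
    "on-premises edge": 2,
    "metro cloud zone": 12,
    "regional cloud zone": 35,
    "central data center": 80,
}

TARGET_MS = 50       # assumed event-handling response target
PROCESSING_MS = 20   # assumed time to run the event logic itself

for site, rtt in hosting_options_ms.items():
    verdict = "meets" if can_meet_target(rtt, PROCESSING_MS, TARGET_MS) else "misses"
    print(f"{site:22s} rtt={rtt:3d} ms -> {verdict} the {TARGET_MS} ms target")
```

With these assumed numbers, only the edge and metro options stay inside the budget; the regional and central sites miss it before their own processing even starts to matter.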
The complication here is the intercomponent connectivity and access to central data resources. If what the worker needs is complex analysis of a bunch of data, then there's a good chance that data is in a central repository, and analyzing it with short access delays would have to be done local to the data (a million records retrieved one at a time over a path with 100 milliseconds of delay spends over a day in communication latency alone). Thus, what you're calling for is a microservice-based, distributed-processing model, and you now have to think about interprocess communication and delay.
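The arithmetic behind that parenthetical is worth spelling out; the sketch below assumes the worst case of one 100 ms round trip per record, which is what a naive record-at-a-time remote query loop produces.

```python
# Back-of-the-envelope check of the latency claim above, assuming the worst
# case: one 100 ms round trip per record (a naive record-at-a-time fetch).

RECORDS = 1_000_000
RTT_SECONDS = 0.100                       # 100 ms of path delay per round trip

total_seconds = RECORDS * RTT_SECONDS     # 100,000 seconds
total_hours = total_seconds / 3600        # about 27.8 hours

print(f"{total_hours:.1f} hours of pure communication latency")  # well over a day
```

Batching or streaming the records obviously shrinks that number, but the point stands: the delay cost of separating the analysis from the data is paid in network round trips, not in compute.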
The purpose of composability, of networking, and of data center networking in particular, is to make the whirlpool shallower by reducing overall latency. That has the obvious value of improving response times and interprocess communications, but it can also help cloud economics in two ways. One is by reducing the performance penalty for concentrating resources (metro might transition to regional, for example), which leads to better economies of scale. The other is by expanding the practical size of a resource pool, letting more distant data centers host things because they're close in delay terms. That can reduce the need to oversupply resources in each center to anticipate changes in demand.
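Here is a small illustration of that pool-expansion effect, with assumed figures throughout (the distances, the 10 ms delay budget, and the latency rates are illustrative, not from any measurement): as the effective latency per unit of distance drops, more distant data centers fall inside the same delay budget and the schedulable pool grows.

```python
# Illustration of the pool-expansion point, with assumed figures throughout:
# as the effective network latency rate drops, more distant data centers fit
# inside the same delay budget, so the usable resource pool gets larger.

DELAY_BUDGET_MS = 10.0   # assumed maximum acceptable one-way delay

# Assumed distances (km) from the workload's users to candidate data centers.
data_centers_km = {
    "metro-A": 30,
    "metro-B": 60,
    "regional-1": 250,
    "regional-2": 400,
    "national": 1200,
}

def eligible_pool(ms_per_100km: float) -> list[str]:
    """Data centers whose one-way path delay fits the budget at a given latency rate."""
    return [name for name, km in data_centers_km.items()
            if km / 100 * ms_per_100km <= DELAY_BUDGET_MS]

print("today's network:    ", eligible_pool(ms_per_100km=5.0))  # only the metros qualify
print("shallower whirlpool:", eligible_pool(ms_per_100km=1.5))  # regional sites join the pool
```

A bigger eligible pool means any one center needs less spare capacity held in reserve, which is exactly the oversupply point above.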
Information security in this picture could change radically in response to process and information distribution. Centralized resources are more easily protected, to be sure, because you have a mass of things that need protecting and that justify the cost. But transient process and data caches are difficult for an attacker to find, and substituting a user-agent process for a direct device link to applications lets you impose more authentication at every level. It's still important to note that as you push information toward the edge, you eventually reach the point where current practice has workers relying on things like hard copy, which is perhaps the least secure thing of all.
Do you like having refinery maps tossed around randomly? Do you like having healthcare records sitting in open files for anyone to look at? We have both today, probably in most places. We don't have to make electronic distribution of either of these examples perfect for it to be better than what we have. The problem is not the level of security we could end up with but the pathway to getting there. A distributed system has to be secured differently.
Suppose now that we take things to the extreme: the whirlpool is just a swirl that presents no barriers to transit from any point to any other. Resources are now fully equivalent no matter where they are located and from where they're accessed. Now companies could compete to offer process and data cache points, and even application services.
So am I making an argument against my own position, which is that you can't drive change by anticipating its impact on infrastructure and pre-building? Not at all. The investment associated with a massive shift in infrastructure would be daunting, and there would be no applications or business practices in place that could draw on the benefits of the new model. A smarter approach is to start building toward a distributable future and let infrastructure change as fast as the applications can justify those changes. Which, circling back to my prior comments, means that this has to be first and foremost about the cloud as an IT model, and only secondarily about the data center.