They say that Microsoft is acquiring Cycle Computing for its high-performance computing (HPC) capabilities, to combat Amazon and Google. They’re only half-right. It is about combating Amazon and Google, but not so much about HPC. It’s mostly about coordinating workflows in an event-driven world.
Traditional computing is serial in nature; you run a program from start to finish. Even if you componentize the program and run pieces of it in the cloud, and even if you make some of those pieces scalable, you’re still talking about a flow. That is far less true in functional computing, and even less so in pure event-driven computing. If you don’t have a natural sequence to follow for a program, how do you decide what to run next?
Functional computing uses “lambda” processes that return the same results for the same inputs; nothing is stored inside that could alter the way a process works from iteration to iteration. This is “stateless” processing. What this means is that as soon as you have the inputs to a lambda, you can run it. The normal sequencing of things isn’t as stringent; it’s a “data demands service” approach. You could almost view a pure functional program as a set of data slots. When a process runs or something comes in from the outside, the data elements fill the slots, and any lambda functions that have what they need can then run. Those in turn fill other slots, and the process continues until you’re done.
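To make the “data slots” idea concrete, here’s a minimal sketch in Python. The names (slots, rules) are invented for illustration and don’t belong to any real framework; the point is simply that stateless functions fire as soon as their inputs are present, in no particular order.

```python
# A toy "data demands service" scheduler: stateless functions fire as soon
# as the slots they need are filled, regardless of any program order.
slots = {"price": 20.0, "qty": 3}            # data that has arrived so far

# (slots needed, slot produced, pure function of those inputs)
rules = [
    (("price", "qty"),    "subtotal", lambda p, q: p * q),
    (("subtotal",),       "tax",      lambda s: s * 0.08),
    (("subtotal", "tax"), "total",    lambda s, t: s + t),
]

progress = True
while progress:
    progress = False
    for needs, out, fn in rules:
        if out not in slots and all(n in slots for n in needs):
            slots[out] = fn(*(slots[n] for n in needs))
            progress = True

print(slots["total"])   # 64.8 -- same result no matter which rule is checked first
```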
This may sound, to a lot of people who have been around the block software-wise, like “parallel computing.” In scientific or mathematical applications, it’s often true that pieces of the problem can be separated, run independently, and then combined. Google’s MapReduce query processing, from which the Hadoop model for big data emerged, is an example of parallelizing query functions for data analysis.
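As a rough illustration of that split-run-combine shape (this is not Hadoop or Google’s actual MapReduce, just a sketch of the pattern, with a made-up word-count function and in-memory chunks), the “map” step runs independently on each piece and the “reduce” step combines the partial results:

```python
# Map/reduce in miniature: split the problem, run the pieces independently,
# combine the results. Real MapReduce/Hadoop adds distribution, shuffling,
# and fault tolerance on top of this basic shape.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_words(chunk):          # the "map" step, independent per chunk
    return Counter(chunk.split())

lines = ["the cloud is the future", "events drive the cloud", "the future is events"]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        partials = pool.map(count_words, lines)   # run the pieces in parallel
    totals = sum(partials, Counter())             # the "reduce" step
    print(totals.most_common(3))
```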
Event-driven applications are hardly massive database queries, but they do have some interesting connections to parallelism. If an event is generated, say by an IoT sensor, there’s a good chance it’s significant to multiple processes. A realistic event-driven system would trigger all the applications/components that “registered” for the event, and when those completed they would generate other events that would be processed the same way.
In a true event-driven system you can’t sequence events so much as contextualize them. Events generate other events, fill data fields, and trigger processes. The process triggers, like the processes in our functional example, are a matter of setting conditions on what a process needs before it runs. You don’t ask for five fields in order; an event is generated as each one arrives, and when they’re all in you do what you wanted to do with the data.
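Here’s a hedged sketch of that pattern. The EventBus class, the handler names, and the sensor fields are all invented for illustration (this isn’t any particular product’s API): handlers register for event types, an event fans out to everything registered for it, and one handler acts only once all the fields it’s waiting for have arrived.

```python
# A toy event bus: events fan out to every registered handler, and a handler
# can wait until all the data it needs has arrived before doing its work.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)
    def register(self, event_type, handler):
        self.handlers[event_type].append(handler)
    def publish(self, event_type, data):
        for handler in self.handlers[event_type]:
            handler(self, data)

NEEDED = {"temp", "pressure", "humidity"}    # the fields we're waiting for
received = {}

def log_reading(bus, data):                  # one of several handlers registered
    print("logged", data)                    # for the same sensor event

def gate_on_fields(bus, data):               # fires on every event, acts when complete
    received.update(data)
    if NEEDED <= received.keys():
        bus.publish("reading-complete", dict(received))

bus = EventBus()
bus.register("sensor", log_reading)
bus.register("sensor", gate_on_fields)
bus.register("reading-complete", lambda b, d: print("all fields in:", d))

for field, value in [("temp", 21.5), ("pressure", 101.3), ("humidity", 40)]:
    bus.publish("sensor", {field: value})
```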
This is very much like parallel computing. You have this massive math formula, a simple example of which might be:
A = f(x·f(y)) / f(z)
This breaks down into three separate processes: you need f(y), f(z), and f(x·f(y)). You can start on f(y) and f(z) whenever it’s convenient, and when you have f(y) and the value of x you can run that last term and solve for A. The coordination of what runs in parallel with what, and when, is very much like deciding which processes can be triggered in an event-driven system. Or to put it another way, you can take some of the HPC elements and apply them to events.
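A sketch of how that coordination might look in code, with f and the values of x, y, and z as placeholders: f(y) and f(z) don’t depend on each other, so they run concurrently, and the final term fires only when the inputs it actually needs are ready.

```python
# The formula A = f(x·f(y)) / f(z) as a small parallel job: f(y) and f(z)
# run concurrently; the final division waits only on the results it needs.
# f, x, y, and z are placeholders for some expensive, stateless computation.
from concurrent.futures import ThreadPoolExecutor

def f(v):
    return v * v

x, y, z = 3.0, 2.0, 4.0

with ThreadPoolExecutor() as pool:
    fy = pool.submit(f, y)                    # start f(y) and f(z) in parallel
    fz = pool.submit(f, z)
    fxfy = pool.submit(f, x * fy.result())    # runs as soon as f(y) is in
    A = fxfy.result() / fz.result()

print(A)   # f(3*4) / f(4) = 144 / 16 = 9.0
```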
If you follow the link to Cycle Computing’s website above, then on to “Key Features,” you find that besides the mandatory administrative features, the only other feature category is workflow. That’s what’s hard about event processing.
I’m not saying that big data or HPC is something Microsoft could just kiss off, but let’s face it: Windows and Microsoft are not the names that come unbidden to the lips of HPC planners. Might Microsoft want to change that? Perhaps, but is it possible that such an attempt would be just a massive diversion of resources? Would it make more sense to do the deal if there were something that could help Microsoft in the cloud market overall? I think so.
Even if we neglect the potential of IoT to generate massive numbers of events, it’s clear from all the event-related features being added to the services of the big public cloud providers (Amazon, Google, and Microsoft) that these companies believe events are going to be huge in the cloud of the future. I’ve said in other blogs that events are going to be the cloud of the future, meaning that the growth in revenue and applications will come from event-driven applications. I also think that over the next decade we’ll be transforming most of our current applications into event-driven form, making events the hottest thing in IT overall. Given that, would Microsoft buy somebody to get special workflow skills applicable to all parallel applications? It seems plausible.
In fact, any cloud application that is scalable at the component level could benefit from HPC-like workflow management. If I’ve got five copies of Component A because I have a lot of work for it, and ten copies of Component B for the same reason, how do I get work from an instance of A to an instance of B? How do I know when to spawn another instance of either? If I have a workflow that passes through a dozen components, all of which are potentially scalable, is the best way to divide work to do load-balancing at each component, or should I do “path-flow” selection that picks a path once, up front? Do I really need to run the components in series at all? You get the picture.
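A hedged sketch of that choice, with invented component names and a deliberately naive random policy: per-hop load balancing picks an instance at every component boundary, while “path-flow” selection picks the whole path once and pins the work item to it.

```python
# Two ways to route work through scalable components A -> B -> C:
# per-hop load balancing (choose an instance at every boundary) versus
# "path-flow" selection (choose the whole path once, up front).
# Instance counts and the random policy are illustrative only.
import random

instances = {"A": ["A1", "A2", "A3", "A4", "A5"],
             "B": [f"B{i}" for i in range(1, 11)],
             "C": ["C1", "C2"]}

def per_hop(work_item):
    # decide at each boundary; the same item may take different instances each time
    return [random.choice(instances[c]) for c in ("A", "B", "C")]

def path_flow(work_item):
    # decide once per item; within a run, the same item always gets the same path
    rng = random.Random(hash(work_item))
    return [rng.choice(instances[c]) for c in ("A", "B", "C")]

for item in ("order-1", "order-2", "order-1"):
    print(item, "per-hop:", per_hop(item), "path-flow:", path_flow(item))
```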
We’ve had many examples of parallel computing in the past, and they’ve proven collectively that you can harness distributed resources to solve complex problems if you can “parallelize” the problems. I think that the cloud providers have now found a way to parallelize at least the front-end part of most applications, and that many new applications like IoT are largely parallel already. If that’s true, we may see a lot of M&A driven by this same trend.