Digipede: Distributing Excel Computations
I put together a series of posts that show you how to grid-enable an Excel workbook using the Digipede Network. I selected an Excel pattern that is fairly common and easy to convert.
With this Excel pattern there is a computation that is run against n number of input files. Because the same computation is run against each input file, the computation is parallelizable and easy to grid-enable. Following the steps defined in each part, I show you how to grid-enable this Excel pattern.
Part 1 - Shows how to set up a Digipede Job in the Master workbook to distribute a Worker workbook and a launch script. Part 1 also defines a template launch script.
Part 2 - Shows how to add a Task level input file to the Job. One input file is distributed with each Task.
Part 3 - Shows you how to start the Worker workbooks computation via a VBA macro.
Part 4 - Shows you how to change the Job defined in the Master workbook to return a results workbook.
Part 5 - Shows you how to use a JobCompleted event to know that the Job is finished and you can do something with all the results files.
Excel Automation Considerations - You didn’t think Excel automation was easy did you? In this section I tell you what the problems are and how to work around them.
Once the workbook has been grid-enabled my model looks like this: (Remember that the Worker workbook will be running simultaneously on multiple compute nodes.)
Let’s consider some numbers
So let’s say you have 100 input files and that the computation takes 10 seconds to process one file. Running the unconverted workbook will take approximately 1000 seconds or 16.67 minutes.
With a grid-enabled Master and Worker workbook setup, multiple machines will process input files simultaneously, so your compute time can be reduced by increasing the number of machines working on the problem. Since the Team Edition comes with 5 agents (enabling five compute nodes), let’s say there are five machines working. Each machine will work on 20 files, at 10 seconds per computation. So with five compute nodes you can run through your input data set of 100 files in 200 seconds or 3.33 minutes. There is a little time overhead to move the input and results files, but that time is negligible when taken in with the over all time savings.
Now an important point about those five nodes. The machines do not have to be a dedicated cluster, they do not have to be dedicated to this job. You can run jobs on desktop computers that people are currently using. The jobs run in the background and the user won’t even realize it. You can grid-enable multiple applications and several can be running on your grid at the same time.
That's all folks
There are other Excel patterns that are well suited for grid computing. The Digipede SDK supports both .NET and COM APIs, so what can be done in VBA with COM can also be done in .NET. Imagine what you could build.