A Day in the Life

A day in my life. Thoughts on leadership, management, startups, technology, software, concurrent development, etc... Basically the stuff I think about from 10am to 6pm.

2/22/2006

Digipede: Distributing Excel Computations – Part 2

Yesterday I created a JobTemplate that moved template level files to the compute nodes. (The Worker workbook and the RunWorker script) Today I’m going to solve the problem of telling the Worker what to do. The value of a Worker is that it allows you to parallelize the execution of your spreadsheet. In the Digipede Network a Job is made up of Tasks and each Task sets up a parallelized version of your computation.

Usually each computation requires different “seed” data. With Excel we can use several different techniques to get the seed data from the Master to the Worker.

In Part I you saw code in the RunWorker script that put a value into a cell on a worksheet in the Worker workbook. I created a Digipede Parameter in the Master workbook to send in a seed value to the RunWorker script. I then wrote the value to the Worker workbook so that I could see that the RunWorker script executed. I can create as many Parameters as I want, but beyond a certain number things will get unwieldy. If you want to use Parameters with Excel you have no choice but to use some other application or script to launch the Worker workbook because Excel does not pass command-line parameters to the workbook.*

When you have large amounts of data to send to the Worker, you can use a seed file, such as another Excel workbook, or you can extract the data from a database. Since I already showed you how to send a Parameter to a Task, I’m now going to show you how to send a seed file. Accessing a database doesn’t require anything Digipede specific. You won’t be moving any files and the key(s) that you’ll need for your search can be sent in as a Parameter(s).

Updates to Excel Master with VBA code-behind: (New code bolded)

' Create a filedef for the Worker Excel file
Set fileDef = New Digipede_Framework.fileDef
fileDef.ID = 1
fileDef.remoteLocationId = 0
fileDef.LocalName = "Worker.xls"
fileDef.Name = "Worker.xls"
fileDef.Relevance = Digipede_Framework.Relevance_JobTemplate
jobTemplate.FileDefs.AddItem fileDef

' Create a filedef for the Data workbook
Dim dataWBFileDef As Integer
Dim strInputDataFile As String
strInputDataFile = "InputDataFile"

Set fileDef = New Digipede_Framework.fileDef
fileDef.ID = 2
dataWBFileDef = fileDef.ID
fileDef.RemoteLocationId = 0
fileDef.Name = strInputDataFile
fileDef.Relevance = Digipede_Framework.Relevance_InputPlaceholder
jobTemplate.FileDefs.AddItem fileDef

.
.
.
Dim strDataFile As String

' Create a job
For i = 1 To 1
Set oTask = New Digipede_Framework.Task

strDataFile = "Data" + Application.WorksheetFunction.Text(i, "00") + ".xls"

Set fileDef = New Digipede_Framework.fileDef
fileDef.ID = dataWBFileDef
fileDef.RemoteLocationId = 0
fileDef.LocalName = "Data.xls"
fileDef.RemoteName = strDataFile
fileDef.Name = strInputDataFile
fileDef.Relevance = Digipede_Framework.Relevance_Input
oTask.FileDefs.AddItem fileDef


Dim taskParameter As Digipede_Framework.parameter
Set taskParameter = New Digipede_Framework.parameter
taskParameter.Name = parameter.Name
taskParameter.Value = i
oTask.Parameters.AddItem taskParameter

mJob.Tasks.AddItem oTask
Next i


To move a Task level file you must create two FileDefs for it. The first FileDef
belongs to the JobTemplate and lets the JobTemplate know that there will be Task level files to move. The second FileDef belongs to the Task and identifies the file to move.

' JobTemplate level FileDef
Set fileDef = New Digipede_Framework.fileDef
fileDef.ID = 2
dataWBFileDef = fileDef.ID
fileDef.RemoteLocationId = 0
fileDef.Name = strInputDataFile
fileDef.Relevance = Digipede_Framework.Relevance_InputPlaceholder
jobTemplate.FileDefs.AddItem fileDef


Notice that the new JobTemplate FileDef has a Relevance of InputPlaceholder.
Whenever you create a JobTemplate FileDef with the Relevance of InputPlaceholder, the Task definition must contain an associated FileDef object.

' Task level FileDef
Set fileDef = New Digipede_Framework.fileDef
fileDef.ID = dataWBFileDef
fileDef.RemoteLocationId = 0
fileDef.LocalName = "Data.xls"
fileDef.RemoteName = strDataFile
fileDef.Name = strInputDataFile
fileDef.Relevance = Digipede_Framework.Relevance_Input
oTask.FileDefs.AddItem fileDef


I did a little trick in the creation of the Task’s FileDef in that I changed the
name of the data file. I told the Digipede Network that I wanted to move a file named “Data##.xls”, but when my file gets to the compute node I want it renamed to “Data.xls”. This makes it much easier for my Worker workbook because I don’t have to tell the Worker the name of the input data file.

Because template level files are expected to be the same for all the Tasks, you have the option to leave those files on the compute node so that you don’t have to move them again. This comes in handy when debugging and is one element of a technique I will discuss later that can save your network bandwidth. The behavior is a little different for Task level files. Task level files are expected to be different for each Task. Therefore they are deleted when a Task completes.

I have also added code to validate the JobTemplate before I submit it to the Digipede Network. This is a good code snippet to have. Because VBA doesn’t allow exceptions using Validate() is your best technique for identifying malformed JobTemplate problems.

' Validate the job before submitting to get validation information.
Dim valItems As Digipede_Framework.ValidationItemCollection
Dim valSeverity As Digipede_Framework.ValidationSeverity

valSeverity = jobTemplate.Validate(mJob, valItems)
If valSeverity = ValidationSeverity_Error Then
errMsg = "Validation Error Count: " & valItems.Count
For i = 1 To (valItems.Count)
errMsg = errMsg & "\r\n " & valItems.Item(i).Message
Next i
MsgBox errMsg
Exit Sub
End If

If valSeverity = ValidationSeverity_Warning Then
errMsg = "Validation Warning Count: " & valItems.Count
For i = 1 To (valItems.Count)
errMsg = errMsg & "\r\n " & valItems.Item(i).Message
Next i
MsgBox errMsg
Exit Sub
End If

Sheet1.Range("SubmissionTime") = Format(Time, "hh:mm:ss")

ThisWorkbook.mClient.SubmitJob poolId, jobTemplate, mJob,
Digipede_Framework.SubmissionOptions_None



At this point the input data file is being copied to the compute node where either
Worker or RunWorker can use it. Part 3 will explore a few techniques for initiating the computation in the Worker.

* As of Excel 2003

Essay Links:

Start
Automation
Part 1
Part 2
Part 3
Part 4
Part 5


Labels:

0 Comments:

Post a Comment

<< Home