A Day in the Life

A day in my life. Thoughts on leadership, management, startups, technology, software, concurrent development, etc... Basically the stuff I think about from 10am to 6pm.

3/27/2006

March 26, 2006

Sometimes it just seems that my day flies away. Today I woke up around 9am, talked to the kids until they left for Fairyland, showered, fed the cat, started the laundry cycle (5 loads), picked up toys, vacuumed, coached a hockey session, trimmed some bushes, took the boys to dinner, got the boys ready for bed, fed the cat again, went to the grocery store, and finally found time to sit and relax...at 9:30pm. I have NO idea how single-parent families do it.

A couple of interesting things happened today. My neighbor’s daughter had a Chinese engagement party this morning. It was really neat. I heard the drumming first, and then I peeked over the fence and realized something private was going on. So I went back into the house. The drumming went on for a while and then they set off a bunch of firecrackers. Too bad the kids missed it. And just for the record...peeking over the fence only requires that I walk out onto my porch. I’m not THAT nosy.

Another neighbor joined the kids and me for dinner, and on the way home a street lamp blinked out. Our neighbor told us that street lights blink out because the gases in the light get too hot. She said that it’s a natural cycling process. I tried searching for an online reference that would give me more information but I was unable to find anything. I thought it was interesting, and if someone else runs across a reference it would be great if you could post it here.

Today was my last coaching session for the year; I’m also not skating out so no hockey for a few months. I applied to an MBA program last month. I don’t know if I’ll get in but I am retaking both Statistics and Calculus because if I do, I have to have both of those classes completed before the program starts. It’s been 20 years since I last took these classes. When I came to the realization that it had been so long, it was a little surprising. Okay, it was very surprising. Anyway, I had a really good time on the ice today and I’m going to miss it. Not just the hockey, but also the people. On the plus side, when I do go back there will be new faces.

Well, off to bed. Tomorrow will be another full day: I hope for a visit from my neighborhood termite exterminator extraordinaire, and I really hope for a completed second draft of the Excel white paper.

3/23/2006

Grid Computing: On the Move

Grid computing is not a panacea. In technology there is no such thing. What grid computing offers is a new way to approach software problems. Can’t kick your transactions through your current systems fast enough? Break up the transactions into transaction blocks and process them in parallel. Do you have computations that take forever? Parallelize them. Not all applications need the power of a supercomputer, but some do. And those solutions need that power badly.

It’s very exciting that Sun Grid is up and running and available to the public. Like any new paradigm shift, grid computing needs its early adopters. The early adopters increase the knowledge-base, prove and test the technology, and teach those who follow. The Sun Grid has got the technology community talking. Even though the talk today has primarily been about the DoS attack (Slashdot: Sun Grid DoS’d) this is still great for grid computing. With Sun Grid opening its service up to the public, people are going to play with it. Also good for grid computing. Way to go Sun and a big “Thank You” from this grid proponent.

Termite Swarm

I’m trying to finish the first draft of an Excel tutorial. Writing this type of document is hard for me so I wanted to work someplace quiet. I decided to work from home and as a result I’ve discovered I have termites. They’re swarming in my office and it’s a little distracting. Not as distracting as the servers at the office. Those are just chronically loud. With the termites I can forget about them for a while and then one will fly past my face. Close.

Termites have swarmed in the office before and I had someone come out and spray. I thought it was taken care of...guess not. Bummer. I really hate bugs.

3/16/2006

Truth and Honesty

One of the things I love about this Internet phase is the commitment many people have to truth and honesty. It’s absolutely wonderful how quickly the Web 2.0 crowd can come together to help and support people in crisis (I’m thinking Katrina victims, Jill Carroll, the Tsunami victims, etc...), how quickly facts can be ferreted out, and false information debunked. But I’ve got a problem. When the Web 2.0 folks start talking about technology, I believe there is a slant not toward truth and honesty but toward prejudice and hate. What am I talking about? Let’s look at Linux vs. Windows. Both have their place in the grand software scheme. Both have their pros and cons. But instead of looking at the differences honestly and being able to admit that “Yes, Windows does have value for some folks,” many of the Web 2.0 techies just bash it. There is value in both.

A similar thing happens when talking about languages and databases. Let’s stop this madness. There is NO technology that is the magic silver bullet that will save all of IT and solve every problem on the planet. There is NO company that is perfect and never stumbles. There is NO person who never makes a mistake. People build technology and companies, and people are fallible. That’s just how it is. Sure, we want to make sure that large companies like Microsoft, Google, Ford, etc., don’t step on the little guy. But let’s give these companies a big “Hell yeah!” when they do something right.

It would be nice to see all those smart people being honest about the pros and cons of a technology. To look beyond the obvious surface reasons why they think one thing is better than another. To really look at what people need and why they make the choices they do. Let’s get a little more truth and honesty in the technology arena and then let’s use that honesty to move forward and solve problems. After all that’s what engineers do.

3/10/2006

Digipede: Distributing Excel Computations

I put together a series of posts that show you how to grid-enable an Excel workbook using the Digipede Network. I selected an Excel pattern that is fairly common and easy to convert.



With this Excel pattern there is a computation that is run against n input files. Because the same computation is run against each input file, the computation is parallelizable and easy to grid-enable. Following the steps defined in each part, I show you how to grid-enable this Excel pattern.

Part 1 - Shows how to set up a Digipede Job in the Master workbook to distribute a Worker workbook and a launch script. Part 1 also defines a template launch script.
Part 2 - Shows how to add a Task level input file to the Job. One input file is distributed with each Task.
Part 3 - Shows you how to start the Worker workbook's computation via a VBA macro.
Part 4 - Shows you how to change the Job defined in the Master workbook to return a results workbook.
Part 5 - Shows you how to use a JobCompleted event to know that the Job is finished and you can do something with all the results files.
Excel Automation Considerations - You didn’t think Excel automation was easy did you? In this section I tell you what the problems are and how to work around them.
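
To make the pattern concrete outside of Excel, here is a minimal Python sketch of "one computation applied independently to each of n input files." It is not Digipede code: `compute`, `run_serial`, and `run_distributed` are hypothetical names, the "input files" are just small lists, and a local thread pool stands in for the compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(input_rows):
    """Stand-in for the workbook computation (hypothetical): total one input file."""
    return sum(input_rows)


def run_serial(inputs):
    """Unconverted pattern: process the input files one after another."""
    return [compute(rows) for rows in inputs]


def run_distributed(inputs, nodes=5):
    """Grid-style pattern: one task per input file, spread over `nodes` workers."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # map() preserves input order, so results line up with the inputs.
        return list(pool.map(compute, inputs))


if __name__ == "__main__":
    inputs = [[i, i + 1, i + 2] for i in range(10)]  # ten made-up "input files"
    # Each task is independent, so both approaches give the same results.
    assert run_serial(inputs) == run_distributed(inputs)
    print(run_distributed(inputs))
```

Because no task depends on another task's result, the work can be handed out in any order, which is what makes this pattern easy to grid-enable.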

Once the workbook has been grid-enabled my model looks like this: (Remember that the Worker workbook will be running simultaneously on multiple compute nodes.)



Let’s consider some numbers
So let’s say you have 100 input files and that the computation takes 10 seconds to process one file. Running the unconverted workbook will take approximately 1000 seconds or 16.67 minutes.

With a grid-enabled Master and Worker workbook setup, multiple machines will process input files simultaneously, so your compute time can be reduced by increasing the number of machines working on the problem. Since the Team Edition comes with 5 agents (enabling five compute nodes), let’s say there are five machines working. Each machine will work on 20 files, at 10 seconds per computation. So with five compute nodes you can run through your input data set of 100 files in 200 seconds or 3.33 minutes. There is a little time overhead to move the input and results files, but that time is negligible compared with the overall time savings.
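
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `estimated_seconds` is made up for the example, it assumes the files are spread evenly across the nodes, and it ignores the file-transfer overhead.

```python
import math


def estimated_seconds(num_files, seconds_per_file, nodes=1):
    """Estimate wall-clock time when files are spread evenly across nodes.

    Ignores file-transfer overhead; the busiest node sets the finish time.
    """
    files_on_busiest_node = math.ceil(num_files / nodes)
    return files_on_busiest_node * seconds_per_file


if __name__ == "__main__":
    serial = estimated_seconds(100, 10)         # one machine
    grid = estimated_seconds(100, 10, nodes=5)  # five compute nodes
    print(f"serial: {serial} s ({serial / 60:.2f} min)")
    print(f"grid:   {grid} s ({grid / 60:.2f} min)")
    print(f"speedup: {serial / grid:.1f}x")
```

Rounding up with `math.ceil` matters when the files do not divide evenly: with 101 files on five nodes, one node gets 21 files and the run takes 210 seconds.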

Now an important point about those five nodes. The machines do not have to be a dedicated cluster, and they do not have to be dedicated to this job. You can run jobs on desktop computers that people are currently using. The jobs run in the background and the user won’t even realize it. You can grid-enable multiple applications and several can be running on your grid at the same time.

That's all folks
There are other Excel patterns that are well suited for grid computing. The Digipede SDK supports both .NET and COM APIs, so what can be done in VBA with COM can also be done in .NET. Imagine what you could build.

Digipede: Distributing Excel Computations - Excel Automation Considerations

Excel assumes a GUI so automation (which is GUI-less) can be a little tricky. I'm going to take you through a few areas of consideration in the automation process.

Microsoft has a document available called "Considerations for server-side Automation of Office" which explains in detail that Microsoft does not recommend running automated Excel objects on remote machines. Please take the time to read the Microsoft document because it's important for you to understand the technological limitations of Excel. Within the Microsoft document is a section called "Problems using Automation of Office server-side"; I will address each of the issues and how you can work around them on the Digipede Network. I've also encountered other automation problems, and I've broken all the issues up into three main groups: Administration Issues, Worker Workbook Issues, and Debugging Tips and Tricks.

Administration Issues

User Identity
Excel is designed to run in an interactive manner. So Excel makes assumptions as to what directories and security configurations exist when it is launched. Running Excel as an automated COM server does not guarantee that the needed settings are there. You will have to configure each compute node to satisfy Excel's needs. We have identified two areas:
  1. Excel requires a valid user identity because it needs the associated profile. When a domain user logs into a machine the operating system automatically creates directories and settings that Excel uses. To solve this problem on the Digipede Network, install each Digipede Agent with a valid user identity. During the Digipede Agent installation on the 'Logon Information' page select 'Specific User' and enter a valid user login for the machine. Make sure that the user identity has logged into the compute node at some time so that the default profile information exists.

  2. To launch an automated Excel instance from the Digipede Agent you will need to make sure that the specified user identity has the correct COM security settings. Follow these steps:

    1. Open the 'Component Services' from within the 'Administrative Tools' or from the command-line by typing dcomcnfg.exe.

    2. Expand the 'Component Services' branch so that you can see 'My Computer'.

    3. Right-click on 'My Computer' and select 'Properties'.

    4. On the 'My Computer Properties' dialog, select the 'COM Security' page.

    5. In the 'Launch and Activation Permissions' group box, select the 'Edit Default' button.

    6. If your user id is not listed in the 'Group or user names' list control, select 'Add' and add it.

    7. With your user id selected in 'Group or user names', select the 'Local Launch' and 'Local Activation' checkboxes.

    8. Select 'OK', select 'OK', and then close 'Component Services'.


Resiliency and Stability
During installation Microsoft Office products have the option of performing "install on first use" for some components. If an uninstalled component is needed by an Office instance an installation wizard for the missing component is started. This causes a problem for any automated server-side Office instances because now there is a UI that can't be dealt with programmatically. Developers will need to work with their IT departments to control this problem.

If a Digipede job seems to hang, I recommend running the Worker workbook on the compute node to make sure that this problem doesn't exist. If it does exist, get the required components installed on the compute node. You will see similar behavior if your Worker workbook requires an add-in. Make sure that everything the Worker workbook needs is installed on the compute node.

If your Excel distribution pattern creates an Excel object, it is recommended that all the compute nodes in your target pool run the same version of Excel.

Server-Side Security
There are several levels to security. The Digipede Network will not accept a work request from a user who has not properly logged in. This ensures that only users who have been given access to the Digipede Network can request work on it.

Code level security is the responsibility of the developers. Each developer must ensure that their code is secure and doesn't create problems on the compute node or network.

You will need to configure Excel so that the Macro security level is low, to avoid the security dialog.

Worker Workbook Issues

NOTHING that is written to run on the compute nodes can have a GUI. Not Excel, not the run script, not your code. Dialogs will lock the compute node up while the software waits for a user to hit a button.

Interactivity with the Desktop
Excel was designed around the user interface so you have to programmatically disable all UI functionality. The Microsoft article "How To Dismiss a Dialog Box Displayed by an Office Application with Visual Basic" discusses various messages that may come up and how to disable them. (If you are distributing a Worker DLL, it is also important for the developer to remember to not put message boxes into the code contained in DoWork().)

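Sample Visual Basic 6 code to disable Excel GUI:
    ' Requires references to the Excel and Office object libraries.
    Dim myExcelApp As Excel.Application
    Set myExcelApp = New Excel.Application

    ' Suppress every prompt that would otherwise wait for a user.
    myExcelApp.DisplayAlerts = False
    myExcelApp.AskToUpdateLinks = False
    myExcelApp.AlertBeforeOverwriting = False
    myExcelApp.FeatureInstall = msoFeatureInstallNone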

Disabling display of message boxes will avoid any user interaction problems. To be courteous to the user of the desktop where the Excel work is being done, it is recommended that you first save the current settings, disable everything, and then reset the values on exit.
Update: the above statement has been found to be wrong because each instance of Excel running as a COM server runs in its own process space, and as a result Application method calls do NOT affect other instances. Problems with Application method calls affecting other workbooks can happen when multiple workbooks are open in the same Excel instance.

Reentrancy and Scalability
Excel is an application that can be slow to close down, and if you haven't released all the objects it doesn't close down at all. It's not enough to make a call to Application.Quit(); you must also make sure to release all Excel objects. And if you are using managed code (i.e., .NET) then you will also need to perform garbage collection...twice.

Sample C# ShutDownExcel() method:
private void ShutDownExcel() {
    if (mExcelApp != null) {
        mExcelApp.DisplayAlerts = true;
        mExcelApp.Quit();

        // Release the COM reference so the Excel process can exit.
        System.Runtime.InteropServices.Marshal.ReleaseComObject(mExcelApp);
        mExcelApp = null;
    }

    // Clean up memory so Excel can shut down.
    GC.Collect();
    GC.WaitForPendingFinalizers();

    // The GC needs to be called twice in order to get the
    // finalizers called: the first time in, it simply makes
    // a list of what is to be finalized; the second time in,
    // it actually does the finalizing. Only then will the
    // object do its automatic ReleaseComObject.
    GC.Collect();
    GC.WaitForPendingFinalizers();
}

If you are not cleaning up the Excel objects by releasing them and performing garbage collection, multiple instances of Excel will be left open. This is visible by using Task Manager and sorting by Image Name. Excel does not support an unlimited number of active Excel instances so the compute node will eventually be unable to respond to Excel requests. As the developer you MUST confirm that your Worker workbook closes properly and that Excel is exiting cleanly. More information can be found in the Microsoft MSDN article "Office application does not quit after automation from Visual Studio .NET client".

One other factor to consider is that it is possible to have an active user running Excel. You must be careful not to affect the user's work.

  • Do not make any Application level Calculate() calls as this may cause calculations to start on the active user’s workbook.

  • Do not shut down the user's instance of Excel, so be very careful about calling Application.Quit().


    Update: the above statement has been found to be wrong.

    Strange Excel Lessons
    Make sure to use fully qualified paths when referencing anything in Excel. Excel apparently doesn't use the current directory as the working directory so you need to set Application.DefaultFilePath to control where files are saved. You will also need to open any files and run any macros by using a fully qualified path, otherwise Excel can't find them. It is safest to use fully qualified paths whenever referencing any Excel objects.

    Always control the Worker workbook through a script or an assembly. Trying to launch Excel directly makes it VERY hard to shut down when it's finished. It's just easier to debug and to control using a launch mechanism. Make sure that the script engine is non-GUI. (So use cscript instead of wscript.)

    Debugging Tips And Tricks

    Make sure that Excel is closing down properly on the compute node by looking at the Task Manager process list. When Excel is run automated it launches as not visible by default. This means you need to look on the Processes page to determine if you are orphaning an Excel process. You can quickly find the Excel process by sorting on the Image Name. You should kill all orphaned Excel processes.

    I used a few different techniques in the VBS script to determine what was going on:

    1. Create a log file and write to it to determine what is happening in the automated session.
      Const ForAppending = 8
      Dim fso, f1
      Dim logFileName
      logFileName = "mylog.txt"
      Set fso = CreateObject("Scripting.FileSystemObject")
      ' Open for appending; create the file if it does not exist.
      Set f1 = fso.OpenTextFile(logFileName, ForAppending, True)
      f1.WriteLine("MyScript: Start")
      f1.Close

    2. If you create a results file you may want to save a copy of it before you exit the script; this lets you see exactly what is being generated by the Worker workbook before the Digipede Network moves it.
      ' Make a copy of the results file
      Dim objFSO, strSourceFile, strResultsFile
      Set objFSO = CreateObject("Scripting.FileSystemObject")
      strResultsFile = strPath & "\Results" & TaskId & ".xls"
      strSourceFile = strPath & "\Results.xls"
      objFSO.CopyFile strSourceFile, strResultsFile

    3. Write your code to catch errors; don't let them escape the script. You can then update the log file with the error code.
      Dim strMacroName
      strMacroName = "'" & strPath & "\Worker" & "'!Sheet1.StartCmdBtn_Click"
      On Error Resume Next
      myExcelWorker.Run strMacroName
      If Err.Number <> 0 Then
        ' Error occurred - just close it down.
      End If
      Err.Clear
      On Error GoTo 0

    I also created a log file for the Worker workbook to aid in debugging.

    Public Sub LogMessage(ByVal strMessage As String)
        '------------------------------------------------
        ' Open the existing file for appending;
        ' if the file does not exist, a new one is created.
        '------------------------------------------------
        Dim strFilePath As String
        strFilePath = Application.DefaultFilePath & "\mylog.txt"

        Dim fileNumber As Integer
        fileNumber = FreeFile()

        Open strFilePath For Append As #fileNumber  ' Open (or create) the log file.
        Write #fileNumber, strMessage               ' Output text.
        Close #fileNumber                           ' Close file.
    End Sub

    Conclusion

    Make sure that the run script and Worker workbook actually run on a compute node before trying an all-out distribution test. Run it once. Debug it until it runs, then expand. Remember Henry David Thoreau's important contribution to the software industry, "Simplify, simplify, simplify."

    Essay Links:

    Start
    Automation
    Part 1
    Part 2
    Part 3
    Part 4
    Part 5

    3/05/2006

    Links: March 5, 2006

    Are you a Day or Night Programmer? Good observation.

    An argument for rewarding laziness.

    This is a must view!
    I love Seth Godin’s books and here is a chance to hear him speak to Google

    3/04/2006

    Valley Doings: Ookles Barbeque

    I was fortunate to be invited to a barbeque today for Ookles. I got an overview of what Scott and his team are building and I’m looking forward to playing around with it. If you want to find out what they’re doing you need to keep an eye on Scott’s blog.

    One of the things I really enjoyed was listening to the conversations going on between other people. Jared Kim talked briefly about selling his China-based online game company; Jared, his sister Julia, and Marc Canter discussed the Korean Web 2.0 scene. Matt, whose last name I don’t know, talked a little about his travels. I talked to Marc’s wife Lisa for a little while; she has a blog as well. (Thanks to Demitrious and Nicki for their hospitality.) All in all it was a very amazing group of people.

    I absolutely love hearing people discuss their dreams and their journeys. I love finding out about new companies and technologies. So even though I don’t feel like I contributed much to the conversations, except perhaps to the Ultima Online conversation, I really enjoyed the party. The world is such a fascinating place. Since I’m fortunate to be involved in the alpha I’m looking forward to getting to know some of the other folks better.

    So good luck Scott and let the testing begin.

    3/03/2006

    Grid Computing: Ahhh, But We Do Still Need Servers

    Nicholas Carr wrote a post that argues that with the introduction of grid and utility computing, the server market will decline. In the post Mr. Carr states,

    In this scenario, the core unit of business computing would not be small, inflexible servers but rather large, flexible computing clusters or grids. These clusters in turn would be built not from traditional branded servers but from cheap, commodity subcomponents - chips, boards, drives, power supplies, and so on - that the grid operators would assemble into tightly networked physical or virtual machines. Many of the functions and features built into today's branded servers would be taken over by the software running the cluster.

    Dan Ciruli responds by pointing out,

    Computes just aren't the same. Computes look different on different operating systems. Not all software runs on all operating systems. Different people prefer different toolsets, and they always will. Some OSs are better for some things than others, and people choose the appropriate OSs for them. Yes, we've all read about "write once, run everywhere" software--but a small minority of software actually runs that way. OSs are different, and they will continue to be different. People will continue to write software that takes advantage of particular OSs.

    But I don’t think Dan went far enough. I agree with Dan and I wanted to add a few points as to why.

    Do you remember these famous quotes?

  • “I think there is a world market for maybe five computers.” – Thomas Watson, IBM, 1943

  • “There is no reason anyone would want a computer in their home.” – Ken Olsen, Digital Equipment Corp, 1977

  • “640k ought to be enough for anybody.” – Bill Gates, Microsoft, 1981


    Our human nature forces us to push new technology. To see what new things we can build with it and do with it. (We are curious creatures.) Grid computing allows a company to get more work out of its servers and desktops. If you think of grid computing using current technology paradigms, I agree that it looks like there will be a decline in the server market. But history has proven that when people are given more computing power they find new things to do. What we are really going to see with the increased acceptance of grid computing is an increased need for computing power. Just imagine the new technologies and applications we will invent with the increased availability of affordable supercomputing power. Look at what Google can do with their “super computer.” And don’t think for a minute that companies don’t want the opportunities currently available to Google. When we, as the technology sector, accept the grid paradigm shift we will see a surge in the server market, not a decline.

    In addition there will continue to be a need for multiprocessor boxes because some computations are better suited to a threading model than a grid model. Grid and threading technologies are complementary. The Digipede grid computing model offers a methodology for simplifying some threading cases, but if your application uses micro-threads in a thread pool you will still want a multiprocessor box. And as Dan pointed out there are other aspects to the grid solution, such as where your data lives and what OSs and applications your team is already trained in.

    Technology acceptance is organic and I don’t think Geoffrey A. Moore was that far off when he wrote about the “Technology Adoption Life Cycle.” We build on what we know and on what makes us feel safe. That’s how we’re built. There may be a time when servers will become completely commoditized but it won’t be grid computing that makes that happen because the process has already started and grid computing isn’t widely accepted. The only question concerning servers is which company will be able to make that shift from high-end servers to commoditized servers.

    Server commoditization coupled with grid computing makes the next 10 years in tech look as exciting as the PC revolution. It’s a good time to be an engineer.

    3/02/2006

    Links: March 2, 2006

  • Small business resource from CMP here.

  • How to write a scientific paper here.

  • Another interesting vacation destination here.

  • If you want to be a good engineer read this article and follow the suggestions.

  • Interesting article on software pricing here.

  • If you’ve got boys (children) you should read this article. I found it enlightening.

    3/01/2006

    Digipede: Distributing Excel Computations – Part 5

    Well, we’re in the home stretch now. We have our results but how do we know when the Tasks and Job are complete? The Digipede Network will tell you. In Part 1 I didn’t show you how mJob was declared. It’s declared as a global like this:

    Dim WithEvents mJob As Digipede_Framework.job
    .
    .
    Dim mFileResultsLoc As String
    Dim mResultsCount As Integer

    Using WithEvents tells the VBA interpreter to expect the object to receive
    events. Digipede will send events to the Master workbook for the mJob object.

    Update to Excel Master with VBA code-behind:
    (Add this code)

    ' Handle the completion of a job
    Private Sub mJob_JobFinished(ByVal sender As Variant, _
            ByVal e As Digipede_Framework.JobStatusEventArgs)

        Sheet1.Range("FinishedTime") = Format(Time, "hh:mm:ss")

        Dim errMsg As Variant
        If e.Error Is Nothing Then
            If e.JobStatus = Digipede_Framework.JobStatus_Completed Then
                ' Do Work Here – Create a report file
                Dim oResultsBook As Workbook
                Dim strFilePath As String

                Dim iIndex As Long
                For iIndex = 1 To mResultsCount
                    strFilePath = mFileResultsLoc & "\results" & _
                        Application.WorksheetFunction.Text(iIndex, "00") & ".xls"
                    Set oResultsBook = Application.Workbooks.Open(strFilePath)

                    'Do analysis and create report

                    'Clean up
                    oResultsBook.Close
                    Set oResultsBook = Nothing
                Next iIndex

                Sheet1.Range("Messages") = "Success"
            Else
                errMsg = "Job completed with status: " & e.JobStatus
                Sheet1.Range("Messages") = errMsg
            End If
        Else
            errMsg = "JobCompleted Error: " & e.Error.Message
            Sheet1.Range("Messages") = errMsg
        End If
    End Sub

    ' Handle the successful completion of a task
    Private Sub mJob_TaskFinished(ByVal sender As Variant, _
            ByVal e As Digipede_Framework.TaskStatusEventArgs)

        Dim errMsg As Variant
        If e.Error Is Nothing Then
            If e.TaskStatusSummary.FailureMessage = "" Then
                mResultsCount = mResultsCount + 1
            Else
                errMsg = "Task " & e.TaskId & " exited with error: " & _
                    e.TaskStatusSummary.FailureMessage
                Sheet1.Range("Messages") = errMsg
            End If
        Else
            errMsg = "Task " & e.TaskId & " exited with error: " & e.Error.Message
            Sheet1.Range("Messages") = errMsg
        End If
    End Sub


    You, the programmer, can process each Results workbook in mJob_TaskFinished() as it is finished and/or you can process the workbooks at the end of the Job in mJob_JobFinished().

    I have just Digipede-enabled a workbook that uses a data workbook to initialize calculations. The calculations produce results workbooks for each calculation instance. The Master workbook then processes the results workbooks into a report workbook.

    So yes, you can distribute and run Excel on a grid. There is no silver bullet; you will have to do a little work. But it’s only a little work and the reduced execution time will likely more than make up for it. There are two things left to do to complete this exercise: 1) Write up all the debugging tricks I used and tell you about the strange Excel quirks I discovered. 2) Create a consolidation post with some pictures to show you what happened and links to all the parts.
