Why You Should Write Programs to Write Programs to Write Programs For You

I don't know what your work environment looks like, but mine has a lot of spreadsheets. Spreadsheets seem to be the one user interface that everyone I need to deal with understands. These are business users, IT departments, investors, investigators, and a small army of researchers. If we need data from these people, we'll probably get it in a spreadsheet. If we need to give them data, they probably want it in a spreadsheet.

I've long been grateful for Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. One of the most useful pieces of code I've written this year is a simple iterator which uses either a spreadsheet or CSV file as the data source and lets you read the data row by row, giving you a hash reference keyed on column names. If it weren't for ParseExcel and Text::CSV and friends, my job would be much more difficult.

Then my corresponding business person asked me to create a new report from an existing template. I thought seriously about how to reproduce the form of the report with WriteExcel for about 30 seconds, and then was doubly grateful for the existence of Spreadsheet::ParseExcel::SaveParser (even if its API is slightly different from that of WriteExcel—I will happily deal with that for the sake of not having to write it myself).

Then I realized that I had to write a lot of code to populate each individual cell.

The report calculated various values grouped by country. In other words, for each country the business user cares about, I had to count records which matched multiple criteria. She also wanted the ability to change the countries or rearrange them.

Here's where choosing the right data structure is important. I wanted to write code like this:

B2: count_all_people
B3: count_all_places
B4: count_all_things

... where the first token is the address of the cell in the spreadsheet and the second token is a method to get the value for that cell. That was easy enough to make into a data structure:

my @updates = (
    [ B2 => 'count_all_people' ],
    [ B3 => 'count_all_places' ],
    [ B4 => 'count_all_things' ],
);

... which I could iterate through with:

for my $update (@updates) {
    my ($cell, $method, @args) = @$update;
    my ($col, $row)            = cell_to_pos( $cell );

    my $value                  = $self->$method( @args );
    my $format                 = $sheet->get_cell( $row, $col )->{FormatNo};
    $sheet->AddCell( $row, $col, $value, $format );
}

... which reads pretty well.

Then I had to figure out how to look up countries by name:

[ G14 => 'count_people', { country => 'Angora' } ]

That's a little fragile, though; it hard codes both the cell where the value should go and the name of the associated country. The first time my colleague revised her spreadsheet to add a country, I was glad to find a better approach:

my $country = sub {
    my $cell        = shift;
    my ($col, $row) = cell_to_pos( $cell );

    return ( country => $sheet->get_cell( $row, $col )->value );
};

This function takes a cell's location, looks up the value of that cell, and returns a key/value pair of country and country name. All that I need to know now is the range of cells which contain country names (sources) and places to store calculated values (sinks):

map {
    [ "G$_" => 'count_people', { $country->( "E$_" ) } ],
    [ "H$_" => 'count_people', { $country->( "E$_" ), contacted => 1 },
    ],
} 14 .. 35

That map expression builds several entries in my data structure which refer to countries in the spreadsheet and look up the right values. As long as the range is correct, the spreadsheet will have the correct associations between countries and reported values.

While I admit the map expression is a lot more difficult to read than the data it builds would be, it's much easier to maintain. This is a tradeoff I'd make any time.

Keep this in mind, however: this is effectively a little programming language. Yes, it's just a data structure, but it's a data structure that controls the control flow of the language. It uses a higher order function, $country, to generate some of the values to this data structure (writing a little program) as well as a builtin operator (map) to generate more of the program. The little runloop which processes this data structure uses dynamic dispatch to produce the necessary data—and what you don't see is that the methods called use SQL::Abstract to build queries dynamically.

That's the reason you ought to study a higher order language like a Lisp or a Scheme, and the reason you need to know how compilers work. That's also the reason you deserve to understand various kinds of data structures, so that you can organize your programs in a such a way that doing what you want to do is the natural process of traversing a sensible data structure.

When you reach this level of problem, sometimes the solution isn't just writing a brute force list of steps the computer needs to execute from the top to the bottom. Sometimes the better solution is to describe your problem in terms of the data you have and the data you need and let the computer figure out how to do it, even if that means writing a program to write a program to write yet another program for you.

Why You Should Write Programs to Write Programs to Write Programs For You

Tags:

Modern Perl: The Book

Categories

Monthly Archives

Pages

About this Entry