I don't know what your work environment looks like, but mine has a lot of
spreadsheets. Spreadsheets seem to be the one user interface that everyone I
need to deal with understands. These are business users, IT departments,
investors, investigators, and a small army of researchers. If we need data from
these people, we'll probably get it in a spreadsheet. If we need to give them
data, they probably want it in a spreadsheet.
I've long been grateful for Spreadsheet::ParseExcel
and Spreadsheet::WriteExcel.
One of the most useful pieces of code I've written this year is a simple
iterator which uses either a spreadsheet or CSV file as the data source and
lets you read the data row by row, giving you a hash reference keyed on column
names. If it weren't for ParseExcel and Text::CSV and friends, my
job would be much more difficult.
Then my corresponding business person asked me to create a new report from
an existing template. I thought seriously about how to reproduce the form of
the report with WriteExcel for about 30 seconds, and then was doubly grateful
for the existence of Spreadsheet::ParseExcel::SaveParser
(even if its API is slightly different from that of WriteExcel—I will
happily deal with that for the sake of not having to write it myself).
Then I realized that I had to write a lot of code to populate each
individual cell.
The report calculated various values grouped by country. In other words, for
each country the business user cares about, I had to count records which
matched multiple criteria. She also wanted the ability to change the countries
or rearrange them.
Here's where choosing the right data structure is important. I wanted to
write code like this:
B2: count_all_people
B3: count_all_places
B4: count_all_things
... where the first token is the address of the cell in the spreadsheet and
the second token is a method to get the value for that cell. That was easy
enough to make into a data structure:
my @updates = (
[ B2 => 'count_all_people' ],
[ B3 => 'count_all_places' ],
[ B4 => 'count_all_things' ],
);
... which I could iterate through with:
for my $update (@updates) {
my ($cell, $method, @args) = @$update;
my ($col, $row) = cell_to_pos( $cell );
my $value = $self->$method( @args );
my $format = $sheet->get_cell( $row, $col )->{FormatNo};
$sheet->AddCell( $row, $col, $value, $format );
}
... which reads pretty well.
Then I had to figure out how to look up countries by name:
[ G14 => 'count_people', { country => 'Angora' } ]
That's a little fragile, though; it hard codes both the cell where the value
should go and the name of the associated country. The first time my colleague
revised her spreadsheet to add a country, I was glad to find a better
approach:
my $country = sub {
my $cell = shift;
my ($col, $row) = cell_to_pos( $cell );
return ( country => $sheet->get_cell( $row, $col )->value );
};
This function takes a cell's location, looks up the value of that cell, and
returns a key/value pair of country and country name. All that I need to know
now is the range of cells which contain country names (sources) and places to
store calculated values (sinks):
map {
[ "G$_" => 'count_people', { $country->( "E$_" ) } ],
[ "H$_" => 'count_people', { $country->( "E$_" ), contacted => 1 },
],
} 14 .. 35
That map
expression builds several entries in my data structure
which refer to countries in the spreadsheet and look up the right values. As
long as the range is correct, the spreadsheet will have the correct
associations between countries and reported values.
While I admit the map
expression is a lot more difficult to
read than the data it builds would be, it's much easier to maintain. This is a
tradeoff I'd make any time.
Keep this in mind, however: this is effectively a little programming
language. Yes, it's just a data structure, but it's a data structure that
controls the control flow of the language. It uses a higher order function,
$country
, to generate some of the values to this data structure
(writing a little program) as well as a builtin operator (map
) to
generate more of the program. The little runloop which processes this data
structure uses dynamic dispatch to produce the necessary data—and what
you don't see is that the methods called use SQL::Abstract to build
queries dynamically.
That's the reason you ought to study a higher order language like a
Lisp or a Scheme, and the reason you
need to know how compilers work. That's also the reason you deserve to
understand various kinds of data structures, so that you can organize your
programs in a such a way that doing what you want to do is the natural process
of traversing a sensible data structure.
When you reach this level of problem, sometimes the solution isn't just
writing a brute force list of steps the computer needs to execute from the top
to the bottom. Sometimes the better solution is to describe your problem in
terms of the data you have and the data you need and let the computer figure
out how to do it, even if that means writing a program to write a program to
write yet another program for you.