ePub and Kindle versions of Modern Perl: the
Book and Liftoff:
Launching Agile Teams & Projects are coming soon. We've been pulling
all of the pieces of Pod::PseudoPod::DOM
and Pod::PseudoPod::Book
together so that any corpus written in PseudoPod can become a well-formatted
PDF (in multiple page sizes), an ePub book, or attractive HTML.
Part of that process required an improvement to the indexer. (It's no secret
that the Kindle index for the first edition of Modern Perl wasn't up to our
standards.)
Another part means writing good tests for indexes.
I've long believed that the best way to test code is to write your test code
as realistically as possible. This is a great way to exercise the code as real
people will use it, and it gives you immediate feedback on what's tedious and
awkward and easy to misuse.
Sometimes I still get the tests wrong, though.
Consider: a PseudoPod document uses X<> tags for indexing. A test of the
indexer must parse a document containing several such tags, then verify that
the output (or the internal tree which represents the parsed document)
contains the appropriate index nodes.
In short, a document containing X<ice cream sandwich>
should produce an entry for ice cream sandwiches in the index.
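For example, a couple of sentences in the book sources might read like this
(the plain X<> tag matches the example above; the semicolon form for a
subentry is an assumption about the markup):

    Nothing beats an ice cream sandwichX<ice cream sandwich> on a hot
    day, except maybe two of themX<desserts; ice cream sandwich>.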
For my first testing approach, I tried to create the appropriate index nodes
and their children to run through the index emitter. That experiment lasted ten
minutes, with five of those minutes spent taking a break to rethink things.
Here's a secret about tests that people often don't realize: it doesn't
really matter whether you test units as units or the system as a whole if you
test everything you care about and your tests run fast enough.
That first approach failed because I cared more about the details of how the
indexer works than about what it does. The right way to
approach something like this is to figure out the characteristics of the code
you want to exercise, then figure out the test data, then decide how to test
for the results you want.
The basic tests must exercise basic indexed terms, the alphabetization and
representation of those terms, subindexed terms, multiple instances of a single
term, and the relationship of subindexed terms and top-level terms.
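Concretely, the tags exercising those cases might look something like this
(a sketch; the entries are illustrative, and the semicolon subentry syntax is
an assumption about the markup):

    my @tags = (
        'X<anonymous functions>',    # a basic indexed term
        'X<anonymous functions>',    # the same term again, from another section
        'X<functions; anonymous>',   # a subentry beneath a top-level term
        'X<zucchini>',               # a second top-level key, to check sorting
    );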
That sounds more complicated than it is, which led me to believe that there
was a simple way to represent the data. Then, of course, I felt both a little
silly and a lot relieved when I had the epiphany of using the document API to
produce the index:
    use Pod::PseudoPod::DOM;
    use Pod::PseudoPod::DOM::Index;

    sub make_index_nodes
    {
        # build a small but valid document around the given index tags
        my $doc   = qq|=head0 My Document\n\n|;
        my $count = 0;

        for my $tag (@_)
        {
            $doc .= qq|=head1 Index Element $count\n\n$tag\n\n|;
            $count++;
        }

        # parse the document the same way the book-building tools do
        my $parser = Pod::PseudoPod::DOM->new(
            formatter_role => 'Pod::PseudoPod::DOM::Role::XHTML',
            filename       => 'dummy_file.html',
        );

        my $dom = $parser->parse_string_document( $doc )->get_document;

        # collect every X<> entry found during the parse
        my $index = Pod::PseudoPod::DOM::Index->new;
        $index->add_entry( $_ ) for $dom->get_index_entries;

        return $index;
    }
That code looks more complex than it is, and could get simpler soon. (Next
post: write only the code you need, refactor only when you need to refactor.)
My tests use it like this:
    use Test::More;

    sub test_simple_index
    {
        my $index = make_index_nodes( 'X<some entry>' );

        like $index->emit_index, qr!<h2>S</h2>!,
            'index should contain top-level key for all entries';
    }
... such that make_index_nodes()
takes a list of tags,
constructs a valid document, extracts an index, and lets the test functions do
what they will with it. All the test functions have to know is how to pass
index tags to make_index_nodes() and what to get out of the returned index
object. (My chances of getting that wrong in subsequent tests are low.)
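Subsequent tests follow the same pattern. Here's a sketch of one exercising
alphabetization (the exact <h2> markup is inferred from the test above):

    sub test_sorted_keys
    {
        my $index  = make_index_nodes( 'X<zebra>', 'X<aardvark>' );
        my $output = $index->emit_index;

        # the A key must precede the Z key regardless of tag order
        like $output, qr!<h2>A</h2>.*<h2>Z</h2>!s,
            'index should sort top-level keys alphabetically';
    }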
If you take any lessons from this, think of three things. First, if your
tests are difficult to write, you might not yet fully understand what they
need to do. What are you really testing and why? Why are you testing at
the level you're testing? What do you expect to explore and how?
Second, use your API the real way as much as possible. Don't poke around in
private elements or throw mock objects at the problem unless that's really the
easiest way to test something tricky. Input goes in. Output comes out. Treat
your code as a black box as much as possible.
Finally, reduce duplication in your test code as much as is practical.
Simplicity is better than clever abstraction, but a function here and there or
data-driven code can reduce the likelihood of bugs in a dramatic fashion. Every
time I figure out a way to simplify my tests, they get easier to maintain and
extend and my code quality improves.
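For example, a small table of tag-and-pattern pairs can drive several of these
assertions at once (a sketch; the expected patterns here are assumptions about
the emitted HTML):

    my @cases = (
        [ 'X<some entry>',         qr!<h2>S</h2>!         ],
        [ 'X<ice cream sandwich>', qr!ice cream sandwich! ],
    );

    for my $case (@cases)
    {
        my ($tag, $expect) = @$case;

        like make_index_nodes( $tag )->emit_index, $expect,
            "index should handle $tag";
    }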
That's the real goal, after all: making great code to solve real problems.