ePub and Kindle versions of Modern Perl: the
Book and Liftoff:
Launching Agile Teams & Projects are coming soon. We've been pulling
all of the pieces of Pod::PseudoPod::DOM
and Pod::PseudoPod::Book
together so that any corpus written in PseudoPod can become a well-formatted
PDF (in multiple page sizes), an ePub book, or attractive HTML.
Part of that process required an improvement to the indexer. (It's no secret
that the Kindle index for the first edition of Modern Perl wasn't up to our
standards.)
Another part means writing good tests for indexes.
I've long believed that the best way to test code is to write your test code
as realistically as possible. This is a great way to exercise the code as real
people will use it, and it gives you immediate feedback on what's tedious and
awkward and easy to misuse.
Sometimes I still get the tests wrong, though.
Consider: a PseudoPod document uses X<> tags for indexing. A test of the
indexer must parse a document containing several such tags, then verify that
the output (or the internal tree which represents the parsed document)
contains the appropriate index nodes.
In short, a document containing X<ice cream sandwich>
should produce an entry for ice cream sandwiches in the index.
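For example, a couple of sentences in the book sources might read like this
(the plain X<> tag matches the example above; the semicolon form for a
subentry is an assumption about the markup):

    Nothing beats an ice cream sandwichX<ice cream sandwich> on a hot
    day, except maybe two of themX<desserts; ice cream sandwich>.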
For my first testing approach, I tried to create the appropriate index nodes
and their children to run through the index emitter. That experiment lasted ten
minutes, with five of those minutes spent taking a break to rethink things.
Here's a secret about tests that people often don't realize: it doesn't
really matter whether you test units as units or the system as a whole if you
test everything you care about and your tests run fast enough.
That first approach failed because I cared more about the details of how the
indexer works than about what it does. The right way to
approach something like this is to figure out the characteristics of the code
you want to exercise, then figure out the test data, then decide how to test
for the results you want.
The basic tests must exercise basic indexed terms, the alphabetization and
representation of those terms, subindexed terms, multiple instances of a single
term, and the relationship of subindexed terms and top-level terms.
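Concretely, the tags exercising those cases might look something like this
(a sketch; the entries are illustrative, and the semicolon subentry syntax is
an assumption about the markup):

    my @tags = (
        'X<anonymous functions>',    # a basic indexed term
        'X<anonymous functions>',    # the same term again, from another section
        'X<functions; anonymous>',   # a subentry beneath a top-level term
        'X<zucchini>',               # a second top-level key, to check sorting
    );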
That sounds more complicated than it is, which led me to believe that there
was a simple way to represent the data. Then, of course, I felt both a little
silly and a lot relieved when I had the epiphany of using the document API to
produce the index:
    use Pod::PseudoPod::DOM;
    use Pod::PseudoPod::DOM::Index;

    sub make_index_nodes
    {
        # build a small but valid document around the given index tags
        my $doc   = qq|=head0 My Document\n\n|;
        my $count = 0;

        for my $tag (@_)
        {
            $doc .= qq|=head1 Index Element $count\n\n$tag\n\n|;
            $count++;
        }

        # parse the document the same way the book-building tools do
        my $parser = Pod::PseudoPod::DOM->new(
            formatter_role => 'Pod::PseudoPod::DOM::Role::XHTML',
            filename       => 'dummy_file.html',
        );

        my $dom = $parser->parse_string_document( $doc )->get_document;

        # collect every X<> entry found during the parse
        my $index = Pod::PseudoPod::DOM::Index->new;
        $index->add_entry( $_ ) for $dom->get_index_entries;

        return $index;
    }
That code looks more complex than it is, and could get simpler soon. (Next
post: write only the code you need, refactor only when you need to refactor.)
My tests use it like this:
    use Test::More;

    sub test_simple_index
    {
        my $index = make_index_nodes( 'X<some entry>' );

        like $index->emit_index, qr!<h2>S</h2>!,
            'index should contain top-level key for all entries';
    }
... such that make_index_nodes()
takes a list of tags,
constructs a valid document, extracts an index, and lets the test functions do
what they will with it. All the test functions have to know is how to pass
index tags to make_index_nodes() and what to get out of the returned index
object. (My chances of getting that wrong in subsequent tests are low.)
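Subsequent tests follow the same pattern. Here's a sketch of one exercising
alphabetization (the exact <h2> markup is inferred from the test above):

    sub test_sorted_keys
    {
        my $index  = make_index_nodes( 'X<zebra>', 'X<aardvark>' );
        my $output = $index->emit_index;

        # the A key must precede the Z key regardless of tag order
        like $output, qr!<h2>A</h2>.*<h2>Z</h2>!s,
            'index should sort top-level keys alphabetically';
    }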
If you take any lessons from this, think of three things. First, if your
tests are difficult to write, you might not yet fully understand what they
need to do. What are you really testing and why? Why are you testing at
the level you're testing? What do you expect to explore and how?
Second, use your API the real way as much as possible. Don't poke around in
private elements or throw mock objects at the problem unless that's really the
easiest way to test something tricky. Input goes in. Output comes out. Treat
your code as a black box as much as possible.
Finally, reduce duplication in your test code as much as is practical.
Simplicity is better than clever abstraction, but a function here and there or
data-driven code can reduce the likelihood of bugs in a dramatic fashion. Every
time I figure out a way to simplify my tests, they get easier to maintain and
extend and my code quality improves.
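For example, a small table of tag-and-pattern pairs can drive several of these
assertions at once (a sketch; the expected patterns here are assumptions about
the emitted HTML):

    my @cases = (
        [ 'X<some entry>',         qr!<h2>S</h2>!         ],
        [ 'X<ice cream sandwich>', qr!ice cream sandwich! ],
    );

    for my $case (@cases)
    {
        my ($tag, $expect) = @$case;

        like make_index_nodes( $tag )->emit_index, $expect,
            "index should handle $tag";
    }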
That's the real goal, after all: making great code to solve real problems.