Failure-Driven Design

I use actual, measured needs to help design software. I also use reported failures to help design software. I have one more design principle: I use misunderstandings to help me design software.

In particular, this is an API-level design principle. The question changes from "How can I make a particular feature possible?" to "How can I make a particular feature impossible to misuse?"

Up-front Failure Prevention

It's easy to demonstrate failures of this principle; consider string handling in the C programming language, or global-by-default variables in Perl 5, or the Python REPL's behavior when you type quit or exit. (That last example catches me every time I use Python's interactive mode. Thanks for the lovely warning. Please do what I want if you know what I want.)
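C's string handling shows the problem in miniature. strncpy() copies at most n bytes, and when the source is n bytes or longer it leaves the destination without a terminating '\0'. That is easy to forget and easy to copy wrongly. The sketch below is an illustration only, not code from Parrot. It shows a hypothetical copy_string() wrapper that makes the unterminated case impossible and reports truncation instead of hiding it.

```c
#include <string.h>

/* Hypothetical wrapper around a classic pitfall: strncpy() does not write
 * a terminating '\0' when strlen(src) >= n, so callers who assume it does
 * end up with an unterminated buffer. copy_string() always terminates dst
 * (when size > 0) and returns -1 if the text had to be truncated. */
static int copy_string(char *dst, size_t size, const char *src)
{
    size_t len;

    if (size == 0)
        return -1;                  /* no room even for the terminator */

    len = strlen(src);
    if (len >= size) {
        memcpy(dst, src, size - 1); /* copy what fits ... */
        dst[size - 1] = '\0';       /* ... and always terminate */
        return -1;                  /* tell the caller it was truncated */
    }

    memcpy(dst, src, len + 1);      /* fits: copy including the '\0' */
    return 0;
}
```

The caller cannot forget to terminate the string, because the function does it. The safe behavior is the default rather than something the documentation has to beg for.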

This principle also falls somewhere between necessity-driven design and bug-driven design. It requires asking "What could possibly go wrong?", enumerating the likely and unlikely possibilities, analyzing their risks, and estimating how likely each failure is.

For example, Parrot supports a data structure known as a constant string. These are immutable singleton structures which represent strings used pervasively throughout the core system. By making them immutable, we obviate the need to make copies to prevent unwanted modifications. By making them singletons, we can collapse multiple references to a single string into pointers to that singleton string and save lots of memory.
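The singleton half of this idea is commonly called string interning. Here is a minimal sketch in C that assumes a fixed-size table and a linear search. It is not Parrot's implementation, and intern_string() and its table are hypothetical names. Every request for the same text returns the same read-only pointer, so the text is stored once and equality checks can compare pointers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical intern table: fixed size, linear search, never freed. */
#define MAX_INTERNED 64

static const char *interned[MAX_INTERNED];
static size_t interned_count = 0;

/* Return the single shared copy of `s`, creating it on first use.
 * Returns NULL if the table is full or allocation fails. The result is
 * const: it can be shared freely because nobody may modify it. */
static const char *intern_string(const char *s)
{
    size_t i, len;
    char *copy;

    for (i = 0; i < interned_count; i++)
        if (strcmp(interned[i], s) == 0)
            return interned[i];     /* reuse the existing singleton */

    if (interned_count == MAX_INTERNED)
        return NULL;

    len = strlen(s);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);       /* private copy, including the '\0' */

    interned[interned_count++] = copy;
    return copy;
}
```

Because callers only ever receive a const pointer, no one can modify the shared copy. That immutability is what makes sharing it safe, and it removes the need for defensive copies.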

We use a macro called CONST_STRING in C code in the Parrot core to identify one of these strings.

While it would be nice if our documentation were always sufficient to describe how to write a Parrot extension without copying and pasting code from the core, I realize that almost everyone who will ever write a Parrot extension will start with skeleton code cribbed from elsewhere.

I wanted to make the constant string technique work reasonably well for extensions as well. It'll never be quite as fast or as efficient as the core version, but a quick cache does help a lot of benchmarks.

Our first attempt used a different macro, CONST_STRING_GEN, as the internal implementation of that macro had to be different. Rather than poking directly into interpreter memory, extensions have to go through a secondary lookup: they don't have access to internals in the same way that Parrot's core does.
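Here is a sketch of the extension-side idea, under some assumptions. public_get_constant() is a hypothetical stand-in for whatever public API an extension may call. The cache is keyed on the address of the C string passed in. This CONST_STRING_GEN takes only the string and is not Parrot's actual definition. Repeated lookups from the same call site hit the local cache instead of taking the slower public route each time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical public API: the only route an extension may use to get a
 * shared constant string. It counts calls so the cache's effect shows. */
static int public_lookups = 0;

static const char *public_get_constant(const char *text)
{
    static const char *const table[] = { "length", "push", "pop" };
    size_t i;

    public_lookups++;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i], text) == 0)
            return table[i];
    return NULL;                          /* not a known constant */
}

/* A small per-extension cache keyed on the pointer passed in. Equal
 * literals at different call sites may or may not share an address; if
 * they don't, they simply occupy separate slots. */
#define CACHE_SLOTS 16

static struct {
    const char *key;
    const char *value;
} const_cache[CACHE_SLOTS];

static const char *extension_const_string(const char *literal)
{
    size_t i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (const_cache[i].key == literal)
            return const_cache[i].value;  /* hit: skip the public lookup */
        if (const_cache[i].key == NULL) { /* miss: fill the first free slot */
            const_cache[i].key   = literal;
            const_cache[i].value = public_get_constant(literal);
            return const_cache[i].value;
        }
    }
    return public_get_constant(literal);  /* cache full: look up every time */
}

/* Hypothetical extension-side macro (not Parrot's real definition). */
#define CONST_STRING_GEN(text) extension_const_string(text)
```

The extension never touches interpreter internals. It pays for one public lookup per call site and then reuses the answer.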

Then I realized the problem.

I don't want to explain the mechanics of how constant strings work, at least not to people writing extensions. I want to say nothing more than "If you know you'll never need to modify this string, mark it as a constant string." I know I don't want to explain the differences between the caching models, especially because extensions shouldn't need to know anything about Parrot's unencapsulated internals.

Yet I knew that people would copy code from the core into their extensions and then wonder why their versions Just Did Not Work.

I changed the extension processor to emit a different version of the CONST_STRING macro local to each extension, which uses the appropriate public API to manipulate constant singleton strings. Even though the mechanics of how this works differ between core code and extension code, it still reads the same way. People can copy and paste code between core data structures and extensions without knowing the difference, at least in this respect.
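One way to get "different mechanics, same spelling" in C is to let the preprocessor choose the expansion. In this sketch, BUILDING_CORE, core_const_string(), and public_const_string() are all hypothetical. Parrot's extension processor generates its own macro, and that generated code is not reproduced here. The call site inside is_push() is identical in both builds, which is what keeps copy-and-paste safe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal table: core code may read it directly. */
static const char *const core_string_table[] = { "length", "push", "pop" };

static const char *core_const_string(const char *text)
{
    size_t i;

    for (i = 0; i < sizeof core_string_table / sizeof core_string_table[0]; i++)
        if (strcmp(core_string_table[i], text) == 0)
            return core_string_table[i];
    return NULL;
}

/* Hypothetical public API for extensions. In this sketch it forwards to
 * the core table; in a real system it would hide the internals. The
 * counter exists only to show which route a build took. */
static int public_calls = 0;

static const char *public_const_string(const char *text)
{
    public_calls++;
    return core_const_string(text);
}

/* Core headers and the per-extension generated header pick different
 * expansions for the same macro name. */
#ifdef BUILDING_CORE
#  define CONST_STRING(text) core_const_string(text)
#else
#  define CONST_STRING(text) public_const_string(text)
#endif

/* This function's source is identical in core and in extensions. */
static int is_push(const char *name)
{
    const char *push = CONST_STRING("push");
    return push != NULL && strcmp(name, push) == 0;
}
```

Compiled without BUILDING_CORE, every CONST_STRING goes through the public route. Compiled with it, the same source reads the internal table directly, and the person who copied is_push() never has to know which one they got.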

Even though copying and pasting is generally bad, it's so pervasive (especially in this context) that our interfaces have to allow for it -- and should not allow people to make subtle errors.

For further reading, I suggest Joshua Bloch's talk "How to Design a Good API and Why It Matters" from OOPSLA 2006.

Reacting to Failure

Of course, it's not always possible to predict what people will do wrong.

Sometimes the best you can do is look at a bug report, ask yourself "Wait, why in the world would you ever write code this way?", and then work backwards. How did that failure occur? Do you not provide the right APIs? Do your abstractions leak? Are people working around a broken feature? Are people working around the lack of a feature?

Sometimes you need to pave the cowpaths. Sometimes you need to change the way you explain code. Sometimes you need to change the name of an API call or a parameter to suggest the right behavior.
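Here is a small, hypothetical example of changing a parameter so it suggests the right behavior. A bare int flag says nothing at the call site, while an enum names the choice. C still converts integers to enums silently, so this guides callers rather than enforcing anything. The names below are invented for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Before (hypothetical): int str_equal(const char *a, const char *b, int flag);
 * A call like str_equal(x, y, 1) doesn't say what 1 means, so readers
 * and copy-pasters guess. After: the parameter's type names the choice. */
enum str_case { CASE_SENSITIVE, CASE_INSENSITIVE };

static int str_equal(const char *a, const char *b, enum str_case mode)
{
    if (mode == CASE_SENSITIVE)
        return strcmp(a, b) == 0;

    /* Case-insensitive: compare character by character after folding. */
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings end together */
}
```

A call such as str_equal(name, "parrot", CASE_INSENSITIVE) explains itself, and code copied from it carries that explanation along.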

Mostly you need to understand where and why expectations went wrong. Only then can you change the vocabulary of the system to change expectations for the better.

That's why I welcome rapid feedback. If something doesn't work, I want to know that as soon as possible -- before it gets too established to improve, before it confuses too many people, and before people accept it as "just another quirk."

There can be a little pain to start, especially for early adopters, but there's no substitute for solving real problems in the real world to help you understand exactly what you should have designed in the first place. Maybe next time you'll get more right.

(This, I believe, is one of those practices which separates real, actual agile development from Big-A-Because-It's-Hip-Agile development: pervasive, ubiquitous feedback gathered and reflected upon to produce small, verifiable changes to development practices designed to improve the process itself.)

1 Comment

zby | September 29, 2009 1:14 AM

That's why it is good for developers to write the documentation for their modules and also do support (here the problem starts when they like correcting people and never change the code so that people make fewer mistakes - because this would give them fewer occasions for that ego play).
