Suppose you want to write a program in Perl. (Suppose you have written a program in Perl.) If the thesis behind what I call Modern Perl is correct, you can write that program well or you can write that program poorly. (For supporting arguments for that thesis, see Piers Cawley's A tale of two languages.)

Likely you've seen examples of Poorly Written Perl on the Internet. They serve as the YouTube comments to Nabokov of English language. In other words, the proper response to a reluctant admission that:

Yes, I know that Perl can be written in an object-oriented and readable way.

— Tim Bray, D.P.H.

... or that:

There's also been a push in some applications to rewrite Perl utilities in Bash to enhance portability between platforms. While Perl exists on just about every platform out there, there are vagaries that can cause issues with differing Perl versions, which then leads to portability problems.

— Paul Venezia, Is it still libelous if you end your titles with question marks?

... the proper response is "Why didn't you write your code with maintainability in mind?"

I know, I know. That's not helpful. Here's a quick checklist to help those of you writing Perl (or those of you trying to hire people to write Perl (or those of you trying to hire people to learn to write Perl)) to determine if you're capable of writing Perl well:

You don't have to answer all of those questions in the correct way to write good and maintainable Perl, but if you answer most of those questions in the wrong way, of course you'll write bad code.

Perl allows people to accomplish their tasks without having to learn much, without having to participate in strange and unfamiliar ceremonies, and without even being much good at programming at all. That's by design, and that's a good thing for very specific circumstances. Yet if you approach programming as if it were merely typing and retyping until something barely working fell out of your typewriter, you're going to make lots of messes, and no language can save you from an unprofessional lack of discipline.

Writing good code requires discipline in any language.

I attended an exhibit about the work of Leonardo da Vinci several months ago. Part of that exhibit was a thorough analysis of his Mona Lisa painting. "It's perhaps the most famous painting in the world," I thought. "I've seen it (or at least replicas) thousands of times before."

Then at the suggestion of the exhibit, I looked behind the model and saw more details, such as a low wall, the lack of eyebrows and eyelashes, and other small details that have always been there but somehow failed to catch my attention.

Several years ago, I read an analysis of Roger Zelazny's The Chronicles of Amber series. The analyst admitted that he re-read the series every few years and learned new things each time. (Zelazny's Chandleresque tone in the first five books contributes to the depth of the books, but so does the fact that his characters gladly lie to, backstab, betray, confuse, manipulate, and distrust each other and their own selves.) A reinterpretation of a single line which seemed so innocent during the last reading could cause you to see a character in an entirely different light.

Good art is like that.

Today I understood an underused feature of Perl 5 better.

Paulo Custodio filed a bug on the Modern Perl draft that the explanation of module unimporting was incomplete. I had written that:

no Module::Name qw( arguments );

... is equivalent to:

BEGIN { Module::Name->unimport( qw( arguments ) ) }

In all accuracy (and, upon reflection, obviousness), no Module::Name qw( arguments ) is equivalent to:

BEGIN
{
    require 'Module::Name';
    Module::Name->unimport( qw( arguments ) );
}

Even though I rarely use module unimporting and have never, to my best recollection, unimported a module I haven't previously used, its obvious that unimporting through no should imply require. (I have trouble imagining an interface where you'd initially load a pragma with no, unless you use strictperl, but clever people can do clever things.)

You may all now chuckle at how long it took me to realize this (and, yes, I did read the Perl 5 source code to prove to myself that this occurs).

Jamie McCarthy made an interesting point about type safety in embedded SQL on String-Plus:

SQL is a great example for this. Relational databases are more useful with strong typing, so EMPLOYEE_ID is incompatible with PRODUCT_ID even if they are both implemented as INT. It'd be a great idea to see those constraints implemented at the perl level, presumably by giving perl more knowledge of the database schema than even the database engine has.

Imagine that you have, or can write, a little language parser for a SQL-like language. My simple example was:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

This can decompose into several operations:

  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.

That's a nice interface, but you can do better. As I suggested, you can add error checking if you know the structure of the database:

  • Get the metadata which describes the users table.
  • Verify that the required fields (address and user exist).
  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.

You can take advantage of type checking too:

  • Get the metadata which describes the users table.
  • Verify that the required fields (address and user exist).
  • Verify that the type of $address is compatible with the type of the address field. Repeat for $user and user.
  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.

If you know the structure of the database when the program starts, you can start to push some of this type checking to the point of compilation. (You may not be able to perform all of the type checking at compilation time, but you can do as much as possible as early as possible to prevent as many errors as possible.)

That's simple and easy. Now imagine something more interesting:

SQL {{
    SELECT name, address FROM users, addresses GIVEN { User $user }
}}

It's obvious from the syntax of the query language that the database needs to perform a join operation, and it's obvious that the primary key of the $user object is the important key of the operation. If the program knows the relationship of the users and addresses tables, it can join them effectively as well.

Don't get caught up in the syntax or the semantics of the remainder of examples here; they exist to demonstrate possibilities, not the final form of battle-tested code. Even so, imagine a dynamic query:

SQL {{
    SELECT @fields FROM { Table $table_one }, {Table $table_two } }
}}

Again the structure and intent of the code is obvious. The operations are now:

  • Find the primary keys for $table_one and $table_two.
  • Verify that they're joinable.
  • Verify that all members of @fields are present in either $table_one or $table_two.
  • Construct the query.

If I were to implement this, I'd make a join_tables multimethod. It takes two arguments (generalizable to more, but follow along with two for now). Imagine that it looks something like this:

multi join_tables( Table $t1, Table $t2 ) { ... }

multi join_tables( Any, Any ) { fail() }

Given two Table arguments, the first multi candidate matches and gets called. Given any other combination of arguments, the second candidate matches and produces an error.

Knowing that you have two Table objects isn't enough, however. The tables might have no relationship to each other. Imagine if you somehow could verify that the tables have an appropriate relationship. If I were to implement this, I might check that the keys of the tables matched types, perhaps with a syntax something like:

multi join_tables ( Table $t1, Table $t2 where { $t1.primary_key eqv $t2.foreign_key( $t1 ) } ) { ... }

That is, the keys must be of equivalent types. If one key is a user_id and the other is an Integer, the where clause won't match for this candidate, so a different multi will get called.

Now imagine that for those embedded SQL minilanguage statements where table name is available at compilation time and sufficient type information exists to verify the statements themselves at compilation time:

SQL {{
    SELECT name, address FROM { User users }, { Address addresses }
}}

... then everyone who uses this minilanguage (and has set up the table information appropriately) gets safety and correctness by default. Some of that can even occur before the program runs. The rest of it can occur as the program runs.

(A really, really good type checker and optimization system could infer that some errors are impossible even if it can't prove the use of a single type in every case.)

Now imagine that you have a language which allows you to build minilanguages like this, to build APIs which specify correct operations and fall back to good error reporting on incorrect operations, and which do so without interfering with other code and other extensions.

Welcome to Perl 6.

String-Plus

| 9 Comments | No TrackBacks

What does this variable represent?

my $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's obviously an address, but what does Perl know about it? Perl knows it's a string. Perl knows it's some 60 characters long. Perl may even know that it's a valid string of Latin-1 characters.

Perl doesn't know where the string came from, nor that it contains a street address or a legal name nor a zip code (and not a zip + 4). Any meaning to the program beyond "It's a string of some 60 characters and is valid in the Latin-1 encoding" is far beyond what Perl knows about it. That's why the name of the variable is $thingie; even though Perl doesn't care about variable names, calling it $address instead could have led you to believe there's more structural meaning to this chunk of memory than actually exists.

Names are important, at least to people maintaining source code. This code is obviously wrong:

$user->set_address( $birthday );

... but to Perl it might as well be:

$foo->bar( $baz );

... for all of the semantic meaning it understands. There's no obvious intent.

I know you're smart and you're way ahead of me and you think "If I wanted a good static type system, I know where to find Haskell or OCaml and I'd never let code that bad get out of code review and why aren't you writing tests." but that's not the point. You can be super careful or make APIs which restrict the most natural way to write code in the host language in favor of extra security. That may be the right approach. (You have to be careful, though: the ease of interpolating untrusted user input into a raw string or the use of register globals in PHP seems analogous to the attractive nuisance doctrine, where people who don't know any better can't analyze the risk appropriately.

There may be another way.

Suppose I annotated the address:

my Address $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's still a chunk of memory with certain characteristics, but now it has an extra piece of metadata related to the program itself (and not merely Perl itself). A clever compiler could detect certain places where the semantics of an operation don't match:

method set_address(Address $addy) { ... }

... though you do have to be able to resolve this kind of dispatch at compilation time to prove the type safety of the entire program at compilation time. (I've seen suggestions that even Smalltalk programs can resolve some 85-90% of dispatch targets in a static fashion.)

You don't have to go that far; runtime verification with a good test suite is effectve, can be fairly cheap, and is available right now in Perl 5 with Moose.

There's still another way. Consider again the untrusted input example. If you enable tainting, you might read user input into an address:

my Address $untrusted_addy = $req->get( 'address' );

You don't see it in the declaration, but the "This is tainted!" metadata is present in $untrusted_addy. How do you deal with that?

You could be picky about always untainting untrusted data, but can you do that accurately and effectively? Can you rely on everyone always getting it right?

What if you could write:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

... and Perl could verify that $address is an appropriate Address (and $user is an appropriate User), could quote and escape and validate both of them effectively, could extract the primary key from $user, and could untaint any tainted $address or $user?

If your language supports multiple dispatch, lets you define your own types, lets you override stringification, and can override interpolation for cases like these, you can do such things.

In other words, you could turn what would otherwise be a raw string into an embedded little language with its own syntax and semantics, interoperate with native data structures in the host language, and provide composable safety—and users don't have to know much of anything about how this works, as it pretty much does what they expect.

I can imagine a language like that.

Some people believe that security problems and other severe bugs are inevitable. Some of these people believe that conscientious design and clear thinking about how languages and APIs work is irrelevant; bad code is possible in every language.

Bad code is possible in any language and wrong code is possible with any API. Even so, it's possible to create languages and APIs which make the right thing so much easier than the wrong thing that only the most incompetent (or dangerously malicious) write bad code.

Imagine, for example, a database access layer which forbids the use of raw strings to create SQL queries. You might have to write:

my $sth = $dbh->select( @tables )->join( %relations )->where( %conditions );

That's not necessarily a beautiful interface dashed off after a moment of thinking, but it has an important security property: it avoids the interpolation of untrusted user input. All data sent to the database may go through a quoting or untainting process without the user having to remember to do so.

A similar library could help avoid malicious user input from interfering with the display or operation of a web site, for example. These are both specific cases of a general principle: replace unstructured string data with structured data. In both cases, the structure of the data makes the intent of the data clear, which allows the library to ensure as much safety as possible.

This principle has other implications as well; more on that next time.

Find recent content on the main index or look in the archives to find all content.

Recent Comments

  • axqd.net: Can you write another post on answers (i.e. pattern and read more
  • chromatic: You're right. Thanks for the sanity check! read more
  • audreyt: In "What's Missing", maybe s/Three core pragmas/Three core pragmas and read more
  • xenoterracide: Unfortunately Taint works less than it should because so many read more
  • theclapp.myopenid.com: > no Module::Name qw( arguments ) is equivalent to: read more
  • chromatic: Right! I thought I'd seen something like that somewhere. A read more
  • sartak.org: Three modules which use this kind of interface (presumably for read more
  • https://me.yahoo.com/moleculasdevida#fea03: Readonly is actually a function that takes a list of read more
  • jnareb.openid.pl: I wanted to ask if it is what Readonly module read more
  • Jamie McCarthy: SQL is a great example for this. Relational databases are read more

Categories

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.23-en