Eliminating Errors with Little Languages

Jamie McCarthy made an interesting point about type safety in embedded SQL on String-Plus:

SQL is a great example for this. Relational databases are more useful with strong typing, so EMPLOYEE_ID is incompatible with PRODUCT_ID even if they are both implemented as INT. It'd be a great idea to see those constraints implemented at the perl level, presumably by giving perl more knowledge of the database schema than even the database engine has.

Imagine that you have, or can write, a little language parser for a SQL-like language. My simple example was:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

This can decompose into several operations:

Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.

That's a nice interface, but you can do better. As I suggested, you can add error checking if you know the structure of the database:

Get the metadata which describes the users table.
Verify that the required fields (address and user exist).
Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.

You can take advantage of type checking too:

Get the metadata which describes the users table.
Verify that the required fields (address and user exist).
Verify that the type of $address is compatible with the type of the address field. Repeat for $user and user.
Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.

If you know the structure of the database when the program starts, you can start to push some of this type checking to the point of compilation. (You may not be able to perform all of the type checking at compilation time, but you can do as much as possible as early as possible to prevent as many errors as possible.)

That's simple and easy. Now imagine something more interesting:

SQL {{
    SELECT name, address FROM users, addresses GIVEN { User $user }
}}

It's obvious from the syntax of the query language that the database needs to perform a join operation, and it's obvious that the primary key of the $user object is the important key of the operation. If the program knows the relationship of the users and addresses tables, it can join them effectively as well.

Don't get caught up in the syntax or the semantics of the remainder of examples here; they exist to demonstrate possibilities, not the final form of battle-tested code. Even so, imagine a dynamic query:

SQL {{
    SELECT @fields FROM { Table $table_one }, {Table $table_two } }
}}

Again the structure and intent of the code is obvious. The operations are now:

Find the primary keys for $table_one and $table_two.
Verify that they're joinable.
Verify that all members of @fields are present in either $table_one or $table_two.
Construct the query.

If I were to implement this, I'd make a join_tables multimethod. It takes two arguments (generalizable to more, but follow along with two for now). Imagine that it looks something like this:

multi join_tables( Table $t1, Table $t2 ) { ... }

multi join_tables( Any, Any ) { fail() }

Given two Table arguments, the first multi candidate matches and gets called. Given any other combination of arguments, the second candidate matches and produces an error.

Knowing that you have two Table objects isn't enough, however. The tables might have no relationship to each other. Imagine if you somehow could verify that the tables have an appropriate relationship. If I were to implement this, I might check that the keys of the tables matched types, perhaps with a syntax something like:

multi join_tables ( Table $t1, Table $t2 where { $t1.primary_key eqv $t2.foreign_key( $t1 ) } ) { ... }

That is, the keys must be of equivalent types. If one key is a user_id and the other is an Integer, the where clause won't match for this candidate, so a different multi will get called.

Now imagine that for those embedded SQL minilanguage statements where table name is available at compilation time and sufficient type information exists to verify the statements themselves at compilation time:

SQL {{
    SELECT name, address FROM { User users }, { Address addresses }
}}

... then everyone who uses this minilanguage (and has set up the table information appropriately) gets safety and correctness by default. Some of that can even occur before the program runs. The rest of it can occur as the program runs.

(A really, really good type checker and optimization system could infer that some errors are impossible even if it can't prove the use of a single type in every case.)

Now imagine that you have a language which allows you to build minilanguages like this, to build APIs which specify correct operations and fall back to good error reporting on incorrect operations, and which do so without interfering with other code and other extensions.

Welcome to Perl 6.

Eliminating Errors with Little Languages

Tags:

Modern Perl: The Book

Categories

Monthly Archives

Pages

About this Entry