July 2010 Archives

An Accurate Comparison of Perl 5 and Rakudo Star

By chromatic on July 30, 2010 3:26 PM | 5 Comments

This article has been updated since its original posting.

Rakudo Star was supposed to be a useful and usable subset of Perl 6. It was promised as something "you can use right now". Years after its release, it still does not implement the complete Perl 6.0 specification. It's still by no means the final release. It's still buggy, memory hungry, and slow. While its developers will tell you it's improving, years of work still haven't brought it anywhere close to parity with Perl 5, let alone the vision of what Perl 6 might eventually be.

At the original writing of this article, I believed that "subsequent releases will bring improvements on completeness, correctness, and competitiveness." While that's technically true—it is more complete, more correct with regard to the changing specifications, and has slightly better performance—its progress has been disappointing. I was wrong about its schedule.

This article remains online only for historical interest. It's out of date and misleading. Rakudo's all-but-abandonment of the Parrot virtual machine (and the subsequent diaspora of Parrot developers) has made Parrot a poor choice for the long-term plans of Rakudo. (Someone inclined to conspiracies might suggest that this was a goal of the Rakudo developers.) Any measurement of the current state of Rakudo, including startup time and other benchmarks, must thus use the JVM backend, as no other implementation is sufficiently advanced to participate in this benchmark.

I have no connection with Perl 6 and no desire to produce this benchmark. I suspect it would fare badly for Rakudo, but that is supposition not based on evidence.

In late 2010, David Skoll posted a comparison of Rakudo Star to Perl 5.

Rakudo's perl6 binary for the Parrot backend is a couple of hundred lines of C code (if that much) linking to libparrot and mostly unoptimized Parrot bytecode. By unoptimized, I mean "There is no optimization of Parrot bytecode in Rakudo Star": no constant folding, no size optimization, no inlining. None.

The perl5 binary is of course all compiled C code generated (hopefully) by an optimizing compiler. Does that make for a valid comparison? I think not.

You cannot easily compare the feature set of the Perl 5 binary to that of Rakudo Star. Perl 5 contains no advanced object system, no grammars, no continuations or coroutines, no junctions, and no interoperability with other languages. It also lacks hyperoperators, function and method signatures, multidispatch, laziness, built-in language redefinition, strictness by default, autothreading, a REPL, and automatic binding to shared libraries. Somehow Perl 5 programmers manage, mostly with the CPAN:

It's easier to add the loading of the appropriate CPAN modules to the Perl 5 process than to remove them from Rakudo Star, so for an accurate comparison, you must add:

Feature	Module	Virtual/Resident Memory in KB (cumulative)	Startup Cost (cumulative)
Parser manipulation	Devel::Declare (obsoleted by recent Perl 5 releases)	7356 / 2508	0.024s
Advanced object system	Moose and MooseX::Declare	18184 / 13064	0.705s
autoboxing	autobox, autobox::Core, and Moose::Autobox	19884 / 14732	0.777s
Coroutines	Coro	20276 / 15020	0.783s
Parse tree manipulation	B::Generate	20328 / 15080	0.799s
REPL	Devel::REPL	21132 / 15860	0.827s
Multidispatch	MooseX::MultiMethods	22076 / 16832	0.987s
Function signatures	signatures	22144 / 16920	0.992s
Shared library bindings	FFI	22188 / 16976	0.995s
Perl 5.12 features	feature	22216 / 16976	0.997s
Grammars	Regexp::Grammars	22780 / 17576	1.017s
Parrot language interoperability	Parrot::Embed (obsolete with the deprecation of Parrot)	40812 / 18476	1.020s

I ordered the list in terms of dependencies to show cumulative costs. I also left out a few CPAN distributions you need to add to Perl 5 to add features enabled by default in Perl 6: for example, I couldn't get Data::Alias to build.

My perl5 binary is Perl 5.12.1, built fresh today as a 32-bit binary without threading. (A 64-bit binary uses more memory and a binary with threading is some 10-15% slower, reportedly.) It's 1,239,686 bytes in size.

My perl6 binary (built from today's checkout, not Rakudo Star) is 14,280,276 bytes in size. At the REPL, it uses 99,944 KB of virtual memory and 81,132 KB of resident memory. To run the program -e 1, it requires 1.035s. By my calculations, Rakudo Star starts up 1.5% more slowly than Perl 5 with all of the other modules loaded and uses 2.45 times as much virtual memory and 4.39 times as much resident memory.
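The ratios in that comparison follow from the last row of the table and the perl6 measurements quoted above. A minimal sketch of the arithmetic, with hash and key names chosen only for illustration:

```perl
use strict;
use warnings;

# Cumulative Perl 5 figures from the last table row and the perl6 figures
# quoted in the text. The hash and key names are illustrative only.
my %perl5 = ( virtual_kb => 40_812, resident_kb => 18_476, startup_s => 1.020 );
my %perl6 = ( virtual_kb => 99_944, resident_kb => 81_132, startup_s => 1.035 );

my $slower_pct = sprintf '%.1f', ( $perl6{startup_s} / $perl5{startup_s} - 1 ) * 100;
my $virtual    = sprintf '%.2f', $perl6{virtual_kb} / $perl5{virtual_kb};
my $resident   = sprintf '%.2f', $perl6{resident_kb} / $perl5{resident_kb};

print "startup:  ${slower_pct}% slower\n";    # 1.5% slower
print "virtual:  ${virtual} times\n";         # 2.45 times
print "resident: ${resident} times\n";        # 4.39 times
```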

By way of comparison, Perl 5 running -Mperl5i::2 -E 1 runs in 0.149s. It requires 12,080 KB of virtual memory and 6,952 KB of resident memory. Rakudo Star starts 6.72 times slower than Perl 5 with perl5i. Rakudo Star uses 8.27 times more virtual memory and 11.67 times more resident memory.

Remember: those were numbers in mid-2010. Compare Perl 5.18.2 (which has had feature enhancements and optimizations) to a recent Rakudo Star release on the JVM and you will see different numbers.

Of course, if you're comparing the ability of either language to let you get things done... well, Perl 5 still comes out ahead.

How About a Shetland Ponie?

By chromatic on July 29, 2010 1:47 PM | 2 Comments

Rakudo Star is out, and so begins the next great wave of interest and use of Perl 6. The next several releases will improve performance, fix bugs, add features, port or create more libraries, and—in all likelihood—improve and otherwise clarify the Perl 6 specification.

The Perl ecosystem has room for other projects, however.

For example, one of the clearest benefits Perl 6 has over Perl 5 is its portability to other virtual machines and runtimes. By design Perl 6 encourages multiple implementations. Perl 5 is its own specification; in many places, what Perl 5 is is solely what Perl 5 happens to do. Sometimes that behavior gets enshrined in the specification tests, but other times it's folklore and institutional community knowledge.

Just as Parrot's Lorito project intends to make Parrot at least an order of magnitude faster, so too a reorganization of Perl 5 internals could make amazing things more possible.

What if there were a project to implement a minimal set of Perl 5 on the Parrot virtual machine as a prototype and exploration of how much of Perl 5 you can support, the effort it takes to do so, and what kind of utility you can expect? Parrot's compiler tools let the Rakudo developers write most of Perl 6 in Perl 6; surely it's possible to write Perl 5 in a similar fashion. (Credit to other projects such as Rubinius and PyPy for demonstrating that such things are possible.)

I know other projects have attempted this in the past. Perhaps the best place to steal information is Bradley Kuhn's master's thesis, Considerations on Porting Perl to the Java Virtual Machine.

As Jesse wrote in his comments, bug-for-bug compatibility isn't necessary. Nor is full compliance with the existing Perl 5 test suite. A simple proof of concept to produce the 80% of Perl 5 most people use in most programs should suffice. (Parrot gives you a lot of that anyway.)

As a bonus, you get cheap and easy interoperability with Perl 5, access to Parrot features such as multidispatch, grammars, continuations, and bytecode serialization, and you could even replace some of the uses of Perl 5 within Parrot's and perhaps even Rakudo's configuration and build processes.

It doesn't even have to be a pony of full size.

A Checklist for Writing Maintainable Perl

By chromatic on July 26, 2010 9:53 AM | 3 Comments

Suppose you want to write a program in Perl. (Suppose you have written a program in Perl.) If the thesis behind what I call Modern Perl is correct, you can write that program well or you can write that program poorly. (For supporting arguments for that thesis, see Piers Cawley's A tale of two languages.)

Likely you've seen examples of Poorly Written Perl on the Internet. They are to Perl what YouTube comments are to the English of Nabokov. In other words, the proper response to a reluctant admission that:

Yes, I know that Perl can be written in an object-oriented and readable way.

— Tim Bray, D.P.H.

... or that:

There's also been a push in some applications to rewrite Perl utilities in Bash to enhance portability between platforms. While Perl exists on just about every platform out there, there are vagaries that can cause issues with differing Perl versions, which then leads to portability problems.

— Paul Venezia, Is it still libelous if you end your titles with question marks?

... the proper response is "Why didn't you write your code with maintainability in mind?"

I know, I know. That's not helpful. Here's a quick checklist to help those of you writing Perl (or those of you trying to hire people to write Perl (or those of you trying to hire people to learn to write Perl)) to determine if you're capable of writing Perl well:

Do you know how to use the Perl documentation?
Do you use CPAN modules?
Do you use the CPAN distribution layout for organizing your code?
Have you enabled strictures and warnings? Is the resulting code clean of warnings and errors?
Are you using the standard Perl testing framework? (Did you write tests at all?)
Do you have an automated Perl configuration, build, dependency resolution, installation, and distribution mechanism?
Does your code conform to local Perl layout guidelines?
Does your code conform to Perl community standards for maintainability and correctness?
Are you familiar with the local Perl mongers group?
Are you using a recent version of Perl?
Are you familiar with writing secure Perl?
Do you use source control?
Do you use functions?
Do you use modules?
Do you use objects?
Do you use Moose or another abstraction mechanism from the CPAN?
Do you document your Perl code?
Do you use language constructs you don't understand, copied and pasted from elsewhere, smushed together into a hateful melange of barely-working confusion you occasionally tweak just to see what happens, and one afternoon you get sick of it and call it done?

You don't have to answer all of those questions in the correct way to write good and maintainable Perl, but if you answer most of those questions in the wrong way, of course you'll write bad code.

Perl allows people to accomplish their tasks without having to learn much, without having to participate in strange and unfamiliar ceremonies, and without even being much good at programming at all. That's by design, and that's a good thing for very specific circumstances. Yet if you approach programming as if it were merely typing and retyping until something barely working fell out of your typewriter, you're going to make lots of messes, and no language can save you from an unprofessional lack of discipline.

Writing good code requires discipline in any language.

The Best Art Continues to Surprise

By chromatic on July 22, 2010 3:10 PM | 3 Comments

I attended an exhibit about the work of Leonardo da Vinci several months ago. Part of that exhibit was a thorough analysis of his Mona Lisa painting. "It's perhaps the most famous painting in the world," I thought. "I've seen it (or at least replicas) thousands of times before."

Then at the suggestion of the exhibit, I looked behind the model and saw more details, such as a low wall, the lack of eyebrows and eyelashes, and other small details that have always been there but somehow failed to catch my attention.

Several years ago, I read an analysis of Roger Zelazny's The Chronicles of Amber series. The analyst admitted that he re-read the series every few years and learned new things each time. (Zelazny's Chandleresque tone in the first five books contributes to the depth of the books, but so does the fact that his characters gladly lie to, backstab, betray, confuse, manipulate, and distrust each other and their own selves.) A reinterpretation of a single line which seemed so innocent during the last reading could cause you to see a character in an entirely different light.

Good art is like that.

Today I understood an underused feature of Perl 5 better.

Paulo Custodio filed a bug on the Modern Perl draft that the explanation of module unimporting was incomplete. I had written that:

no Module::Name qw( arguments );

... is equivalent to:

BEGIN { Module::Name->unimport( qw( arguments ) ) }

In all accuracy (and, upon reflection, obviousness), no Module::Name qw( arguments ) is equivalent to:

BEGIN
{
    require Module::Name;
    Module::Name->unimport( qw( arguments ) );
}

Even though I rarely use module unimporting and have never, to my best recollection, unimported a module I haven't previously used, it's obvious that unimporting through no should imply require. (I have trouble imagining an interface where you'd initially load a pragma with no, unless you use strictperl, but clever people can do clever things.)

You may all now chuckle at how long it took me to realize this (and, yes, I did read the Perl 5 source code to prove to myself that this occurs).
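You can also observe the implied require without reading the Perl 5 source. The following is a minimal sketch: My::Pragma is a hypothetical module, served from memory through an @INC hook so that the example needs no file on disk. If no loads the module, its entry appears in %INC and its unimport method receives the argument list.

```perl
use strict;
use warnings;

our @unimport_args;

BEGIN {
    # Serve the hypothetical My::Pragma module from memory via an @INC hook.
    unshift @INC, sub {
        my ( $self, $file ) = @_;
        return unless $file eq 'My/Pragma.pm';

        my $source = <<'END_SOURCE';
package My::Pragma;
sub unimport { shift; @main::unimport_args = @_ }
1;
END_SOURCE
        open my $fh, '<', \$source
            or die "Cannot open in-memory file: $!";
        return $fh;
    };
}

# Nothing has loaded My::Pragma before this line.
no My::Pragma qw( arguments );

print "My::Pragma was loaded\n" if exists $INC{'My/Pragma.pm'};
print "unimport received: @unimport_args\n";
```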

Eliminating Errors with Little Languages

By chromatic on July 20, 2010 10:51 AM

Jamie McCarthy made an interesting point about type safety in embedded SQL on String-Plus:

SQL is a great example for this. Relational databases are more useful with strong typing, so EMPLOYEE_ID is incompatible with PRODUCT_ID even if they are both implemented as INT. It'd be a great idea to see those constraints implemented at the perl level, presumably by giving perl more knowledge of the database schema than even the database engine has.

Imagine that you have, or can write, a little language parser for a SQL-like language. My simple example was:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

This can decompose into several operations:

Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.

That's a nice interface, but you can do better. As I suggested, you can add error checking if you know the structure of the database:

Get the metadata which describes the users table.
Verify that the required fields (address and user) exist.
Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.

You can take advantage of type checking too:

Get the metadata which describes the users table.
Verify that the required fields (address and user) exist.
Verify that the type of $address is compatible with the type of the address field. Repeat for $user and user.
Get the value of the $address variable.
Get the primary key of the $user variable.
Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
Execute the query.
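The type-checking and placeholder steps above can be approximated in plain Perl 5 today, without any new syntax. This is only a sketch under stated assumptions: the Address and User classes, their value method, and the prepare_typed_sql and bind_slot functions are all hypothetical names, the template arrives as an ordinary string, and the schema metadata checks are left out. It rewrites each typed slot to a placeholder and collects the bind values, dying when a value is not of the declared type.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

# Hypothetical application types; a real program would supply its own.
{
    package Address;
    sub new   { my ( $class, $text ) = @_; return bless { text => $text }, $class }
    sub value { $_[0]{text} }
}
{
    package User;
    sub new   { my ( $class, $id ) = @_; return bless { id => $id }, $class }
    sub value { $_[0]{id} }    # stands in for the primary key
}

# Matches one typed slot such as "{ Address $address }".
my $slot = qr/ \{ \s* (\w+) \s+ \$ (\w+) \s* \} /x;

# Check one slot's type, remember its value, and return a placeholder.
sub bind_slot {
    my ( $vars, $bind, $type, $name ) = @_;
    my $object = $vars->{$name};
    die "\$$name is not of type $type\n"
        unless blessed($object) && $object->isa($type);
    push @$bind, $object->value;
    return '?';
}

sub prepare_typed_sql {
    my ( $template, %vars ) = @_;
    my @bind;
    ( my $sql = $template ) =~ s/$slot/ bind_slot( \%vars, \@bind, $1, $2 ) /ge;
    return ( $sql, @bind );
}

my ( $sql, @bind ) = prepare_typed_sql(
    'UPDATE users SET address = { Address $address } WHERE user = { User $user }',
    address => Address->new('4616 NW Washington Place'),
    user    => User->new(42),
);

print "$sql\n";    # UPDATE users SET address = ? WHERE user = ?
```

The resulting string and bind list are in the shape that DBI's prepare and execute methods expect, so no untrusted value is ever interpolated into the SQL text.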

If you know the structure of the database when the program starts, you can start to push some of this type checking to the point of compilation. (You may not be able to perform all of the type checking at compilation time, but you can do as much as possible as early as possible to prevent as many errors as possible.)

That's simple and easy. Now imagine something more interesting:

SQL {{
    SELECT name, address FROM users, addresses GIVEN { User $user }
}}

It's obvious from the syntax of the query language that the database needs to perform a join operation, and it's obvious that the primary key of the $user object is the important key of the operation. If the program knows the relationship of the users and addresses tables, it can join them effectively as well.

Don't get caught up in the syntax or the semantics of the remainder of examples here; they exist to demonstrate possibilities, not the final form of battle-tested code. Even so, imagine a dynamic query:

SQL {{
    SELECT @fields FROM { Table $table_one }, { Table $table_two }
}}

Again the structure and intent of the code is obvious. The operations are now:

Find the primary keys for $table_one and $table_two.
Verify that they're joinable.
Verify that all members of @fields are present in either $table_one or $table_two.
Construct the query.

If I were to implement this, I'd make a join_tables multimethod. It takes two arguments (generalizable to more, but follow along with two for now). Imagine that it looks something like this:

multi join_tables( Table $t1, Table $t2 ) { ... }

multi join_tables( Any, Any ) { fail() }

Given two Table arguments, the first multi candidate matches and gets called. Given any other combination of arguments, the second candidate matches and produces an error.

Knowing that you have two Table objects isn't enough, however. The tables might have no relationship to each other. Imagine if you somehow could verify that the tables have an appropriate relationship. If I were to implement this, I might check that the keys of the tables matched types, perhaps with a syntax something like:

multi join_tables ( Table $t1, Table $t2 where { $t1.primary_key eqv $t2.foreign_key( $t1 ) } ) { ... }

That is, the keys must be of equivalent types. If one key is a user_id and the other is an Integer, the where clause won't match for this candidate, so a different multi will get called.

Now imagine that for those embedded SQL minilanguage statements where the table name is available at compilation time and sufficient type information exists to verify the statements themselves at compilation time:

SQL {{
    SELECT name, address FROM { User users }, { Address addresses }
}}

... then everyone who uses this minilanguage (and has set up the table information appropriately) gets safety and correctness by default. Some of that can even occur before the program runs. The rest of it can occur as the program runs.

(A really, really good type checker and optimization system could infer that some errors are impossible even if it can't prove the use of a single type in every case.)

Now imagine that you have a language which allows you to build minilanguages like this, to build APIs which specify correct operations and fall back to good error reporting on incorrect operations, and which do so without interfering with other code and other extensions.

Welcome to Perl 6.

String-Plus

By chromatic on July 16, 2010 10:59 AM | 9 Comments

What does this variable represent?

my $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's obviously an address, but what does Perl know about it? Perl knows it's a string. Perl knows it's some 60 characters long. Perl may even know that it's a valid string of Latin-1 characters.

Perl doesn't know where the string came from, nor that it contains a street address or a legal name nor a zip code (and not a zip + 4). Any meaning to the program beyond "It's a string of some 60 characters and is valid in the Latin-1 encoding" is far beyond what Perl knows about it. That's why the name of the variable is $thingie; even though Perl doesn't care about variable names, calling it $address instead could have led you to believe there's more structural meaning to this chunk of memory than actually exists.

Names are important, at least to people maintaining source code. This code is obviously wrong:

$user->set_address( $birthday );

... but to Perl it might as well be:

$foo->bar( $baz );

... for all of the semantic meaning it understands. There's no obvious intent.

I know you're smart and you're way ahead of me and you think "If I wanted a good static type system, I know where to find Haskell or OCaml, and I'd never let code that bad get out of code review, and why aren't you writing tests?", but that's not the point. You can be super careful or make APIs which restrict the most natural way to write code in the host language in favor of extra security. That may be the right approach. (You have to be careful, though: the ease of interpolating untrusted user input into a raw string or the use of register_globals in PHP seems analogous to the attractive nuisance doctrine, where people who don't know any better can't analyze the risk appropriately.)

There may be another way.

Suppose I annotated the address:

my Address $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's still a chunk of memory with certain characteristics, but now it has an extra piece of metadata related to the program itself (and not merely Perl itself). A clever compiler could detect certain places where the semantics of an operation don't match:

method set_address(Address $addy) { ... }

... though you do have to be able to resolve this kind of dispatch at compilation time to prove the type safety of the entire program at compilation time. (I've seen suggestions that even Smalltalk programs can resolve some 85-90% of dispatch targets in a static fashion.)

You don't have to go that far; runtime verification with a good test suite is effective, can be fairly cheap, and is available right now in Perl 5 with Moose.
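Moose type constraints perform this kind of check for you. To show what a runtime check amounts to, here is a hand-rolled sketch in core Perl rather than Moose. The Address and User classes are hypothetical stand-ins, and the error message is an assumption of the sketch.

```perl
use strict;
use warnings;

{
    package Address;    # hypothetical value class
    sub new { my ( $class, $text ) = @_; return bless { text => $text }, $class }
}
{
    package User;       # hypothetical class with a checked setter
    use Carp 'croak';
    use Scalar::Util 'blessed';

    sub new { return bless {}, shift }

    sub set_address {
        my ( $self, $address ) = @_;
        croak 'set_address() requires an Address object'
            unless blessed($address) && $address->isa('Address');
        $self->{address} = $address;
        return $self;
    }
}

my $user     = User->new;
my $birthday = '1970-01-01';

# An Address passes the check.
$user->set_address( Address->new('4616 NW Washington Place') );

# A plain string is rejected when the program runs, not when it compiles.
my $accepted = eval { $user->set_address($birthday); 1 };
print "rejected: $@" unless $accepted;
```

A test suite that exercises set_address will catch the mistaken call, which is the cheap safety net the paragraph describes.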

There's still another way. Consider again the untrusted input example. If you enable tainting, you might read user input into an address:

my Address $untrusted_addy = $req->get( 'address' );

You don't see it in the declaration, but the "This is tainted!" metadata is present in $untrusted_addy. How do you deal with that?

You could be picky about always untainting untrusted data, but can you do that accurately and effectively? Can you rely on everyone always getting it right?

What if you could write:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

... and Perl could verify that $address is an appropriate Address (and $user is an appropriate User), could quote and escape and validate both of them effectively, could extract the primary key from $user, and could untaint any tainted $address or $user?

If your language supports multiple dispatch, lets you define your own types, lets you override stringification, and can override interpolation for cases like these, you can do such things.

In other words, you could turn what would otherwise be a raw string into an embedded little language with its own syntax and semantics, interoperate with native data structures in the host language, and provide composable safety—and users don't have to know much of anything about how this works, as it pretty much does what they expect.

I can imagine a language like that.

Strings and Security and Designing Away Bugs

By chromatic on July 14, 2010 9:23 AM | 4 Comments

Some people believe that security problems and other severe bugs are inevitable. Some of these people believe that conscientious design and clear thinking about how languages and APIs work is irrelevant; bad code is possible in every language.

Bad code is possible in any language and wrong code is possible with any API. Even so, it's possible to create languages and APIs which make the right thing so much easier than the wrong thing that only the most incompetent (or dangerously malicious) write bad code.

Imagine, for example, a database access layer which forbids the use of raw strings to create SQL queries. You might have to write:

my $sth = $dbh->select( @tables )->join( %relations )->where( %conditions );

That's not necessarily a beautiful interface dashed off after a moment of thinking, but it has an important security property: it avoids the interpolation of untrusted user input. All data sent to the database may go through a quoting or untainting process without the user having to remember to do so.
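A tiny version of such a layer fits in a few lines. This is a sketch, not an existing CPAN module: the Sketch::Query package, its select_from, where, and as_sql methods, and the rule that identifiers must consist of word characters are all assumptions made for illustration. The point is that values only ever travel as bind parameters.

```perl
use strict;
use warnings;

{
    package Sketch::Query;    # hypothetical name, not a CPAN module
    use Carp 'croak';

    # Identifiers are validated; they are never taken from raw SQL strings.
    sub _identifier {
        my $name = shift;
        croak "Invalid identifier '$name'" unless $name =~ /\A\w+\z/;
        return $name;
    }

    sub select_from {
        my ( $class, @tables ) = @_;
        my @checked = map { _identifier($_) } @tables;
        return bless { tables => \@checked, where => [], bind => [] }, $class;
    }

    sub where {
        my ( $self, %conditions ) = @_;
        for my $column ( sort keys %conditions ) {
            push @{ $self->{where} }, _identifier($column) . ' = ?';
            push @{ $self->{bind} },  $conditions{$column};
        }
        return $self;
    }

    sub as_sql {
        my $self = shift;
        my $sql  = 'SELECT * FROM ' . join( ', ', @{ $self->{tables} } );
        $sql .= ' WHERE ' . join( ' AND ', @{ $self->{where} } )
            if @{ $self->{where} };
        return ( $sql, @{ $self->{bind} } );
    }
}

my $hostile = "Robert'); DROP TABLE users;--";

my ( $sql, @bind ) =
    Sketch::Query->select_from('users')->where( name => $hostile )->as_sql;

print "$sql\n";    # SELECT * FROM users WHERE name = ?
```

The hostile string ends up in @bind, where the database driver treats it as data rather than as part of the query.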

A similar library could help avoid malicious user input from interfering with the display or operation of a web site, for example. These are both specific cases of a general principle: replace unstructured string data with structured data. In both cases, the structure of the data makes the intent of the data clear, which allows the library to ensure as much safety as possible.

This principle has other implications as well; more on that next time.

The Urge to Brag

By chromatic on July 10, 2010 9:16 AM | 4 Comments

Way back in the late '90s and early 2000s, many Perl fans could rattle off a list of big projects using Perl: Slashdot, Amazon.com, IMDB. Eyebrows popped up (maybe at one point), as if the fact that billions of dollars of online sales went through Perl were validation of a language.

Perhaps it is.

Today much of the online Perl community discussion reads as reactionary, at least to me. Some random Internet argument will degrade into "Perl? Isn't that [insert negative description here]?" versus "Nuh uh, Perl isn't [insert negative description here]!"

Me, I'd rather hear about interesting new projects written in Perl. Take the recent Duck Duck Go written in Perl story. Repeat this a few dozen times (especially with new projects created in the past year or two) and responses will move from "Perl? People still use that?" to "Wow, people who know Perl can certainly do a lot of interesting things!"

I'd rather see the latter message spread than almost any other marketing message—so tell the world, what are you working on with Perl?

Don't Parse That String!

By chromatic on July 7, 2010 12:50 PM | 6 Comments

Defensive programmers anticipate what might go wrong. Robust code handles the unexpected, partly by minimizing the surface area of potential problems. The fewer things that can go wrong, the fewer things that will go wrong. (Things will still go wrong, but you can write safer code if you're clever.)

Yuval Kogman asked Are we ready to ditch string errors? I am; there's a general principle of API design beyond his question.

One problem with die "Some error!" is how to identify what error that represents—not to a programmer or user, who ostensibly speaks enough English and problem domain jargon to have some idea of what the error means—but the rest of the program. How does your code catch this error and distinguish it from some other type of error? Can you determine which of the two you can handle and which you must delegate?

Break out split or the regular expression engine and prepare to write heuristics which guess, and woe to you if someone someday internationalizes your error messages or runs all of your exceptions through a logging mechanism which changes their formatting slightly or....
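Perl 5 already lets die throw an object, so the alternative needs no extra modules. A minimal sketch, assuming hypothetical My::Error and My::Error::NotFound classes and a hypothetical load_config function:

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

{
    package My::Error;    # hypothetical base exception class
    sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub throw   { my $class = shift; die $class->new(@_) }
    sub message { $_[0]{message} }
}
{
    package My::Error::NotFound;    # hypothetical subclass with extra structure
    our @ISA = ('My::Error');
    sub path { $_[0]{path} }
}

sub load_config {
    my $path = shift;
    My::Error::NotFound->throw( message => 'No such file', path => $path )
        unless -e $path;
    return 1;
}

my $handled;
eval { load_config('/no/such/config.file'); 1 } or do {
    my $error = $@;
    if ( blessed($error) && $error->isa('My::Error::NotFound') ) {
        # The class identifies the error; no message parsing is required.
        $handled = 'missing file: ' . $error->path;
    }
    else {
        die $error;    # delegate anything this code cannot handle
    }
};

print "$handled\n";
```

The handler decides by class, so translating or reformatting the message text cannot break it.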

The problem is that you can't take advantage of the structure of the exception data because it's not present in the string. The same goes for DBI's connection strings:

my $dbh = DBI->connect( 'dbi:DriverName:database=database_name;host=hostname;port=port' );

As the documentation suggests in the very next sentence:

There is no standard for the text following the driver name. Each driver is free to use whatever syntax it wants.

Compare this to a keyword argument form:

my $dbh = DBI->connect(
    driver   => 'DriverName',
    database => 'database_name',
    host     => 'hostname',
    port     => 'port',
    extra    => 'arguments',
);

This has several advantages. The method doesn't have to guess (or parse) the string. The layout and vertical alignment makes the keyword form easier to read and to modify. DBDs can decorate and augment this argument list without parsing and recreating a string. Verification and default arguments are much easier.
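Until an interface like that exists, a thin wrapper can offer some of the same benefits by validating keyword arguments before assembling the string. This is a sketch only: build_dsn is a hypothetical function, the localhost default is an assumption, and the key=value layout will not suit every driver, because each driver defines its own syntax after the driver name.

```perl
use strict;
use warnings;
use Carp 'croak';

# Hypothetical helper: validate keyword arguments, apply defaults, and only
# then build a connection string in a key=value layout.
sub build_dsn {
    my %args = ( host => 'localhost', @_ );    # assumed default

    for my $required (qw( driver database )) {
        croak "Missing required argument '$required'"
            unless defined $args{$required};
    }

    my $driver = delete $args{driver};
    return "dbi:${driver}:"
        . join( ';', map { join '=', $_, $args{$_} } sort keys %args );
}

my $dsn = build_dsn(
    driver   => 'DriverName',
    database => 'database_name',
    port     => 5432,
);

print "$dsn\n";    # dbi:DriverName:database=database_name;host=localhost;port=5432
```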

The same argument goes for using a module such as File::stat instead of parsing the output of `ls -l filename`.
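A short sketch of the File::stat approach, using a temporary file from File::Temp so that the example does not depend on any particular path:

```perl
use strict;
use warnings;
use File::stat;
use File::Temp 'tempfile';

# Create a small scratch file to inspect; it is removed at program exit.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} "hello\n";
close $fh or die "Cannot close $filename: $!";

# File::stat's stat() returns an object with named accessors.
my $st = stat($filename) or die "Cannot stat $filename: $!";

printf "%d bytes, mode %04o, modified %s\n",
    $st->size, $st->mode & 07777, scalar localtime( $st->mtime );
```

Each field has a name and a type, so there are no columns to count and no locale-dependent date formats to guess at.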

The same argument goes for... you get the point. It's far too easy to unfold the regex widget from the swiss-army chainsaw when a little bit of caution decomposing data into structured data makes your programs safer, easier to use, more flexible, and more robust.

(I consider sometimes how a language would look if it had only keyword arguments and how you could optimize them with immutable, internable strings and cached call sites and a zero-copy register allocation mechanism, but I made it as far as writing a self-hosting garbage collector before I had real work to do.)

Hire AND Train

By chromatic on July 5, 2010 12:45 PM | 10 Comments

A popular lament heard from business is "It's difficult to find Perl 5 programmers!"

I can imagine that it's difficult to find good Perl 5 programmers. Most of the really good ones I know have full-time equivalent employment (and plenty don't want to move to the greater London metro area for 20,000 GBP per year or San Francisco for $40,000 a year). It's also difficult to evaluate the skill of any self-proclaimed Perl 5 programmer; it's easy to write baby Perl, but it's not always easy to know how to become a better Perl programmer.

Three possibilities present themselves:

Improve the ways in which businesses compete to find good Perl 5 programmers, such as offering larger salaries or telecommuting or better perquisites. (Few companies do this.)
Give up. (Anecdotal evidence suggests some companies have done this; it's easy to throw a few thousand dollars a year to find cheap PHP development.)
Train good programmers.

Five years ago, the last of these might have been daunting. Now I can imagine that a motivated consultant could put together a customized hiring and training course for a specific company in a specific industry to identify the skills necessary (Perl and otherwise) for new hires as well as the skills necessary for existing developers.

I can imagine that new employees should read Perl Best Practices and should walk through Perl::Critic policies on their first day. I would love to see them handed a copy of Effective Perl Programming, 2e.

Perhaps I demonstrate no small hubris, but I hope that Modern Perl: The Book can fill in any of the gaps of an experienced (but still novice) Perl 5 programmer as well as explain how Perl 5 works to a new Perl 5 programmer. In short, my intent with the book is to help novices and neophytes become adepts. I believe we can achieve similar things with many of the tools developed during the Perl renaissance.

The important question is how to convince businesses to take advantage of this renaissance. In effect, we have to demonstrate that (like many other job skills) Perl 5 is something easy and effective to teach a motivated worker.

Luck and the Class Struct API

By chromatic on July 2, 2010 1:01 PM

Class::Struct has been a core module for ages. (Previously it was Class::Template, but a great renaming occurred 13 years ago.) If you've never seen it before, it might remind you a little bit of Moose:

package Cat;

use Class::Struct;

struct( name => '$', age => '$', diet => '$' );

You don't get all of the benefits of Moose, but you do get attributes and accessors. You also get a default constructor.

Of course, the default constructor reads something like:

{
    package Cat;
    use Carp;

    sub new
    {
        my ($class, %init) = @_;
        $class = __PACKAGE__ unless @_;
        ...
    }

    ...
}

If the $class = __PACKAGE__ unless @_; line is curious to you, it's curious to me too. I saw a note in one of the test files somewhere suggesting that the purpose of this was to allow you to write:

package Cat;

my $cat = new();

I don't know why you'd do that, however. In what kind of object design does it make sense to create objects of a class from within that class? (That seems like a violation of responsibilities to me.) You can also write:

package NotCat;

my $cat = Cat::new();

... though that's exceedingly fragile. For one thing, it implies that you could also write RobotCat::new()—assuming that RobotCat extends Cat, but avoiding method dispatch for calling a constructor means that RobotCat had better provide its own new() which behaves as a function as well as a method. (Even if you somehow convinced the subclass to inherit the superclass's function through some sort of exporting scheme, the hardcoded __PACKAGE__ would hurt.)

Hardcoding a method dispatch as a function dispatch means that the maintainers of Cat are not free to change how Cat provides its constructor, for much the same reason.
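Here's a rough sketch of that fragility. The new() below is a hand-written stand-in with the same first two lines as the generated constructor, not the code Class::Struct actually emits, and RobotCat is a hypothetical subclass:

```perl
use strict;
use warnings;

{
    package Cat;

    # Stand-in constructor: same opening lines as the generated one.
    sub new {
        my ($class, %init) = @_;
        $class = __PACKAGE__ unless @_;
        return bless { %init }, $class;
    }
}

{
    package RobotCat;
    our @ISA = ('Cat');    # new() is reachable only through method dispatch
}

my $by_method   = RobotCat->new;    # method call: inheritance finds Cat::new
my $by_function = Cat::new();       # function call: falls back to __PACKAGE__

# There is no function named RobotCat::new, so a plain function call dies.
my $worked = eval { RobotCat::new(); 1 };
my $error  = $@;

print ref($by_method), "\n";
print ref($by_function), "\n";
print "RobotCat::new() failed: $error" unless $worked;
```

The method call respects inheritance; the function call works only for the one package which happens to define the function.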

Woe unto you if there's an inherited AUTOLOAD somewhere.

I realize that in 1994 or 1995, people who wrote OO code in Perl 5 might have had familiarity with OO in C++ (where this syntax makes a little more sense) or, perhaps, Java, where the indirect constructor call (my Cat $cat = new Cat;) is prevalent, but the benefit of hindsight is that experienced Perl 5 programmers can look back on this API a decade later and cringe at its potential for misuse.

If you're lucky, everything will go right—but what kind of a defensive programmer relies on luck when designing an API?

When Sugar and Semantics Collide

By chromatic on July 1, 2010 11:10 AM | 11 Comments

I use Moose to explain object orientation in Perl in the Modern Perl book. It's much easier to explain the what and why of OO with syntax like:

{
    package Cat;

    use Moose;

    has 'name', is => 'ro', isa => 'Str';
    has 'age',  is => 'ro', isa => 'Int';
    has 'diet', is => 'rw';
}

... than the corresponding code where you must write your own accessors, poke into a blessed hash directly (and bless it yourself), perform your own coercions and verifications, and the like.
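For comparison, a hand-rolled version might look something like the following. This is only a sketch: the checks loosely mimic isa => 'Str' and isa => 'Int' and are my own approximation, not what Moose does internally.

```perl
use strict;
use warnings;

{
    package Cat;
    use Carp 'croak';

    sub new {
        my ($class, %args) = @_;

        # Hand-written validation, applied only to attributes the caller passed.
        croak 'name must be a plain string'
            if exists $args{name}
            && ( !defined $args{name} || ref $args{name} );
        croak 'age must be an integer'
            if exists $args{age}
            && ( !defined $args{age} || $args{age} !~ /\A-?[0-9]+\z/ );

        # Bless the hash yourself.
        return bless {
            name => $args{name},
            age  => $args{age},
            diet => $args{diet},
        }, $class;
    }

    # Read-only accessors.
    sub name { return $_[0]{name} }
    sub age  { return $_[0]{age} }

    # Read-write accessor.
    sub diet {
        my $self = shift;
        $self->{diet} = shift if @_;
        return $self->{diet};
    }
}

my $cat = Cat->new( name => 'Lucky', age => 3 );
$cat->diet('kibble');
print $cat->name, ' eats ', $cat->diet, "\n";
```

That's several times the code for the same three attributes, and a novice has to understand bless, @_, and hash references before any of it makes sense.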

Of course, the preferred syntax for doing this within the Moose documentation is different from how I've done things. Moose recommends:

{
    package Cat;

    use Moose;

    has 'name' => (is => 'ro', isa => 'Str');
    has 'age' =>  (is => 'ro', isa => 'Int');
    has 'diet' => (is => 'rw');
}

Sometimes you quote the name of the attribute and sometimes you don't.

Should I drop the parentheses? Should I drop the fat arrow between the name of the attribute and its specializers? I do in my own Moose code, as a matter of preference, and I did in the book. Then I thought about it and realized why I write code this way.

First, a digression. Perrin Harkins mentioned the inability of the "Takes a block!" prototype to replicate builtin syntaxes as a reason to dislike syntax-bending modules such as Error. For example:


use Error ':try';

try
{
    ...
}
catch
{
    ...
};

... really needs that trailing semicolon. For similar reasons, many modules which use Devel::Declare magic go through contortions to add trailing commas and semicolons. Perl 5's syntax is malleable, but when the parser wants something from the lexer, it really really wants something from the lexer. (When it wants to know that a statement or a group of terms has ended, you don't get to lie.)
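To see why, here's a bare-bones sketch of the technique using the (&;@) prototype. The try and catch below are toy functions I wrote for illustration; they are not the Error module's implementation.

```perl
use strict;
use warnings;

# The & in the prototype lets the first argument be a bare block.
sub try (&;@) {
    my ($body, $handler) = @_;
    my $ok = eval { $body->(); 1 };
    $handler->($@) if !$ok && $handler;
    return;
}

# catch just passes its block along as an argument to try.
sub catch (&;@) { return @_ }

my $seen;

try {
    die "oops\n";
}
catch {
    $seen = shift;
};    # this is one function call statement, so it needs its terminator

print "caught: $seen";
```

As far as the parser is concerned, the whole construct is a single call to try with two code references as arguments. It looks like a block statement, but it ends like any other expression statement.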

In other words, even though you have a lot of options for mangling Perl 5's syntax any way you like it, the semantics of the host language will shine through. A parenthesis is a parenthesis. A labeled block is a labeled block. A bare sub { ... } is never an expression on its own, and it can never terminate an outer expression.

This is one of the downfalls of the so-called "embedded domain specific languages". If you haven't written your own parser, you'll have to take what you can get. This is even true if you do write a parser and generate and eval code, and it's especially true if your EDSL desugars to chained function or method calls.

I'm not suggesting a flaw with Moose's approach: it's clever and Perlish and doesn't succumb to the saccharine cutery of so many other so-called DSLs. (To my knowledge, no one in the Moose world has claimed it's anything other than Perl 5 syntax bent slightly into something which looks declarative enough.)

My concern, especially when explaining object orientation in Perl 5 to novices, is that any extra syntactic elements might confuse people into thinking that those elements mean more than they do. You and I might both understand that the grouping parentheses in the Cat attribute declaration are merely visual hints to the reader that the specializers are subordinate to the attribute itself, and that the fat arrow between the name of the attribute and its grouped specializers confers the notion of pairing between the two. But how do you explain that to someone who's still struggling to figure out what this encapsulation thing is all about?

I've attempted caution throughout the book such that the fat comma always signifies a pairish relationship, such as for hash keys or named arguments. Certainly you can always use it in place of the skinny comma (and, barring any quoting changes, vice versa), but is it clear to do so?

Likewise, you can wrap parentheses around almost any old rvalue (barring precedence changes) and not change the behavior of lists, yet this confuses novices all the time:

my @lololol = ( 1, 2, ( 3, 4, (5, 6) ) );
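Both equivalences are easy to check. In this sketch, has is a hypothetical stand-in which merely records its arguments (it is not Moose's has), so you can see that all three spellings hand the function the same flat list:

```perl
use strict;
use warnings;

# Stand-in for illustration only: returns whatever arguments it received.
sub has { return [@_] }

my $plain   = has 'name', is => 'ro', isa => 'Str';
my $grouped = has 'name' => ( is => 'ro', isa => 'Str' );
my $bare    = has name => ( is => 'ro', isa => 'Str' );

# Nested parentheses flatten in the same way.
my @lololol = ( 1, 2, ( 3, 4, ( 5, 6 ) ) );

print "@$plain\n";               # name is ro isa Str
print "@$grouped\n";             # name is ro isa Str
print "@$bare\n";                # name is ro isa Str
print scalar(@lololol), "\n";    # 6
```

The parentheses and the fat comma change how the code reads, not what the function receives.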

I'm not criticizing the Moose documentation or the standard approaches to formatting Moose code. I'm not suggesting a change. I don't like deviating from community standards for declaring Moose attributes. Even so, avoiding the need to explain the equivalencies of syntax to people for whom learning syntax is still a really big deal is itself to me a big deal.
