January 2010 Archives

Subtle Encouragement Toward Correctness

By chromatic on January 28, 2010 3:38 PM | 4 Comments

I've been writing the Moose section of the Modern Perl book for the past week. Stevan (and other people) suggested that I explain how to create and use objects in Perl with Moose before explaining the bare-bones blessed reference approach. They were right.

I'm assuming that readers don't necessarily have the theoretical understanding of how objects work and why, of why Liskov substitutability is important, of what allomorphism means, and why polymorphism and encapsulation are much more interesting than inheritance. I don't even assume that readers know any of those words.

Yet I've noticed something far more interesting.

The standard approach to teaching Perl 5 OO (at least, the one approach I've seen that works) builds on the Perl 5 implementation ideas. That is to say, "a class is a package, a method is a subroutine, and an object is a blessed reference". If you know how to work with references in Perl 5, you can use Perl 5 objects.

That's true in the sense that a blessed hash reference is still a hash reference. That's false in the sense that treating an object as a struct with a vtable pointer is a terrible way to write robust OO. (I should know; I've been hip deep in multiple implementations.)

I like a lot of things about Moose, but what I appreciate most from a didactic standpoint is that I can explain object attributes:

{
    package Cat;

    use Moose;

    has 'name',        is => 'ro', isa => 'Str';
    has 'diet',        is => 'rw', isa => 'Str';
    has 'birth_year',  is => 'ro', isa => 'Int',
                       # a code reference runs at each construction,
                       # not once when the class compiles
                       default => sub { (localtime)[5] + 1900 };

}

... and it doesn't occur to readers that they can poke into a Cat instance directly (even though they can). Moose encourages people to do the right thing by using accessors and respecting encapsulation and polymorphism and allomorphism and substitution by making something different--encapsulated access to instance data--look different from the well-understood mechanism of its implementation.

Objects may still be blessed hashes, but users treat them differently because they have different expectations.
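To make the difference between the two kinds of access concrete without requiring Moose, here is a minimal sketch. The Cat::ByHand class is a hypothetical hand-written stand-in, not what Moose actually generates. It shows that both the accessor and the direct hash lookup work today, though only the accessor keeps working if the representation changes.

```perl
use strict;
use warnings;

{
    # Hypothetical hand-written stand-in for the Moose class above
    package Cat::ByHand;

    sub new
    {
        my ($class, %args) = @_;
        return bless { name => $args{name}, diet => $args{diet} }, $class;
    }

    # read-only accessor
    sub name { return $_[0]{name} }

    # read-write accessor
    sub diet
    {
        my $self = shift;
        $self->{diet} = shift if @_;
        return $self->{diet};
    }
}

my $cat = Cat::ByHand->new( name => 'Daisy', diet => 'kibble' );

# encapsulated access: survives a change of internal representation
my $via_accessor = $cat->name;

# direct access: works, but only while the object remains a hash
my $via_hash = $cat->{name};

print "$via_accessor / $via_hash\n";
```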

In writing the examples for this chapter, I changed the implementation of the class to make correctness easier (and to discuss the value of immutable objects). The refactoring was trivial, thanks to Moose features, but the interface of the class could stay the same, thanks to Moose's subtle encouragement to program to an encapsulated interface.

I always enjoy encountering such a serendipity in code, and I made sure to mention it in the book. The Perl world needs more such serendipities.

Divine Meaning from Meaningless Numbers

By chromatic on January 25, 2010 11:22 AM

perl5i is back on the CPAN. perl5i is important because it may help shape the future of Perl 5. (Perl 5 experts and CPAN cognoscenti already know how to add dozens of pragmas and utility modules to every Perl 5 file they write, but that's annoying even for us and inaccessible to the other six and a half billion people on the planet.)

When people writing about Perl 5 in 2010 can find better answers in the sparse Ruby documentation than the Perl 5 documentation, something is wrong... but that's a far different story.

perl5i is again available and you should experiment with it. Why was it gone so long? The old nemesis of confusing version numbers.

The obvious use of perl5i is:

use perl5i;

... but what does that mean? If it's difficult to discern the version of a language used without explicit notation, how much more difficult to discern the version of a pragma or CPAN distribution intended? A CPAN author could upload multiple versions of a distribution in a day, each one with a subtly different API or semantics. How do you know which version you have? How do you know which version your code needs?

How do you know when you should upgrade and when you should keep the existing version? How do you know when an upgrade will change the behavior of code in a positive or negative way?

How do you write reliable, redistributable software which depends on external components with their own ideas about stability of interface?

Schwern has posted some thoughts on perl5i version numbering, distribution, and use, but all of this is an attempt to cram too much meaning into a single number.

I'm starting to believe that the best approach is to use a regular release cycle -- perhaps every three months -- and support only the most recent couple of releases. The interface may change with every quarterly release. Interim releases can fix bugs. Use a date modifier as the argument to import() for best reliability:

use perl5i as_of => '2010-01-25';

Stop fussing with the MAJOR.MINOR.PATCHLEVEL scheme and "What constitutes a major API change?" and "But I just incremented MAJOR last week, isn't it too sooooooon?" and "What if someone wants to use a really old version and reports a bug?" distractions. Let's stop trying to work around change. Instead, let's take advantage of change to produce improvements.
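As a rough sketch of how a date argument to import() might select an interface, consider the following. My::Dialect, its release dates, and its selection rule are all hypothetical. This is not how perl5i works; it only illustrates the idea of asking for "the interface as of this date".

```perl
use strict;
use warnings;

{
    # Hypothetical pragma-like module; not perl5i's implementation
    package My::Dialect;

    # Hypothetical quarterly releases this module still supports
    my @supported = ( '2009-10-01', '2010-01-01' );

    our $active_release;

    sub import
    {
        my ($class, %args) = @_;

        my $as_of = $args{as_of}
            or die "$class requires an as_of date\n";

        die "as_of must look like YYYY-MM-DD\n"
            unless $as_of =~ /\A\d{4}-\d{2}-\d{2}\z/;

        # ISO dates sort correctly as strings; pick the newest
        # release published on or before the requested date
        my ($release) = grep { $_ le $as_of } reverse @supported;

        die "No supported release as of $as_of\n" unless $release;

        $active_release = $release;
    }
}

# 'use My::Dialect as_of => ...' would call import() at compile time;
# calling it directly keeps this sketch in a single file
My::Dialect->import( as_of => '2010-01-25' );
print "Using the interface from $My::Dialect::active_release\n";
```

A request for a date older than every supported release dies at import time, which is where a user would want to learn that the code needs attention.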

Essential Skills for Perl 5 Programmers

By chromatic on January 20, 2010 6:32 PM | 3 Comments

Every time I explain something in the Modern Perl book under development, I have to change the way I think. I've spent a decade writing Perl 5, testing Perl 5, writing about Perl 5, editing writings about Perl 5, and thinking about how to do all of those. I still learn new things, but I haven't been a novice for a very long time.

Mature projects need the perspective of determined and intelligent novices to help find gaps in tutorials and documentation. It's too easy to assume that the mental model experienced users have is obvious for novices. After all, the design is clearly an effective design for the problems it has to solve.

The problem is two-fold. First, the novice may have very different problems and assumptions when approaching the software. Second, the expert mindset may be implicit: the result of experience developing the software, not approaching the problem fresh.

Any good documentation or tutorial intending to give novices practical experience must explain essential pieces of the model while avoiding too much explanation or gratuitous details. That's difficult to do in technical writing. (That's why most technical writing is passable at best and often atrocious.)

This is a long introduction to explain how I've spent a lot of time thinking about concepts that novices need to understand to take advantage of modern Perl. There are several:

Context: how it works, how to identify it, and how to take advantage of it. This includes not only void/scalar/list context but also boolean/integer/string/numeric context.
Using perldoc: to review syntax and builtins as well as modules
Creating, managing, and using modules: including the packaging, testing, building, and deployment systems
Installing CPAN modules: especially with tools such as local::lib and, perhaps, a local CPAN mirror
Using the Perl 5 analysis tools: not limited to testing modules, Perl::Tidy, Perl::Critic, and B::Deparse
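
For the first item in that list, a small sketch may help. The context() function is a hypothetical helper which uses the wantarray builtin to report how it was called. The last lines show the same value treated as a number and as a string.

```perl
use strict;
use warnings;

# Reports the context in which its caller invoked it
sub context
{
    my $want = wantarray;
    return 'void' unless defined $want;
    return $want ? 'list' : 'scalar';
}

my @list   = context();    # list assignment imposes list context
my $scalar = context();    # scalar assignment imposes scalar context

print "$list[0], $scalar\n";

# The same value behaves differently in numeric and string context
my $value = '10';
print $value + 5, "\n";    # numeric context: addition
print $value . 5, "\n";    # string context: concatenation
```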

I thought about including "References and data structures", and I may do so. I left out OO on purpose. The same goes for most syntax; those are all learnable. This list tends toward the philosophical on purpose. These are necessary to understand Perl 5 and to take advantage of it (especially for further learning).

If a novice learns these five things, he or she is in a good position to use Perl 5 effectively for almost any task. Leave out any one and you've added friction and frustration. Understand them and you can do almost anything in Perl 5.

Types, Invocations, and Designing Bugs Impossible

By chromatic on January 15, 2010 10:35 AM

Perl 5's type system has flaws. Those flaws are fixable (with a supreme act of will, lots of patience for discussion on p5p, and ... years of waiting for the state of the art in writing Perl 5 code to catch up with the historical baggage of a decade and a half of buggy code).

Are they preventable?

One sign of effective design is when people can use the feature correctly without training. Subtle design cues should encourage them toward proper uses and away from ineffective and dangerous uses. My paper shredder has a feed slot too narrow to contain my fingers, so it's unlikely for me to harm myself with the default operation. Of greater interest is the feature by which it refuses to operate if the top section with the blades has tilted — if I have removed that section to clear a paper jam, I don't want the blades to run. Arguably I should turn the shredder off and unplug it (and I do), but the danger is sufficiently great that the design actively protects my tender fingers even if I have forgotten to do so.

I've argued before that the lack of the right way to inspect capabilities of Perl 5's primitives causes bugs. Several design misfeatures combine to cause these problems, however.

People want to know what they can do with objects and references. The desire may be for defensive coding, or it may be to take advantage of genericity and polymorphism. Both are valid uses.

People can know some of this information through runtime type checking and reflection. Perl 5 offers some possibilities here, but it often answers the wrong questions. Worse, performing these checks safely requires a lot of code with a lot of subtleties to allow a lot of rare cases that are extremely important when they occur.

Consider the unfortunate case of UNIVERSAL::can (the CPAN module, not the method). By now, you should know why I believe that calling methods as functions is a mistake. U::c replaces the default can() method with a custom variant which warns when invoked directly on an invocant which has its own can() method.

That's the intent, anyhow.

The logic is simple: if I've overridden can() and you ignore that by calling UNIVERSAL::can( $instance_of_my_class, 'some_method' );, you've introduced a bug. This is not an academic, ivory tower concern over purity. I have a fairly popular CPAN module which relies on you not writing buggy code to work properly, and I've had way too many false bug reports that my code doesn't work because of this bug.

Unfortunately, U::c is unreliable because Perl 5 doesn't give sufficient information to know how control flow eventually wound up in its can() method. The current approach works 80% of the time; if the invocant has an overridden can() and the caller of UNIVERSAL::can() isn't a function or method named can, it's probably a bug.

That is, it's okay for an overridden can() to call UNIVERSAL->can(), because they've probably done so through SUPER::can(). In all other cases, someone's probably called it directly as a function, because if they've called it as a method, they'd have ended up in the overridden can() instead.

This is all a workaround for the fact that it's very difficult to tell how any particular invocation happened in Perl 5. Within pure Perl, I know no way of asking "Did a method call end up here?" or "Was this a function call?" If I could tell that, I wouldn't need this workaround.
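A small sketch illustrates the gap. Invocation::Demo is a hypothetical package. Its report() function collects the fields of caller() that do not depend on the line number, and the method call and the function call produce identical reports.

```perl
use strict;
use warnings;

{
    package Invocation::Demo;    # hypothetical package for illustration

    # Returns the calling package, this sub's name, the hasargs flag,
    # and the wantarray flag for the current frame
    sub report
    {
        my @frame = caller(0);
        return join '|', map { $_ // '' } @frame[ 0, 3, 4, 5 ];
    }
}

my $as_method   = Invocation::Demo->report;
my $as_function = Invocation::Demo::report('Invocation::Demo');

print "method:   $as_method\n";
print "function: $as_function\n";
print "indistinguishable\n" if $as_method eq $as_function;
```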

I could write code which grabs the calling code, dematerializes it to its opcodes, walks the optree until it reaches the appropriate position of the call, then looks for the op which performs method dispatch, and I know how to do all of that, but that requires lots of internal introspection I don't want to write, introduces a few more heuristics which are tricky to get right, will be substantially slower, completely fails for XS calls, and is a lot more work than I want to perform for this task, especially when I could be doing something much more fun. (Trying to help people not write buggy code when they don't realize it's buggy and they don't want to hear it anyway is much less fun than almost anything else.)

The current heuristic has some awful flaws too. Consider this code, inspired by actual code in autobox:

sub gen_override_for_class
{
    my $class = shift;

    my $can_override = sub
    {
        my $self = shift;
        return $class->SUPER::can( @_ );
    };

    # install the closure in the class's can() slot
    no strict 'refs';
    *{ $class . '::can' } = $can_override;
}

autobox creates classes named SCALAR, HASH, ARRAY, and the like. You can call methods on references of those types. The gen_override_for_class() function installs a can() method in those classes which dispatches to the correct package. (If you don't understand the rationale for redispatching, that's fine.)

Unfortunately, the U::c heuristic fails here... because the generated method is an anonymous function without the all-important name of can. Yes, it's in the right slot in the namespace, and it's a proper call of UNIVERSAL->can(), but U::c gives a warning in this case because it can't tell that this is a method call.
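The naming problem is easy to reproduce. In this sketch, Named::Demo and Anon::Demo are hypothetical packages. Each can() method reports the name Perl recorded for the running subroutine. The anonymous one sits in the right namespace slot, but its recorded name is still __ANON__.

```perl
use strict;
use warnings;

{
    package Named::Demo;    # hypothetical package for illustration

    # a named method: caller() reports Named::Demo::can
    sub can { return ( caller(0) )[3] }
}

{
    # an anonymous sub installed into the can() slot of Anon::Demo
    no strict 'refs';
    *{'Anon::Demo::can'} = sub { return ( caller(0) )[3] };
}

my $named = Named::Demo->can;
my $anon  = Anon::Demo->can;

print "$named\n";
print "$anon\n";
```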

A correct use of methods in Perl 5 causes a warning because code that tries to detect incorrect uses of methods in Perl 5 can't determine if a particular invocation is a method or a function call. People use methods as functions in Perl 5 in this case because getting the method form right is difficult. People use these functions in Perl 5 because getting the type information for primitives is difficult and subtle.

If you believe in irony, autobox should make all of the introspection easier by allowing you to call methods on primitives, adding genericity and polymorphism where Perl 5 needs it the most.

That's several bugs all jammed together in something I'm not sure I can fix. Perhaps the best approach is to add a warning flag to Test::MockObject to enable U::c and UNIVERSAL::isa, so that they're not on by default and so that people getting weird behavior from buggy code will at least have the option of figuring out that the bugs are in code that uses methods as functions and not in T::MO... but I despair, considering the flood of new bug reports.

Some of this problem comes from Python, which also makes little distinction (syntactic or semantic) between functions and methods:

class Foo(object):
    def bar(self):
        print self, ': bar'

def baz(param):
    print param, ': baz'

Foo.baz = baz

foo = Foo()
foo.bar()
foo.baz()

Yes, I deliberately obfuscated the Python code by naming the parameter to baz param instead of self. (Thus I disprove the claim that it's impossible to write unreadable Python.) Even still, Python does get this behavior more correct, in that grabbing the first-class function from either the class itself or from an instance produces a first-class function that knows it's all objecty:

quux = Foo.baz
quux('Not an object')

TypeError: unbound method baz() must be called with Foo instance
    as first argument (got str instance instead)

quux = foo.baz
quux('Not an object')

TypeError: baz() takes exactly 1 argument (2 given)

Compare that to Perl 5, where you can slap any old argument in that unspectacular invocant slot and get... well, you get all of the pieces when it breaks.
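For comparison, here is a minimal Perl 5 sketch of the same experiment. The Foo class is hypothetical and mirrors the Python example. Perl accepts the string in the invocant slot at call time, and the failure appears only when the method body tries to dereference it.

```perl
use strict;
use warnings;

{
    package Foo;    # hypothetical class mirroring the Python example

    sub new { return bless { label => 'foo' }, shift }

    sub bar
    {
        my $self = shift;
        return "$self->{label}: bar";
    }
}

# grab the method as a plain code reference
my $method = Foo->can('bar');

# a real instance works
print $method->( Foo->new ), "\n";

# a plain string is accepted at call time; the failure surfaces
# later, when the body dereferences it under strict refs
my $result = eval { $method->('Not an object') };
print "Died inside the method: $@" unless defined $result;
```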

Sure, at the lowest levels in a VM or a processor core, the invocation mechanism is "shuffle some args around, keep track of the current location in code, then branch somewhere else" regardless of whether you've invoked a function, a method, a coroutine, a continuation, or an exception. That's fine. Stack those turtles as high as you can.

At the language level, however, they're all very different. A language design should encourage people to treat them differently, even if there's only one stack of turtles, else the apparent consistency may be a foolish and tempting consistency which produces subtle inconsistencies. You can't prevent malicious or incompetent people from doing malicious and incompetent things and you shouldn't prevent clever people from doing clever things.

I believe it's possible (and good!) to encourage the rest of us to do smart and safe things.

How to Add Allomorphism to Perl 5's Primitives

By chromatic on January 13, 2010 10:17 AM

If I had sufficient time and patience to write the patch and argue about it on the Perl 5 Porters mailing list, how would I implement optional types for Perl 5?

(Dave Rolsky gets it right in the comments: this has to be part of the core to canonize the names of the roles the Perl 5 primitives provide. This idea is worthless when more than one CPAN implementation exists, if they diverge on naming conventions — and, for all I like the CPAN, divergence is inevitable.)

If I had my way, I'd add a new keyword and op combination. I like does. (In my imaginary fantasy world, I win all arguments through the unassailable forces of logic and good taste alone. Also I have a hovercraft.) I'd also add a table to store the behaviors that builtins support. That is, a hash always does Hashlike and a hash reference always does Reference and Hashreflike. Similarly, a regex does Stringifies (or Stringlike).

It's important to leave bikesheds so people on p5p have something to argue about other than the implementation.

I'd also add a storage location within classes to keep track of all of the roles class instances perform. This is necessary because....

... tie and overload have to update that does list depending on the behaviors they support.

does might end up performing separate operations in boolean (scalar) and void context. In boolean context, it returns true if the first argument performs the appropriate roles. In void context, it marks the current class as performing the appropriate roles. (This is nice if Perl 5 ever gets a class keyword; it falls out as nicely as the syntax for MooseX::Declare.)

The method form is already available thanks to UNIVERSAL::DOES. Those two forms can tie together nicely.

This produces allomorphism for objects (or classes or anything which supports method invocation) as well as non-invocants (Perl 5 primitives).
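The method form already works for classes in Perl 5.10 and later. As a minimal sketch, assuming a hypothetical Cache::InMemory class and a hypothetical role name Hashlike, a class can override DOES() to claim a role without inheriting from anything:

```perl
use strict;
use warnings;

{
    package Cache::InMemory;    # hypothetical class

    sub new { return bless {}, shift }

    # Claim the (hypothetical) Hashlike role in addition to
    # whatever the default DOES() reports
    sub DOES
    {
        my ($self, $role) = @_;
        return 1 if $role eq 'Hashlike';
        return $self->SUPER::DOES($role);
    }
}

my $cache = Cache::InMemory->new;

for my $role (qw( Hashlike Cache::InMemory Arraylike ))
{
    my $answer = $cache->DOES($role) ? 'yes' : 'no';
    print "$role: $answer\n";
}
```

What this cannot do is answer the same question for an unblessed hash reference, which is the gap the imaginary does keyword would fill.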

The remaining question is what to do with ref(), which doesn't work and tends to break code.

...

... and there, my mythical patch removes ref(). Forget backwards compatibility. Sometimes writing correct code in the present and future means removing the ability to write incorrect code and requiring people to fix broken code in the present.

The nice part about this imaginary patch (besides having neither to write it nor argue for it) is that it's perfectly compatible with further imaginary patches to add function signatures to Perl 5 -- with type checking, of course -- and a class syntax and even multidispatch. All of those features are perfectly optional.

Of course, ref() could stay in that world. The possibility exists that the temptation to use shiny new features which remove boilerplate, work more correctly, provide better abstraction possibilities, and simplify the design and implementation of code would encourage people to adopt better programming practices and remove dangerous, ugly, ill-designed, and confusing practices from their code. Sometimes that even works.

(These features all work in Perl 6 today. Try Rakudo now.)

Optional Types and Perl

By chromatic on January 11, 2010 12:20 PM | 3 Comments

This punchline comes first, so if you've already made up your mind about type systems and dynamic languages and Perl 6, you can read something else: optional typing solves a lot of tricky problems in Perl 5.

When the desires for modularity, genericity, and correctness conflict, Perl 5 programmers have to fall back on heuristics. You can write quick and dirty and loose code that accepts whatever it gets and tries to do something sane with it, but you give up some ability to produce good error messages when something goes wrong and you run the risk of misinterpreting the intent of other programmers. The amount of risk varies depending on what you do and why, but it's there.

You can write strict code that checks everything against a severe whitelist, but you run the risk of forbidding people from doing necessary things and forcing them to contort their solutions in unnatural ways to conform to what you predicted they would need. The amount of contortion varies depending on what you do and why, but it's there.

(This dialectic informs the design of programming languages as well.)

My recent posts have described the conflict between encapsulation and manual type checking in Perl 5, how the available Perl 5 primitives provide wrong and misleading and incomplete information about types and capabilities, and why robust Perl 5 programming should worry more about what an object does than how it does it.

If you accept the idea that your Perl programs can be more robust and that error checking and loosely-coupled genericity are both desirable, the next obvious question is how to achieve both. Sadly, you can't — not as Perl 5 exists today.

The question Aristotle and I have debated in comments is how to judge the intent of two programmers: the one writing the API and placing constraints on parameters and the one using the API and providing arguments. The problem with heuristics is subtle but pervasive. The only evidence available to judge that intent is indirect. Certainly on the callee side I can detect whether a provided parameter is an object (a class less so), or whether it has overloading, or whether it uses a hash reference for its representation.

I can't tell why.

On the caller side, it's equally important to know what the callee expects from its arguments. Here it's safer to assume that a function which takes a hash reference will treat that argument as a hash, but it's not always that easy. Will a function modify a string in place? (Can your language modify strings in place?)

Documentation can help... but if I believed documentation quantity made up for poor design, I'd use PHP.

Here's the punchline again. If Perl 5 supported a mechanism by which I could write the callee:

sub retrieve_or_calculate (Key $key, Hashlike $cache)
{
    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

... then on the caller side I could provide anything which performs the Key and Hashlike roles and the code would work as expected. Obviously I'd have to do a little bit of work on the caller side to ensure that if I passed anything other than a simple value for $key and a hash reference for $cache that they had the appropriate annotations to mark that they performed the appropriate roles... but the obvious code you'd write by default without doing anything spectacular could work without modification.

Don't get caught up in the function signatures or the implicit compiler-provided error checking. Think about how this imaginary feature not present in Perl 5 has removed the need to guess from both sides of this call. Of course it would be optional. Of course you don't have to use it. (Of course it implies the presence of other features, which I'll discuss in my next entry.) Of course you can write copious tests to prove that this isn't a problem in practice for every use case you imagined when you wrote the tests.

Yet look at how it solves a very real robustness and correctness problem in existing Perl 5 code. Perl 6 supports this feature for a very good reason. (You can use it today in Rakudo.)

If none of this convinces you, MJD as usual explains things better than I do. See Strong Typing Doesn't Have to Suck (1999) and his Atypical Types from OOPSLA 2008.

Genericity, Serendipity, Surety

By chromatic on January 8, 2010 3:36 PM | 4 Comments

I read everything Aristotle Pagaltzis writes carefully. I often agree. You should too.

Aristotle responded to my practical/philosophical tirade against manual type checks in Perl 5 APIs with a comment that bears consideration:

Many objects are blessed hashes. However, most of the time, that is an implementation detail rather than an advertised part of the API... the question you should ask is not "can I treat this like a hash?" but "am I supposed to treat this like a hash?".

He's right. That's a potential design flaw in my suggestion to attempt to treat what should be a hash reference as a hash reference. Of course, the opposite design flaw is to disallow perfectly valid values that should work just fine if you treat them as hash references.

You get to pick which theoretical axe to grind there.

(Aristotle also recommends the use of UNIVERSAL::ref to allow objects and classes to provide their own ref() methods to return appropriate values. That's a good solution if you're working with code that uses the ref() technique to check for type appropriateness. If you're working with library code which uses a mishmash of all of the various techniques which catch some cases but not others... that's where the simplicity of eval { ... } shines for me. Do consider UNIVERSAL::ref, however. Along the same lines, Burak Gursoy noted that Scalar::Util::Reftype has a better API than reftype() from Scalar::Util.)

Your theoretical bias determines the way you write your APIs, even if you don't realize you have a theoretical bias.

If you take my approach, you give the programmer the responsibility of not passing in objects which shouldn't be dereferenced as hashes. If you pass in a blessed hash reference, my code assumes you intended it to work like a hash reference. Yes, you could pass in an object accidentally, but I prefer to allow people to do clever things (like passing in an overloaded object) when necessary, rather than forbidding them.

If you take the other approach, you give the programmer more safety against accidentally passing in the wrong thing while removing some possibility for cleverness that may be necessary in some cases. Please note: I'm not saying that this is what Aristotle himself prefers; I merely used his comment to illustrate this possibility.

My approach is not always right in every circumstance. I don't always use it in every circumstance. Yet I use it as a design principle: I don't want to forbid intelligent people from doing clever things I didn't intend because I didn't imagine a concrete use for them when I designed the API. Quite the opposite! I want people to use APIs I write to do things I couldn't imagine. That, to me, is a sign of success.

Some people will abuse them. Some people will misuse them. Good documentation and great examples help. (Modern Perl schadenfreude means shaking your head in disbelief when you see the dreadful my $io = new IO::Socket::INET->new( ... ) idiom repeated in 2009.)

Yet my experience writing copious tests for and maintaining plenty of Perl 5 code suggests that treating type checks as "What can you do?" not "How do you do it?" or "What are you?" improves genericity, improves reusability, and expands the possibilities for happy serendipities. I can't prevent inexperienced or bad or malicious coders from doing bad things with my APIs, but I can allow disciplined and smart and well-cultured programmers to do great things with them by not getting in their way.

Perl Type Checks versus Encapsulation

By chromatic on January 6, 2010 3:03 PM | 4 Comments

You code defensively. You like contracts and preconditions. You like to ensure that people don't misuse your API accidentally. You warn early, and you warn often.

This is all well and good, except.... Suppose you have an API which takes a hash reference. You want to retrieve something from an optional cache, or calculate a default value. You write:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless ref( $cache ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

This code looks good, except that it's buggy. (If you've read Is It, Can It, Does It, and Robust Perl 5 OO you already know why.)

The function expects a hash reference. This is just as much a hash reference as anything else:

my $cache = bless {}, 'SecretMonkey::Cache';

... but try to pass it into that function. Boom. It's indeed a hash reference, but it won't pass the ref() test.
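A short sketch shows why the check rejects the object: ref() reports the class name of a blessed reference, not the type of the underlying data.

```perl
use strict;
use warnings;

my $plain   = {};
my $blessed = bless {}, 'SecretMonkey::Cache';

# ref() reports the class name for a blessed reference,
# so comparing it with 'HASH' rejects the object
print ref($plain),   "\n";
print ref($blessed), "\n";

# yet both dereference as hashes
$_->{answer} = 42 for $plain, $blessed;
print "$plain->{answer} $blessed->{answer}\n";
```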

That's fine. There are ways around that. Pull out the big bag of "Features that (mostly) should be in the core but aren't", spelled Scalar::Util, specifically reftype():

use Modern::Perl;
use Carp 'croak';
use Scalar::Util 'reftype';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless reftype( $cache ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

This catches the case of a blessed hash reference, but now it throws warnings because reftype() returns undef when passed a non-reference. That's easy to fix:

use Modern::Perl;
use Carp 'croak';
use Scalar::Util 'reftype';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless ( reftype( $cache ) // '' ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

... except that that gets into "Wow, that's kind of ugly!" territory, and Devel::Cover doesn't like it. (Any time an ancillary tool suffers from an API design decision and you have to make a choice between penalizing otherwise good and reliable metrics and working around API deficiencies, you should... aw, forget it. Something in the DarkPAN probably relies on exactly this behavior. Maybe.)

Of course, poking directly into the internals of an object is wrong. That's the internal representation of an object. Fortunately, you can protect against this behavior by changing the representation of SecretMonkey::Cache:

package SecretMonkey::Cache;

use overload '%{}' => \&treat_as_hash;

sub treat_as_hash { ... }

sub new
{
    bless [], shift;
}

Now the $cache object behaves as a hash (thanks to overload), except that reftype() still pokes in the guts of the object to discover a blessed array reference, not a blessed hash reference. Oops.
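Here is a runnable version of that situation. The body of treat_as_hash() and the choice to keep the real hash in the first array slot are assumptions made for this sketch, because the original leaves the implementation elided.

```perl
use strict;
use warnings;
use Scalar::Util 'reftype';

{
    package SecretMonkey::Cache;

    use overload '%{}' => \&treat_as_hash;

    # Assumption for this sketch: the real storage is a hash
    # kept in the first slot of the blessed array
    sub new { return bless [ {} ], shift }

    # array dereference is not overloaded, so this reaches the slot
    sub treat_as_hash { return $_[0][0] }
}

my $cache = SecretMonkey::Cache->new;
$cache->{answer} = 42;

my $representation = reftype($cache);
my $stored         = $cache->{answer};

print "reftype says: $representation\n";
print "hash access gives: $stored\n";
```

The object works as a hash in every way the calling code cares about, yet a reftype() check for 'HASH' rejects it.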

Some people pull out UNIVERSAL::isa() here and they are wrong. Many of them balk at using reftype() because it's in a separate (but core) module, because its API has that nasty undef quirk, or because they don't know it exists. This buggy code has the same problem with regard to hash dereference overloading:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless UNIVERSAL::isa( $cache, 'HASH' );

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

One option is to override the isa() method within SecretMonkey::Cache:

package SecretMonkey::Cache;

sub isa
{
    my ($self, $class) = @_;
    return 1 if $class eq 'HASH';
    return $self->SUPER::isa($class);
}

... but that fails when people use ref() or reftype() or UNIVERSAL::isa() as a function, not a method. The checking code could change to work around this — in truth, it must:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless $cache->isa( 'HASH' );

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{$key};
}

... except that it's okay if $cache isn't a blessed reference. In fact, the intent of the code is not to handle blessed references. Passing in an array reference should cause the "I want a hash ref!" exception just as much as passing in an object that can't behave as a hash ref should.
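The failure is easy to see in isolation. In this sketch, calling isa() as a method on an unblessed reference throws an exception before the check can return an answer, even for a perfectly ordinary hash reference.

```perl
use strict;
use warnings;

my %candidates = (
    'plain hash ref'  => {},
    'plain array ref' => [],
);

my %error_for;

for my $label ( sort keys %candidates )
{
    my $cache = $candidates{$label};

    # the method call itself dies for any unblessed reference
    my $ok = eval { $cache->isa('HASH'); 1 };
    $error_for{$label} = $ok ? '' : $@;

    print "$label: ", ( $ok ? "no exception\n" : "died: $@" );
}
```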

If you're willing to go the autobox::Core route, you can use UNIVERSAL::DOES() instead (see The Why of Perl Roles), provided that the core data types do override DOES() appropriately and that overload sets it by default (which it should) and....

Ultimately that's the right solution, but Perl 5 has littered its glorious history with the detritus of several mostly not entirely wrong half-solutions which permeate both the CPAN and the DarkPAN with poorly understood traps for people trying to design, maintain, and write perfectly valid and theoretically correct code, and ... well, if you put scare quotes around the word "working" when you say "Don't break working code!", the irony police probably won't haul you away, even if they should.

The point is simple. The question this API really should be asking is not some question of "Is this a hash reference?" or "Is it a blessed hash reference?" or "Is it a blessed reference with hashlike overloading?" or "Is it a tied variable that understands how to perform a keyed fetch operation?". The question this API really needs to ask is "Can I treat this like a hash?" The game is lost at the point where the API cares what $cache is; all that matters is what $cache does.
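To make the is/does distinction concrete, here is a small sketch of an object that is not a hash but does behave like one. The class name My::HashLike is hypothetical and exists only for illustration; it relies on the core overload pragma's '%{}' key:

```perl
use strict;
use warnings;
use Scalar::Util 'reftype';

{
    package My::HashLike;    # hypothetical class, for illustration only

    # Hash dereferencing is overloaded to expose the hash kept in slot 0 of
    # the blessed array that actually stores the object.
    use overload '%{}' => sub { $_[0][0] }, fallback => 1;

    sub new { return bless [ {} ], shift }
}

my $cache = My::HashLike->new;

# Keyed store and fetch work even though the storage is an array.
$cache->{answer} //= 42;

printf "reftype: %s, answer: %s\n", reftype( $cache ), $cache->{answer};
```

A check based on reftype() would reject this object, even though every hash operation the caching code needs works on it.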

You could probably get some combination of all of these checks working with:

use Modern::Perl;

use overload;

use Carp         'croak';
use Scalar::Util qw( blessed reftype );

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    my $type_check    = 0;

    # it's an object that claims to be, is stored as, or overloads a hash
    $type_check = 1
        if (blessed($cache)
        &&  ($cache->DOES( 'HASH' )
        ||  reftype( $cache ) eq 'HASH'
        ||  overload::Method( $cache, '%{}' )));

    # it's a plain, unblessed hash reference
    $type_check = 1 if ref( $cache ) eq 'HASH';

    croak "callee is buggy; I need a hash ref!" unless $type_check;

    ...
}

... but ouch.

Fortunately, Perl 5 has a perfectly good way to attempt and recover from an operation that may not succeed:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # return inside eval only leaves the eval block, so capture its value
    my $value = eval
    {
        $cache->{$key} //= calculate_expensive_value($key);
    };

    croak "callee is buggy; I need a hash ref!" if $@;

    return $value;
}

Yes, you may need to check $@ for the specific exception thrown, but that's much more straightforward, much simpler, and much, much easier to get right.
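As a rough sketch of what checking $@ for the specific exception could look like, assuming that matching Perl's "Not a HASH reference" message is specific enough for your purposes. The body of calculate_expensive_value() here is a hypothetical stand-in:

```perl
use strict;
use warnings;
use Carp 'croak';

# Hypothetical stand-in for the real calculation.
sub calculate_expensive_value
{
    my ($key) = @_;
    die "cannot calculate '$key'\n" if $key eq 'boom';
    return length $key;
}

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    my $value = eval { $cache->{$key} //= calculate_expensive_value($key) };

    if (my $error = $@)
    {
        # Perl's own message when $cache is a reference of another type
        croak "callee is buggy; I need a hash ref!"
            if $error =~ /^Not a HASH reference/;

        # anything else came from the calculation; pass it along untouched
        die $error;
    }

    return $value;
}
```

This keeps errors raised by the calculation itself from being misreported as a bad cache argument.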

That leaves only the simple matter of fixing existing code, teaching current and future Perl programmers why not to use the obvious-but-wrong approaches, and enhancing Perl so that it's easier to do the right thing than the wrong thing. (Try Rakudo today!)

Good luck.

UNIVERSAL and API Decisions

By chromatic on January 4, 2010 2:06 PM

I explained some of the troubles with Perl 5's UNIVERSAL in Is It, Can It, Does It, and Robust Perl 5 OO. More — and subtler — problems lurk.

A common antipattern in Perl 5 APIs is to allow spectacular flexibility in argument passing. It's too common to see functions which take some kind of reference and manually switch based on the argument type. (Some APIs do need this flexibility, such as a pretty printer for nested data structures or a serializer. Even so, languages with support for multiple dispatch or pattern matching of the non-regex kind provide much cleaner, simpler, and more correct code. I take advantage of this feature all of the time in Perl 6.) An example might be:

sub my_awesome_api_does_everything
{
    my $arg = shift;

    given (ref($arg))
    {
        when ('ARRAY')  { ... }
        when ('SCALAR') { ... }
        when ('HASH')   { ... }
    }
}

That can be messy, and better API design can ameliorate that in many cases. OO fans may already have looked up Replace Conditional with Polymorphism to post in the little comment box, but that occasionally runs into the "is it a primitive or an object" conundrum that multi-paradigm languages provide.

Yes, you can make my_awesome_api_does_everything() into a method on a data type, but unless you've steadfastly avoided primitive obsession in your code, you'll have circumstances where you want to pass in a simple data structure instead of an object and vice versa.

The so-called solution of extension methods on base types has its own problems. Globals are tricky, even if they're namespaced methods: anything you could possibly want gets crammed into poor Array, potentially multiple and conflicting times. (If you turn your head the right direction, it's obvious that only a PHP programmer could have created Ruby on Rails; there but for namespaces and some degree of encapsulation....)

The real problem is that you have to manage all of that complexity somewhere. In the absence of a sane API which refuses to handle that complexity (usually the best solution) and language features which hide that complexity for you (and have plenty of experience from plenty of real programs and a copious test suite to ensure that correctness), you will get it wrong in myriad, subtle, conflicting ways.

For example, consider a bog-standard blessed hash Perl 5 object. If you pass it into my_awesome_api_does_everything(), what happens? What should happen?

You can argue that the switch statement needs another case to handle the object. You can also argue that the HASH approach should suffice. Yet accessing the object as a hash breaks encapsulation. Adding another case makes the code a little bit less maintainable (the combinations increase, adding to the complexity of this code, to say nothing of the additional documentation and testing requirements).
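A short sketch shows what such a ref()-based switch sees when handed a blessed hash. The class My::Point and the helper classify() are hypothetical names used only for this illustration:

```perl
use strict;
use warnings;
use Scalar::Util 'reftype';

{
    package My::Point;    # hypothetical bog-standard blessed-hash class

    sub new { my ($class, %args) = @_; return bless { %args }, $class }
}

# Hypothetical stand-in for the switch: which branch would ref() select?
sub classify
{
    my $arg   = shift;
    my %known = map { $_ => 1 } qw( ARRAY SCALAR HASH );

    return $known{ ref $arg } ? ref $arg : 'unhandled';
}

my $point = My::Point->new( x => 1, y => 2 );

printf "ref: %s, reftype: %s, branch: %s\n",
    ref( $point ), reftype( $point ), classify( $point );
```

ref() reports the class name rather than HASH, so the object falls through every case unless the API adds one for it or looks at reftype() and reaches past the object's encapsulation.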

On the other side, the more genericity and polymorphism you can support in your APIs, the less the coupling of your program in general and the better the reuse possibilities. The ultimate goal of maintaining a system is to be able to delete code while adding features and removing bugs. Net negative SLOC production is wonderful.

You want to avoid unnecessary data conversions to use the API, but you also want to maintain safety and correctness. You don't want to rule out useful behavior, but you want to enforce consistency.

ref() is almost never the answer here, but at least it's not actively misleading like the code people often use instead, when they realize that ref() doesn't answer the interesting question.

When the right answer for the API is to poke in the reference as if it were a hash, you often see this bad code:

if (UNIVERSAL::isa( $ref, 'HASH' )) { ... } # buggy; do not use

The reason this is wrong is subtler than you may think; part of the answer is in my previous article. I'll explain why in the next installment.
