The Problems with Indirect Object Notation

This excerpt from Modern Perl: the book discusses another feature of Perl 5 which makes parsing Perl 5 difficult. Avoiding this feature in your own code will make it more reliable and easier to debug.

Read a few Perl 5 object tutorials (or the documentation of too many CPAN modules), and you might believe that new is a language keyword just as in C++ and Java:

    my $q = new CGI; # DO NOT USE

As the discussion of objects has made clear, a constructor in Perl 5 is anything which returns an object. By convention, constructors are class methods named new(), but you have the flexibility to choose a different approach to meet your needs. Because new() is a class method, the standard method call approach applies:

    my $q = CGI->new();

These syntaxes are equivalent in behavior, except when they're not.

The first form is the indirect object form (more precisely, the dative case), where the verb (the method) precedes the noun to which it refers (the object). This is fine in spoken languages, but it introduces difficult-to-debug ambiguities in Perl 5.

Bareword indirect invocations

One problem is that the name of the method is a bareword, requiring the Perl 5 parser to perform several heuristics to determine the proper interpretation. While these heuristics are well-tested and almost always correct, their failure modes can be very confusing and difficult to debug. Worse, they're fragile in the face of the order of compilation and module loading.

Parsing is more difficult for humans and the computer when the constructor takes arguments. The Java-style approach may resemble:

    # DO NOT USE
    my $obj = new Class( arg => $value );

... thus making the classname Class look like a subroutine call. Perl 5 can disambiguate many of these cases, but its heuristics depend on which package names the parser has seen so far, which barewords it has already resolved (and how it resolved them), and the names of subroutines already declared in the current package.

Imagine running afoul of a subroutine with prototypes with a name which just happens to conflict somehow with the name of a class or a method called indirectly. This happens infrequently, but it's difficult enough to debug that it's worth making impossible by avoiding this syntax.

Indirect notation scalar limitations

Another danger of the syntax is that the parser expects a single scalar expression as the object. You may have had trouble printing to a filehandle stored in an aggregate variable:

    # DOES NOT WORK AS WRITTEN
    say $config->{output} "This is a diagnostic message!";

print, close, and say -- all keywords which operate on filehandles -- operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles make the problem more apparent: in the example above, Perl 5 tries to call the say method on the $config object. The solution is to disambiguate the expression which produces the intended invocant:

    say {$config->{output}} "This is a diagnostic message!";
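
As a minimal, self-contained sketch of the block form: the in-memory filehandle and the contents of $config below are assumptions made for illustration, and print stands in for say so that the example needs no feature enabled.

```perl
use strict;
use warnings;

# Hypothetical setup: an in-memory filehandle stored in a hash reference.
my $buffer = '';
open my $fh, '>', \$buffer
    or die "Cannot open in-memory filehandle: $!";

my $config = { output => $fh };

# The block tells the parser exactly which expression yields the filehandle.
print { $config->{output} } "This is a diagnostic message!\n";

close $config->{output}
    or die "Cannot close filehandle: $!";

# $buffer now holds the text written through the stored filehandle.
```

The same block form works with any expression which evaluates to a filehandle, such as an array element or the return value of a method call.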

Alternatives to indirect notation

Direct invocation notation does not suffer this ambiguity problem. To construct an object, call the constructor method on the class name directly:

    my $q   = CGI->new();
    my $obj = Class->new( arg => $value );
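
To make this concrete, here is a small sketch with a hypothetical Counter class (the class, its starting_at() constructor, and its count() accessor are invented for illustration). It shows both a conventional new() and a differently named constructor, each invoked directly on the class name:

```perl
use strict;
use warnings;

# Hypothetical class used only for illustration.
package Counter;

# Conventional constructor: a class method named new().
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

# A constructor need not be named new(); any method which
# returns an object qualifies.
sub starting_at {
    my ($class, $start) = @_;
    return $class->new( start => $start );
}

# Simple accessor for the stored count.
sub count { return $_[0]{count} }

package main;

# Direct invocation: class name, arrow, method name.
my $zero = Counter->new();
my $ten  = Counter->starting_at(10);
```

Neither call depends on the parser guessing whether a bareword names a method or a class.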

For filehandle operations, which are limited, known to the Perl 5 parser directly, and pervasive in their idiomatic use of the dative case, use curly brackets to remove ambiguity about your intended invocant. Alternatively, consider loading the core IO::Handle module, which allows you to perform IO operations by calling methods on filehandle objects (such as lexical filehandles).
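
A sketch of the IO::Handle approach might look like the following. The in-memory filehandle and the $config structure are again assumptions for illustration; print() and close() are methods which IO::Handle provides.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical setup: an in-memory filehandle stored in a hash reference.
my $buffer = '';
open my $fh, '>', \$buffer
    or die "Cannot open in-memory filehandle: $!";

my $config = { output => $fh };

# Ordinary method calls on the filehandle; no indirect notation involved.
$config->{output}->print("This is a diagnostic message!\n");

$config->{output}->close()
    or die "Cannot close filehandle: $!";
```

Because these are ordinary method calls, the invocant can be any expression without extra curly brackets.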

To identify indirect calls in your code, use the CPAN module Perl::Critic::Policy::Dynamic::NoIndirect (a plugin for Perl::Critic). To forbid their use at compile time, use the CPAN module indirect.

5 Comments

And of course there's already an explanation of -how- perl disambiguates and an example of the ways it can screw you up in the "indirect but still fatal" post on my blog

There is a way to avoid the ambiguity. I found this in the camel book.

    my $object = new Class::($args);

Therefore I see no technical reason not to use this form when no method dispatching is wanted or needed. But beginners should know and be taught about the differences between the two forms.

Thanks for this reminder. I thought I had purged indirect object notation from CGI.pm in the 3.43 release, but there were clearly plenty of cases left to address.

I've patched those now, and they will be removed from the next release. Unfortunately, it seems too late to get this update into 5.10.1, so it will be 5.10.2 when the change appears in the core.

Reference:
http://github.com/markstos/CGI.pm/commit/a36e451716f8cee8ac02376617bb98a33d5ac9c0

It's too bad that the word "indirect" was assigned to this notation. A high value name for a notation that we typically don't use. I understand the linguistic reason for the usage, but programmers have their own use for the word "indirect". It'd be nice to have a concise description of the Class->$method(@args) style notation.

As mentioned elsewhere:

    use Class;

    sub Class {
        warn 'Called Class sub not Class package';
        'Class'
    }

    my $q = Class->new;   # calls the Class sub above
    my $s = new Class;    # throws a 'Bareword found where operator expected' error
    my $t = Class::->new; # this works
    my $u = new Class::;  # this also works (even with sub main in the current package)


About this Entry

This page contains a single entry by chromatic published on August 21, 2009 5:21 PM.
