This excerpt from Modern Perl: the book discusses another feature of Perl 5 which makes parsing Perl 5 difficult. Avoiding this feature in your own code will make it more reliable and easier to debug.
Read a few Perl 5 object tutorials (or the documentation of too many CPAN modules), and you might believe that new is a language keyword, just as in C++ and Java:
my $q = new CGI; # DO NOT USE
As the discussion of objects has made clear, a constructor in Perl 5 is anything which returns an object. By convention, constructors are class methods named new(), but you have the flexibility to choose a different approach to meet your needs. If new() is a class method rather than a keyword, the standard method call syntax should apply:
my $q = CGI->new();
These syntaxes are equivalent in behavior, except when they're not.
The first form is the indirect object form (more precisely, the dative case), where the verb (the method) precedes the noun to which it refers (the object). This is fine in spoken languages, but it introduces difficult-to-debug ambiguities in Perl 5.
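As a minimal sketch of the simple case where both forms behave the same, consider a small class with the conventional new() and a second, differently named constructor. The Counter class, its starting_at() method, and its count() accessor are hypothetical names invented for this illustration, and the sketch assumes a perl where indirect object syntax is still enabled (the default unless a newer feature bundle is requested):

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only

    # Conventional constructor: a class method which returns a blessed reference.
    sub new {
        my ($class, %args) = @_;
        return bless { count => $args{start} // 0 }, $class;
    }

    # Nothing requires the name new(); any class method which
    # returns an object can serve as a constructor.
    sub starting_at {
        my ($class, $start) = @_;
        return $class->new( start => $start );
    }

    sub count { return $_[0]{count} }
}

my $direct   = Counter->new( start => 5 );
my $indirect = new Counter( start => 5 );    # DO NOT USE; shown only for comparison
my $named    = Counter->starting_at(5);

print join( ', ', map { $_->count } $direct, $indirect, $named ), "\n";
```

Here nothing in the current package competes with the names new or Counter, so the parser's guess is correct. The remaining sections describe what happens when that stops being true.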
Bareword indirect invocations
One problem is that the name of the method is a bareword, requiring the Perl 5 parser to perform several heuristics to determine the proper interpretation. While these heuristics are well-tested and almost always correct, their failure modes can be very confusing and difficult to debug. Worse, they're fragile in the face of the order of compilation and module loading.
Parsing is more difficult for humans and the computer when the constructor takes arguments. The Java-style approach may resemble:
# DO NOT USE
my $obj = new Class( arg => $value );
... thus making the classname Class look like a subroutine call. Perl 5 can disambiguate many of these cases, but its heuristics depend on which package names the parser has seen so far, which barewords it has already resolved (and how it resolved them), and the names of subroutines already declared in the current package.
Imagine running afoul of a prototyped subroutine whose name just happens to conflict with the name of a class or of a method called indirectly. This happens infrequently, but it's difficult enough to debug that it's worth making impossible by avoiding this syntax.
Indirect notation scalar limitations
Another danger of the syntax is that the parser expects a single scalar expression as the object. You may have had trouble printing to a filehandle stored in an aggregate variable:
# DOES NOT WORK AS WRITTEN
say $config->{output} "This is a diagnostic message!";
print, close, and say -- all keywords which operate on filehandles -- operate in an indirect fashion. This was fine when filehandles were package globals, but with lexical filehandles the problem can be more apparent, when Perl 5 tries to call the say method on the $config object. The solution is to disambiguate the expression which produces the intended invocant:
say {$config->{output}} "This is a diagnostic message!";
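A self-contained sketch of the same fix follows. It assumes an in-memory filehandle standing in for a real log file; the $buffer variable and the contents of $config are made up for the example, and print is used in place of say so that no feature needs enabling:

```perl
use strict;
use warnings;

# Hypothetical setup: an in-memory filehandle stored in a hash reference.
my $buffer = '';
open my $fh, '>', \$buffer
    or die "Cannot open in-memory filehandle: $!";
my $config = { output => $fh };

# The block tells the parser exactly which expression yields the filehandle.
print { $config->{output} } "This is a diagnostic message!\n";

close $fh or die "Cannot close in-memory filehandle: $!";
```

After the close, $buffer holds the message, which makes it easy to confirm that the output went to the handle stored in the hash.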
Alternatives to indirect notation
Direct invocation notation does not suffer this ambiguity problem. To construct an object, call the constructor method on the class name directly:
my $q = CGI->new();
my $obj = Class->new( arg => $value );
For filehandle operations, which are limited, known to the Perl 5 parser directly, and pervasive in their idiomatic use of the dative case, use curly brackets to remove ambiguity about your intended invocant. Alternately, consider loading the core IO::Handle module, which allows you to perform IO operations by calling methods on filehandle objects (such as lexical filehandles).
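The IO::Handle approach might look like the following sketch. The in-memory filehandle, the $log variable, and the messages are assumptions made for the example; print, printf, autoflush, and close are methods which IO::Handle provides:

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical setup: an in-memory filehandle stored in a hash reference.
my $log = '';
open my $out, '>', \$log
    or die "Cannot open in-memory filehandle: $!";
my $settings = { output => $out };

# Ordinary method calls: the invocant is unambiguous, so no block is needed.
$settings->{output}->autoflush(1);
$settings->{output}->print("This is a diagnostic message!\n");
$settings->{output}->printf( "%s: %d\n", 'count', 3 );
$settings->{output}->close
    or die "Cannot close in-memory filehandle: $!";
```

Because these are direct method calls, the expression which produces the filehandle can be as complex as you like without confusing the parser.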
To identify indirect calls in your code, use the CPAN module Perl::Critic::Policy::Dynamic::NoIndirect (a plugin for Perl::Critic). To forbid their use at compile time, use the CPAN module indirect.
And of course there's already an explanation of -how- perl disambiguates, and an example of the ways it can screw you up, in the "indirect but still fatal" post on my blog.
There is a way to avoid the ambiguity. I found this in the camel book.
my $object = new Class::($args);
Therefore I see no technical reason not to use this form when no method dispatching is wanted or needed. But beginners should know, and be taught, the differences between the two forms.
Thanks for this reminder. I thought I had purged indirect object notation from CGI.pm in the 3.43 release, but there were clearly plenty of cases left to address.
I've patched those now, and they will be removed in the next release. Unfortunately, it seems too late to get this update into 5.10.1, so it will be 5.10.2 when the change appears in the core.
Reference:
http://github.com/markstos/CGI.pm/commit/a36e451716f8cee8ac02376617bb98a33d5ac9c0
It's too bad that the word "indirect" was assigned to this notation. It's a high-value name for a notation that we typically don't use. I understand the linguistic reason for the usage, but programmers have their own use for the word "indirect". It'd be nice to have a concise description of the Class->$method(@args) style notation.
As mentioned elsewhere:
use Class;
sub Class {
warn 'Called Class sub not Class package';
'Class'
}
my $q = Class->new; # calls the Class sub above
my $s = new Class; # throws a 'Bareword found where operator expected' error
my $t = Class::->new; # this works
my $u = new Class::; # this also works (even with sub main in the current package)