Perl isn't perfect. Some features are difficult to use correctly, and others seemed great but don't work all that well. A few features combine with others in strange ways, with weird edge cases. Knowing Perl's rough edges will help you avoid them when possible, and handle them with care when you must use them.
Perl is a malleable language. You can write programs in whichever creative, maintainable, obfuscated, or bizarre fashion you prefer. Good programmers write code that they want to maintain, but Perl won't decide for you what you consider maintainable.
Perl's parser understands Perl's builtins and operators. It uses sigils to identify variables and other punctuation to recognize function and method calls. Yet sometimes the parser has to guess what you mean, especially when you use a bareword—an identifier without a sigil or other syntactically significant punctuation.
Though the strict pragma (Pragmas) rightly forbids ambiguous barewords, some barewords are acceptable.
Hash keys in Perl are usually not ambiguous because the parser can identify them as string keys; pinball in $games{pinball} is obviously a string.
Occasionally this interpretation is not what you want, especially when you intend to evaluate a builtin or a function to produce the hash key. To make these cases clear, pass arguments to the function or use parentheses, or prepend a unary plus to force the evaluation of the builtin:
# the literal 'shift' is the key
my $value = $items{shift};
# the value produced by shift is the key
my $value = $items{shift @_};
# the function returns the key
my $value = $items{myshift( @_ )};
# unary plus uses the builtin shift
my $value = $items{+shift};
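A minimal sketch of the difference, assuming a hypothetical %items hash and two illustrative lookup functions (none of these names come from the text above):

```perl
use strict;
use warnings;

# Hypothetical data: one entry is deliberately stored under the key 'shift'.
my %items = (
    shift  => 'found by literal key',
    widget => 'found by computed key',
);

# The bareword inside the braces is a string, so @_ is untouched.
sub lookup_literal  { return $items{shift} }

# The unary plus makes Perl evaluate the builtin shift on @_.
sub lookup_computed { return $items{+shift} }

my $literal  = lookup_literal( 'widget' );
my $computed = lookup_computed( 'widget' );

print "$literal\n";     # found by literal key
print "$computed\n";    # found by computed key
```

Both functions receive the same argument; only the second one ever looks at it.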
Package names are also barewords. If your naming conventions rule that package names have initial capitals and functions do not, you'll rarely encounter naming collisions. Even so, Perl must determine how to parse Package->method. Does it mean "call a function named Package() and call method() on its return value?" or "call a method named method() in the Package namespace?" The answer depends on the code you've already compiled.
Force the parser to treat Package as a package name by appending the package separator (::) or make it a literal string. Even among people who understand why this works, very few people do it.
# probably a class method
Package->method;
# definitely a class method
Package::->method;
# a slightly less ugly class method
'Package'->method;
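The following sketch shows the ambiguity in action. The Logger and Other classes and the Logger() function are hypothetical names chosen for illustration; the point is only that a function which shares a class's name changes how the first call parses:

```perl
use strict;
use warnings;

# Two hypothetical classes, each with a class method named name().
package Logger { sub name { return 'Logger class' } }
package Other  { sub name { return 'Other class'  } }

package main;

# A function which happens to share its name with the Logger class.
sub Logger { return 'Other' }

my $ambiguous = Logger->name;      # parsed as Logger()->name
my $separator = Logger::->name;    # always the Logger package
my $literal   = 'Logger'->name;    # always the Logger package

print "$ambiguous\n";    # Other class
print "$separator\n";    # Logger class
print "$literal\n";      # Logger class
```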
The special named code blocks AUTOLOAD, BEGIN, CHECK, DESTROY, END, INIT, and UNITCHECK are barewords which declare functions without the sub builtin. You've seen this before (Code Generation):
package Monkey::Butler;
BEGIN { initialize_simians( __PACKAGE__ ) }
sub AUTOLOAD { ... }
While you can declare AUTOLOAD() without using sub, few people do.
Constants declared with the constant pragma are usable as barewords:
# don't use this for real authentication
use constant NAME => 'Bucky';
use constant PASSWORD => '|38fish!head74|';
return unless $name eq NAME && $pass eq PASSWORD;
These constants do not interpolate in double-quoted strings.
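If you need a constant's value inside a string, a sketch of two common workarounds follows. The greeting text is made up for illustration; NAME is the constant from the example above:

```perl
use strict;
use warnings;

use constant NAME => 'Bucky';

# Inside double quotes, NAME is only the four letters N, A, M, E.
my $not_interpolated = "Hello, NAME!";

# Concatenation uses the constant as an ordinary expression.
my $concatenated = 'Hello, ' . NAME . '!';

# An anonymous array dereferenced inside the string also works,
# though many people find it harder to read.
my $interpolated = "Hello, @{[ NAME ]}!";

print "$not_interpolated\n";    # Hello, NAME!
print "$concatenated\n";        # Hello, Bucky!
print "$interpolated\n";        # Hello, Bucky!
```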
Constants are a special case of prototyped functions (Prototypes). When you predeclare a function with a prototype, the parser will treat all subsequent uses of that bareword specially—and will warn about ambiguous parsing errors. All other drawbacks of prototypes still apply.
No matter how cautiously you code, barewords still produce ambiguous code. You can avoid the worst abuses, but you will encounter several types of barewords in legacy code.
Some old code may not take pains to quote the values of hash pairs:
# poor style; do not use
my %parents =
(
mother => Annette,
father => Floyd,
);
When neither the Floyd() nor Annette() functions exist, Perl will interpret these barewords as strings. strict 'subs' will produce an error in this situation.
Code written without strict 'subs' may use bareword function names. Adding parentheses will make the code pass strictures. Use perl -MO=Deparse,-p (see perldoc B::Deparse) to discover how Perl parses them, then parenthesize accordingly.
Prior to lexical filehandles (Filehandle References), all file and directory handles used barewords. You can almost always safely rewrite this code to use lexical filehandles. Perl's parser recognizes the special exceptions of STDIN, STDOUT, and STDERR.
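As a rough sketch of such a rewrite, the following replaces a bareword LOG handle with a lexical one. The handle name and messages are hypothetical, and the code writes to an in-memory buffer only to stay self-contained:

```perl
use strict;
use warnings;

# Legacy style, shown only as a comment:
#   open( LOG, ">>$path" ) || die "Cannot append to '$path': $!";
#   print LOG "first message\n";
#   close LOG;

# Opening a reference to a scalar gives an in-memory filehandle;
# replace \$buffer with a filename to write to disk instead.
my $buffer = '';
open my $log_fh, '>>', \$buffer
    or die "Cannot open in-memory filehandle: $!";

print {$log_fh} "first message\n";
print {$log_fh} "second message\n";

close $log_fh
    or die "Cannot close filehandle: $!";

# STDERR remains a bareword; the parser knows it as a filehandle.
print STDERR 'wrote ', length( $buffer ), " bytes\n";
```

The lexical handle closes itself when $log_fh goes out of scope, but the explicit close lets you check for errors.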
Finally, the second operand of the sort builtin can be the name of a function to use for sorting. While this is rarely ambiguous to the parser, it can confuse human readers. The alternative of providing a function reference in a scalar is little better:
# bareword style
my @sorted = sort compare_lengths @unsorted;
# function reference in scalar
my $comparison = \&compare_lengths;
my @sorted = sort $comparison @unsorted;
The second option avoids the use of a bareword, but the result is longer. Unfortunately, Perl's parser does not understand the single-line version due to the special parsing of sort; you cannot use an arbitrary expression (such as taking a reference to a named function) where a block or a scalar might otherwise go.
# does not work
my @sorted = sort \&compare_lengths @unsorted;
In both cases, the way sort invokes the function and provides arguments can be confusing (see perldoc -f sort for the details). Where possible, consider using the block form of sort instead. If you must use either function form, add a comment about what you're doing and why.
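For comparison, a minimal sketch of the block form, using a made-up word list as the data to sort:

```perl
use strict;
use warnings;

# Hypothetical input; every word has a different length.
my @unsorted = qw( pear fig banana plums );

# The block form keeps the comparison next to the call; sort provides
# each pair of elements in the package globals $a and $b.
my @sorted = sort { length($a) <=> length($b) } @unsorted;

print "@sorted\n";    # fig pear plums banana
```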
Perl has no operator new. A constructor is anything which returns an object. By convention, constructors are class methods named new(), but you can name these methods anything you like (or even use functions). Several old Perl OO tutorials promote the use of C++ and Java-style constructor calls:
my $q = new CGI; # DO NOT USE
... instead of the obvious method call:
my $q = CGI->new;
These syntaxes produce equivalent behavior, except when they don't.
In the indirect object form (more precisely, the dative case) of the first example, the verb (the method) precedes the noun to which it refers (the object). This is fine in spoken languages, but it introduces parsing ambiguities in Perl.
Because the method's name is a bareword (Barewords), the parser uses several heuristics to figure out the proper interpretation of this code. While these heuristics are well-tested and almost always correct, their failure modes are confusing. Things get worse when you pass arguments to a constructor:
# DO NOT USE
my $obj = new Class( arg => $value );
In this example, the name of the class looks like a function call. Perl can disambiguate many of these cases, but its heuristics depend on which package names the parser has seen, which barewords it has already resolved (and how it resolved them), and the names of functions already declared in the current package. For an exhaustive list of these conditions, you have to read the source code of Perl's parser—not something the average Perl programmer wants to do.
Imagine running afoul of a prototyped function (Prototypes) with a name which just happens to conflict somehow with the name of a class or a method called indirectly. This is rare, but so unpleasant to debug that it's worth avoiding indirect invocations. (It's happened to your author when using the JSON module.)
Another danger of the indirect syntax is that the parser expects a single scalar expression as the object. Printing to a filehandle stored in an aggregate variable seems obvious, but it is not:
# DOES NOT WORK AS WRITTEN
say $config->{output} 'Fun diagnostic message!';
Perl will attempt to call say on the $config object.
print, close, and say (all builtins which operate on filehandles) operate in an indirect fashion. This was fine when filehandles were package globals, but lexical filehandles (Filehandle References) make the indirect object syntax problems obvious. To solve this, disambiguate the subexpression which produces the intended invocant:
say {$config->{output}} 'Fun diagnostic message!';
Direct invocation notation does not suffer this ambiguity problem. To construct an object, call the constructor method on the class name directly:
my $q = Plack::Request->new;
my $obj = Class->new( arg => $value );
This syntax still has a bareword problem: if you have a function named Request in the Plack namespace, Perl will interpret the bareword class name as a call to the function, as:
sub Plack::Request;
# you wrote Plack::Request->new, but Perl saw
my $q = Plack::Request()->new;
While this happens rarely, you can disambiguate class names by appending the package separator (::) or by explicitly marking class names as string literals:
# package separator
my $q = Plack::Request::->new;
# unambiguously a string literal
my $q = 'Plack::Request'->new;
Almost no one ever does this.
For the limited case of filehandle operations, the dative use is so prevalent that you can use the indirect invocation approach if you surround your intended invocant with curly brackets. If you're using Perl 5.14 or newer (or if you load IO::File or IO::Handle), you can use methods on lexical filehandles, though almost no one does this for print and say.
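Both approaches side by side, as a sketch. The $config hash mirrors the earlier example; the in-memory buffer and the second message are assumptions added to keep the code self-contained:

```perl
use strict;
use warnings;
use feature 'say';
use IO::Handle;

# Hypothetical configuration holding a lexical filehandle.
my $buffer = '';
open my $out, '>', \$buffer
    or die "Cannot open in-memory filehandle: $!";
my $config = { output => $out };

# Curly brackets mark the whole expression as the filehandle.
say { $config->{output} } 'Fun diagnostic message!';

# The method form needs no disambiguation at all.
$config->{output}->say( 'Another diagnostic message!' );

close $out
    or die "Cannot close filehandle: $!";

print $buffer;
```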
The CPAN module Perl::Critic::Policy::Dynamic::NoIndirect (a plugin for Perl::Critic) can analyze your code to find indirect invocations. The CPAN module indirect can identify and prohibit their use in running programs:
# warn on indirect use
no indirect;
# throw exceptions on their use
no indirect ':fatal';
A prototype is a piece of metadata attached to a function or variable. A function prototype changes how Perl's parser understands it.
Prototypes allow users to define their own functions which behave like builtins. Consider the builtin push, which takes an array and a list. While Perl would normally flatten the array and list into a single list passed to push, the parser knows to treat the array as a container, not to flatten its values. In effect, this is like passing a reference to an array and a list of values to push. The parser's behavior allows push to modify the values of the container.
Function prototypes attach to function declarations:
sub foo (&@);
sub bar ($$) { ... }
my $baz = sub (&&) { ... };
Any prototype attached to a forward declaration must match the prototype attached to the function declaration. Perl will give a warning if this is not true. Strangely you may omit the prototype from a forward declaration and include it for the full declaration—but there's no reason to do so.
The builtin prototype takes the name of a function and returns a string representing its prototype.
To see the prototype of a builtin, use the CORE:: form of the builtin's name as the operand to prototype:
$ perl -E "say prototype 'CORE::push';"
\@@
$ perl -E "say prototype 'CORE::keys';"
\%
$ perl -E "say prototype 'CORE::open';"
*;$@
prototype will return undef for those builtins whose functions you cannot emulate:
say prototype 'CORE::system' // 'undef';
# undef; cannot emulate builtin system
say prototype 'CORE::prototype' // 'undef';
# undef; builtin prototype has no prototype
Remember push?
$ perl -E "say prototype 'CORE::push';"
\@@
The @ character represents a list. The backslash forces the use of a reference to the corresponding argument. This prototype means that push takes a reference to an array and a list of values. You might write mypush as:
sub mypush (\@@)
{
my ($array, @rest) = @_;
push @$array, @rest;
}
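A short usage sketch for mypush, with a hypothetical @stack array. Note that the function must be declared before its first use for the prototype to affect parsing:

```perl
use strict;
use warnings;

# Declared before its first use so that the parser sees the prototype.
sub mypush (\@@)
{
    my ($array, @rest) = @_;
    push @$array, @rest;
}

my @stack = ( 1, 2 );

# No backslash needed: the prototype passes \@stack on your behalf.
mypush @stack, 3, 4;

print "@stack\n";                   # 1 2 3 4
print prototype( 'mypush' ), "\n";  # \@@
```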
Other prototype characters include $ to force a scalar argument, % to mark a hash (most often used as a reference), and & to identify a code block. See perldoc perlsub for more information.
Prototypes change how Perl parses your code and how Perl coerces arguments to your functions. While these prototypes may superficially resemble function signatures in other languages, they are very different. They do not document the number or types of arguments functions expect, nor do they map arguments to named parameters.
Prototype coercions work in subtle ways, such as enforcing scalar context on incoming arguments:
sub numeric_equality($$)
{
my ($left, $right) = @_;
return $left == $right;
}
my @nums = 1 .. 10;
say "They're equal, whatever that means!"
    if numeric_equality @nums, 10;
... but only work on simple expressions:
sub mypush(\@@);
# compilation error: prototype mismatch
# (expects array, gets scalar assignment)
mypush( my $elems = [], 1 .. 20 );
To debug this, users of mypush must know both that a prototype exists, and the limitations of the array prototype.
Debugging Prototype Errors
If you think this error message is inscrutable, wait until you see the complicated prototype errors.
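One way around the mypush failure, sketched under the assumption that mypush is defined as shown earlier: the \@ prototype accepts any argument which begins with the @ sigil, including an explicit dereference. Calling the function with a leading & bypasses the prototype entirely, so you pass the reference yourself:

```perl
use strict;
use warnings;

sub mypush (\@@)
{
    my ($array, @rest) = @_;
    push @$array, @rest;
}

my $elems = [];

# An explicit dereference starts with @, which satisfies the prototype.
mypush @$elems, 1 .. 20;

# The & form ignores the prototype; the reference is passed as-is.
&mypush( $elems, 21 .. 25 );

print scalar( @$elems ), "\n";    # 25
```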
Prototypes do have a few good uses that outweigh their problems. For example, you can use a prototyped function to override one of Perl's builtins. First check that you can override the builtin by examining its prototype in a small test program. Then use the subs pragma to tell Perl that you plan to override a builtin, and finally declare your override with the correct prototype:
use subs 'push';
sub push (\@@) { ... }
Beware that the subs pragma is in effect for the remainder of the file, regardless of any lexical scoping.
The second reason to use prototypes is to define compile-time constants. When Perl encounters a function declared with an empty prototype (as opposed to no prototype) and this function evaluates to a single constant expression, the optimizer will turn all calls to that function into constants instead of function calls:
sub PI () { 4 * atan2(1, 1) }
All subsequent code will use the calculated value of pi in place of the bareword PI or a call to PI(), with respect to scoping and visibility.
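A brief sketch of PI in use. The $radius and $area variables are hypothetical; the empty prototype is what lets the bareword PI appear in an arithmetic expression without parentheses:

```perl
use strict;
use warnings;

sub PI () { 4 * atan2(1, 1) }

# With the empty prototype, the parser knows PI takes no arguments,
# so the * which follows it is multiplication.
my $radius = 2;
my $area   = PI * $radius ** 2;

printf "%.5f\n", PI;       # 3.14159
printf "%.5f\n", $area;    # 12.56637
```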
The core pragma constant handles these details for you. The Const::Fast module from the CPAN creates constant scalars which you can interpolate into strings.
A reasonable use of prototypes is to extend Perl's syntax to operate on anonymous functions as blocks. The CPAN module Test::Exception uses this to good effect to provide a nice API with delayed computation (see also Test::Fatal). Its throws_ok() function takes three arguments: a block of code to run, a regular expression to match against the string of the exception, and an optional description of the test:
use Test::More;
use Test::Exception;
throws_ok
{ my $unobject; $unobject->yoink }
qr/Can't call method "yoink" on an undefined/,
'Method on undefined invocant should fail';
done_testing();
The exported throws_ok() function has a prototype of &$;$. Its first argument is a block, which becomes an anonymous function. The second argument is a scalar. The third argument is optional.
Careful readers may have spotted the absence of a comma after the block. This is a quirk of Perl's parser, which expects whitespace after a prototyped block, not the comma operator. This is a drawback of the prototype syntax. If that bothers you, use throws_ok() without taking advantage of the prototype:
use Test::More;
use Test::Exception;
throws_ok(
sub { my $unobject; $unobject->yoink() },
qr/Can't call method "yoink" on an undefined/,
'Method on undefined invocant should fail' );
done_testing();
A final good use of prototypes is when defining a custom named function to use with sort (Ben Tilly suggested this example):
sub length_sort ($$)
{
my ($left, $right) = @_;
return length($left) <=> length($right);
}
my @sorted = sort length_sort @unsorted;
The prototype of $$ forces Perl to pass the sort pairs in @_. sort's documentation suggests that this is slightly slower than using the package globals $a and $b, but using lexical variables often makes up for any speed penalty.
Perl's object system is deliberately minimal (Blessed References). Because a class is a package, Perl does not distinguish between a function and a method stored in a package. The same builtin, sub, declares both. Perl will happily dispatch to a function called as a method. Likewise, you can invoke a method as if it were a function (fully-qualified, exported, or as a reference) if you pass in your own invocant manually.
Invoking the wrong thing in the wrong way causes problems.
Consider a class with several methods:
package Order;
use List::Util 'sum';
...
sub calculate_price
{
my $self = shift;
return sum( 0, $self->get_items );
}
Given an Order object $o, the following invocations of this method may seem equivalent:
my $price = $o->calculate_price;
# broken; do not use
my $price = Order::calculate_price( $o );
Though they produce the same output in this simple case, the latter violates object encapsulation by avoiding method lookup.
Perl has one circumstance where this behavior may seem necessary. If you force method resolution without dispatch, how do you invoke the resulting method reference?
my $meth_ref = $o->can( 'apply_discount' );
There are two possibilities. The first is to discard the return value of the can() method:
$o->apply_discount if $o->can( 'apply_discount' );
The second is to use the reference itself with method invocation syntax:
if (my $meth_ref = $o->can( 'apply_discount' ))
{
$o->$meth_ref();
}
When $meth_ref contains a function reference, Perl will invoke that reference with $o as the invocant. This works even under strictures, as it does when invoking a method with a scalar containing its name:
my $name = 'apply_discount';
$o->$name();
There is one small drawback in invoking a method by reference; if the structure of the program changes between storing the reference and invoking the reference, the reference may no longer refer to the most appropriate method. If the Order class has changed such that Order::apply_discount is no longer the right method to call, the reference in $meth_ref will not have updated.
When you use this invocation form, limit the scope of the references.
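A self-contained sketch of this pattern follows. The stripped-down Order class here, with its total attribute and flat discount, is hypothetical and exists only to make the example runnable:

```perl
use strict;
use warnings;

# A hypothetical, stripped-down Order class for illustration.
package Order
{
    sub new
    {
        my ($class, %args) = @_;
        return bless { total => $args{total} }, $class;
    }

    sub apply_discount
    {
        my $self = shift;
        $self->{total} -= 10;
        return $self;
    }

    sub total { return $_[0]{total} }
}

my $o = Order->new( total => 100 );

# The reference lives only inside this block, so it has little
# opportunity to go stale.
if (my $meth_ref = $o->can( 'apply_discount' ))
{
    $o->$meth_ref();
}

print $o->total, "\n";    # 90
```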
Because Perl does not distinguish functions from methods at the point of declaration, you can write a function intended to be callable either way. The CGI module has such two-faced functions. Every one of them must apply several heuristics to determine whether the first argument is an invocant. This causes problems. It's difficult to predict exactly which invocants are potentially valid for a given method, especially when you may have to deal with subclasses. Creating an API that users cannot easily misuse is more difficult too, and your documentation burden grows. What happens when one part of the project uses the procedural interface and another uses the object interface?
If you must provide a separate procedural and OO interface to a library, create two separate APIs.
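To see why such heuristics are fragile, consider this sketch of a two-faced function. The Greeter class is hypothetical and is not how CGI implements its checks; it only illustrates the kind of guess such a function has to make, and how the guess can misfire:

```perl
use strict;
use warnings;

# A hypothetical two-faced function; this is not CGI's actual code.
package Greeter
{
    use Scalar::Util 'blessed';

    sub new { return bless {}, shift }

    sub greet
    {
        # Heuristic: discard the first argument if it looks like an invocant.
        my $first = $_[0];
        shift if defined( $first )
            && (   ( blessed( $first ) && $first->isa( __PACKAGE__ ) )
                || ( !ref( $first ) && $first eq __PACKAGE__ ) );

        my $name = shift;
        $name    = 'nobody' unless defined $name;
        return "Hello, $name!";
    }
}

my $as_function = Greeter::greet( 'Bucky' );
my $as_method   = Greeter->new->greet( 'Bucky' );

# The heuristic mistakes a name for a class-method invocant.
my $misfire     = Greeter::greet( 'Greeter' );

print "$as_function\n";    # Hello, Bucky!
print "$as_method\n";      # Hello, Bucky!
print "$misfire\n";        # Hello, nobody!
```

Separate procedural and object interfaces need no such guesswork.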
Perl 5.14 added an experimental feature which automatically dereferences certain references on your behalf (Perl 5.24 removed it again). Given an array reference in $arrayref, you can write:
push $arrayref, qw( list of values );
Given an expression which returns an array reference, you can do the same:
push $houses{$location}[$closets], \@new_shoes;
The same goes for the array operators pop, shift, unshift, splice, keys, values, and each and the hash operators keys, values, and each. If the reference provided is not of the proper type (if it does not dereference properly), Perl will throw an exception. While this may seem more dangerous than explicitly dereferencing references directly, it is in fact the same behavior:
my $ref = sub { ... };
# will throw an exception
push $ref, qw( list of values );
# will also throw an exception
push @$ref, qw( list of values );
Unfortunately, this automatic dereferencing has two problems. First, it only works on plain variables. If you have a blessed array or hash, a tied hash, or an object with array or hash overloading, Perl will throw a runtime exception instead of dereferencing the reference.
Second, remember that each, keys, and values can operate on both arrays and hashes. You can't look at:
my @items = each $ref;
... and tell whether @items contains a key/value pair or an index/value pair, because you don't know whether you should expect $ref to refer to a hash or an array. Yes, choosing good variable names will help, but this code is intrinsically confusing.
Neither of these drawbacks makes this syntax unusable in general, but its rough edges and potential for confusing readers make it less useful than it could be.
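For contrast, a sketch of the explicit forms, which make the expected type visible at the point of use. The variables reuse the names from the earlier examples where possible; $inventory and $sizes are hypothetical:

```perl
use strict;
use warnings;

my %houses;
my ($location, $closets) = ( 'downtown', 0 );
my @new_shoes            = qw( loafers sandals );

# The explicit @{ ... } shows that the expression must yield an array.
push @{ $houses{$location}[$closets] }, \@new_shoes;

my $inventory = { socks => 12, hats => 3 };
my $sizes     = [ 7, 8, 9 ];

# The sigil tells the reader whether these are hash keys or array indices.
my @names   = sort keys %$inventory;
my @indices = keys @$sizes;

print scalar( @{ $houses{$location}[$closets] } ), "\n";    # 1
print "@names\n";                                           # hats socks
print "@indices\n";                                         # 0 1 2
```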
Where overloading (Overloading) allows you to customize the behavior of classes and objects for specific types of coercion, a mechanism called tying allows you to customize the behavior of primitive variables (scalars, arrays, hashes, and filehandles). Any operation you might perform on a tied variable translates to a specific method call on an object.
The tie builtin originally allowed you to use disk space instead of RAM for hashes and arrays, so that Perl could use data larger than available memory. The core module Tie::File allows you to do this, in effect treating files as if they were arrays.
The class to which you tie a variable must conform to a defined interface for a specific data type. Read perldoc perltie for an overview, then see the core modules Tie::StdScalar, Tie::StdArray, and Tie::StdHash for specific details. Start by inheriting from one of those classes, then override any specific methods you need to modify.
When Class and Package Names Collide
If tie weren't confusing enough, Tie::Scalar, Tie::Array, and Tie::Hash define the necessary interfaces to tie scalars, arrays, and hashes, but Tie::StdScalar, Tie::StdArray, and Tie::StdHash provide the default implementations.
To tie a variable:
use Tie::File;
tie my @file, 'Tie::File', @args;
The first argument is the variable to tie. The second is the name of the class into which to tie it. @args is an optional list of arguments required for the tying function. In the case of Tie::File, @args should contain a valid filename.
Tying functions resemble constructors: TIESCALAR(), TIEARRAY(), TIEHASH(), or TIEHANDLE() for scalars, arrays, hashes, and filehandles respectively. Each function returns a new object which represents the tied variable. Both tie and tied return this object, though most people use tied in a boolean context.
To implement the class of a tied variable, inherit from a core module such as Tie::StdScalar (Tie::StdScalar lacks its own .pm file, so write use Tie::Scalar;), then override the specific methods for the operations you want to change. In the case of a tied scalar, these are likely FETCH() and STORE(), possibly TIESCALAR(), and probably not DESTROY().
Here's a class which logs all reads from and writes to a scalar:
package Tie::Scalar::Logged
{
    use Modern::Perl;
    use Tie::Scalar;
    use parent -norequire => 'Tie::StdScalar';
    sub STORE
    {
        my ($self, $value) = @_;
        Logger->log("Storing <$value> (was [$$self])", 1);
        $$self = $value;
    }
    sub FETCH
    {
        my $self = shift;
        Logger->log("Retrieving <$$self>", 1);
        return $$self;
    }
}
Assume that the Logger class method log() takes a string and the number of frames up the call stack of which to report the location.
Within the STORE() and FETCH() methods, $self is a blessed scalar reference. Assigning through that reference changes the value of the scalar and reading through it returns its value.
Similarly, the methods of Tie::StdArray and Tie::StdHash act on blessed array and hash references, respectively. Again, perldoc perltie explains the methods tied variables support, such as reading or writing multiple values at once.
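Because the logged scalar depends on a Logger class which this chapter only assumes, here is a smaller self-contained sketch of the same technique. Tie::Scalar::Counted is a hypothetical class which counts writes rather than logging them, and inherits FETCH() unchanged:

```perl
use strict;
use warnings;

# A hypothetical tied class which counts writes instead of logging them.
package Tie::Scalar::Counted
{
    use Tie::Scalar;
    use parent -norequire => 'Tie::StdScalar';

    our $STORES = 0;

    sub STORE
    {
        my ($self, $value) = @_;
        $STORES++;
        $$self = $value;
    }
}

# tie returns the object which represents the tied variable.
my $object = tie my $price, 'Tie::Scalar::Counted';

$price = 10;
$price = 20;

my $copy = $price;    # calls the inherited FETCH()

print "$copy\n";                            # 20
print "$Tie::Scalar::Counted::STORES\n";    # 2
print ref( tied $price ), "\n";             # Tie::Scalar::Counted
```

Ordinary assignment syntax drives the method calls; code which uses $price need not know that it is tied.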
Isn't tie Fun?
The -norequire option prevents the parent pragma from attempting to load a file for Tie::StdScalar, as that module is part of the file Tie/Scalar.pm. This is messy but necessary.
Tied variables seem like fun opportunities for cleverness, but they can produce confusing interfaces. Unless you have a very good reason for making objects behave as if they were builtin data types, avoid creating your own ties. tied variables are often much slower than builtin data types.
With that said, tied variables can help you debug tricky code (use the logged scalar to help you understand where a value changes) or to make certain impossible things possible (access large files without running out of memory). Tied variables are less useful as the primary interfaces to objects; it's often too difficult and constraining to try to fit your whole interface to that supported by tie().
A final word of warning is a sad indictment of lazy programming: a lot of code goes out of its way to prevent use of tied variables, often by accident. This is unfortunate, but library code sometimes plays fast and loose with what it expects, and you can't always fix it.