The Perl Language

Like a spoken language, the whole of Perl is a combination of several smaller but interrelated parts. Unlike spoken language, where nuance and tone of voice and intuition allow people to communicate despite slight misunderstandings and fuzzy concepts, computers and source code require precision. You can write effective Perl code without knowing every detail of every language feature, but you must understand how they work together to write Perl code well.

Names

Names (or identifiers) are everywhere in Perl programs: you get to choose them for variables, functions, packages, classes, and even filehandles. Valid Perl names all begin with a letter or an underscore and may optionally include any combination of letters, numbers, and underscores. When the utf8 pragma (Unicode and Strings) is in effect, you may use any UTF-8 word characters in identifiers. All of these are valid Perl identifiers:

    my $name;
    my @_private_names;
    my %Names_to_Addresses;
    sub anAwkwardName3;

    # with use utf8; enabled
    package Ingy::Döt::Net;

These are invalid Perl identifiers:

    my $invalid name;
    my @3;
    my %~flags;

    package a-lisp-style-name;

Names exist primarily for the benefit of the programmer. These rules apply only to literal names which appear in your source code, such as sub fetch_pie or my $waffleiron. Only Perl's parser enforces the rules about identifier names. Perl allows you to refer to entities with names generated at runtime or provided as input to a program. These symbolic lookups provide flexibility at the expense of safety.

In particular, invoking functions or methods indirectly or looking up symbols in a namespace lets you bypass Perl's parser. Symbolic lookups can produce confusing code. As Mark Jason Dominus recommends so effectively (http://perl.plover.com/varvarname.html), prefer a hash (Hashes) or nested data structure (Nested Data Structures) over variables named, for example, $recipe1, $recipe2, and so on.
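
As a minimal sketch of that advice, the following keeps related values in one hash and looks them up by a name chosen at runtime. The recipe names and ingredient strings are hypothetical, invented only for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical recipe data; these names are illustrative only.
my %recipes = (
    pancakes => 'flour, eggs, milk',
    omelette => 'eggs, butter, cheese',
);

# Look up an entry by a name computed at runtime, without
# resorting to symbolic variable lookups.
my $wanted      = 'omelette';
my $ingredients = exists $recipes{$wanted}
                ? $recipes{$wanted}
                : 'unknown recipe';

say "$wanted: $ingredients";
```

Because the data lives in an ordinary hash, a mistyped key produces a predictable "unknown recipe" result instead of silently touching an unrelated variable.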

Variable Names and Sigils

Variable names always have a leading sigil (a symbol) which indicates the type of the variable's value. Scalar variables (Scalars) use the dollar sign ($). Array variables (Arrays) use the at sign (@). Hash variables (Hashes) use the percent sign (%):

    my $scalar;
    my @array;
    my %hash;

Sigils allow you to separate variables into different namespaces. It's possible—though confusing—to declare multiple variables of the same name with different types:

    my ($bad_name, @bad_name, %bad_name);

Though Perl won't get confused, people reading this code will.

The sigil of a variable can change depending on what you do with it; the term for this is variant sigils. As context determines how many items you expect from an operation or what type of data you expect to get, so the sigil governs how you manipulate the data of a variable. For example, you must use the scalar sigil ($) to access a single element of an array or a hash:

    my $hash_element  = $hash{ $key };
    my $array_element = $array[ $index ];

    $hash{ $key }     = 'value';
    $array[ $index ]  = 'item';

The parallel with amount context is important. Using a scalar element of an aggregate as an lvalue (the target of an assignment; on the left side of the = character) imposes scalar context (Context) on the rvalue (the value assigned; on the right side of the = character).

Similarly, accessing multiple elements of a hash or an array—an operation known as slicing—uses the at symbol (@) and imposes list context ... even if the list itself has zero or one elements:

    my @hash_elements  = @hash{ @keys };
    my @array_elements = @array[ @indexes ];

    my %hash;
    @hash{ @keys }     = @values;

The most reliable way to determine the type of a variable—scalar, array, or hash—is to examine the operations performed on it. Scalars support all basic operations, such as string, numeric, and boolean manipulations. Arrays support indexed access through square brackets. Hashes support keyed access through curly brackets.
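
The sigil rules above may be easier to follow with concrete data. This sketch uses made-up variables (%price, @fruit, @letters) to show element access and slicing side by side.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical data: prices in cents.
my %price = ( apple => 50, pear => 75, plum => 25 );
my @fruit = qw( apple plum );

# Hash slice: @ sigil, curly brackets, a list of keys.
my @some_prices = @price{ @fruit };

# Array slice: @ sigil, square brackets, a list of indexes.
my @letters = ( 'a' .. 'e' );
my @picked  = @letters[ 0, 2, 4 ];

# Single elements use the $ sigil.
my $first = $letters[0];
my $pear  = $price{pear};

say "@some_prices";   # 50 25
say "@picked";        # a c e
say $first;           # a
say $pear;            # 75
```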

Namespaces

Perl provides a mechanism to group similar functions and variables into their own unique named spaces, called namespaces (Packages). A namespace is a collection of symbols grouped under a globally unique name. Perl allows multi-level namespaces, with names joined by double colons (::), where DessertShop::IceCream refers to a logical collection of related variables and functions, such as scoop() and pour_hot_fudge().

Within a namespace, you may use the short name of its members. Outside of the namespace, you must refer to a member by its fully-qualified name. Within DessertShop::IceCream, add_sprinkles() refers to the same function as does DessertShop::IceCream::add_sprinkles() outside of the namespace.

While standard naming rules apply to package names, user-defined packages all start with uppercase letters by convention. The Perl core reserves lowercase package names for core pragmas (Pragmas), such as strict and warnings. This is a policy enforced primarily by community guidelines.

All namespaces in Perl are globally visible. When Perl looks up a symbol in DessertShop::IceCream::Freezer, it looks in the main:: symbol table for a symbol representing the DessertShop:: namespace, then in that namespace for the IceCream:: namespace, and so on. Yet Freezer:: is visible from outside of the IceCream:: namespace. The nesting of Freezer:: within IceCream:: is only a storage mechanism, and implies nothing further about relationships between parent and child or sibling packages.
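
A short sketch of short versus fully-qualified names follows. The body of add_sprinkles() is a stand-in written for this example; only the function and package names come from the text.

```perl
use strict;
use warnings;
use feature 'say';

package DessertShop::IceCream;

# Stand-in implementation, for illustration only.
sub add_sprinkles { return "sprinkles on $_[0]" }

# Inside the namespace, the short name is enough.
my $inside = add_sprinkles('vanilla');

package main;

# Outside the namespace, use the fully-qualified name.
my $outside = DessertShop::IceCream::add_sprinkles('vanilla');

say $inside;    # sprinkles on vanilla
say $outside;   # sprinkles on vanilla
```

Note that $inside remains visible after the second package statement, because a package declaration without a block does not start a new lexical scope.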

Only a programmer can make logical relationships between entities obvious—by choosing good names and organizing them well.

Variables

A variable in Perl is a storage location for a value (Values). While a trivial program can manipulate values directly, most programs work with variables to simplify the logic of the code. A variable represents values; it's easier to explain the Pythagorean theorem in terms of the variables a, b, and c than by intuiting its principle by producing a long list of valid values. This concept may seem basic, but to program effectively, you must learn the art of balancing the generic and reusable with the specific.

Variable Scopes

Variables are available within your program depending on their scope (Scope). Most of the variables you will encounter have lexical scope (Lexical Scope), or scope governed by the syntax of the program as written. Most lexical scopes are either the contents of blocks delimited by curly braces ({ and }) or entire files. Files themselves provide their own lexical scopes, such that the package declaration on its own does not create a new scope:

    package Store::Toy;

    my $discount = 0.10;

    package Store::Music;

    # $discount still visible
    say "Our current discount is $discount!";

As of Perl 5.14, you may also provide a block to the package declaration. Because this introduces a new block, it also provides a new lexical scope:

    package Store::Toy
    {
        my $discount = 0.10;
    }

    package Store::Music
    {
        # $discount not available
    }

    package Store::BoardGame;

    # $discount still not available

Variable Sigils

The sigil of the variable in a declaration determines the type of the variable: scalar, array, or hash. The sigil used when accessing a variable varies depending on what you do to the variable. For example, you declare an array as @values. Access the first element—a single value—of the array with $values[0]. Access a list of values from the array with @values[ @indices ]. As you might expect, the sigil you use determines amount context in an lvalue situation:

    # imposes list context on some_function()
    @values[ @indexes ] = some_function();

... or gets coerced in an rvalue situation:

    # list evaluated to final element in scalar context
    my $element = @values[ @indices ];
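
To make the difference concrete, here is a small sketch with invented data showing the same array accessed with each sigil and in each context.

```perl
use strict;
use warnings;
use feature 'say';

my @values  = qw( zero one two three );
my @indices = ( 1, 3 );

# Single element: $ sigil.
my $single = $values[1];

# Slice in list context: every selected element.
my @slice = @values[ @indices ];

# Slice in scalar context: only the final element survives.
my $last = @values[ @indices ];

say $single;    # one
say "@slice";   # one three
say $last;      # three
```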

Anonymous Variables

Perl variables do not require names. Names exist to help you, the programmer, keep track of an $apple, @barrels, or %cookie_recipes. Variables created without literal names in your source code are anonymous variables. The only way to access anonymous variables is by reference (References).

Variables, Types, and Coercion

This relationship between variable types, sigils, and context is essential to your understanding of Perl.

A Perl variable represents both a value (a dollar cost, available pizza toppings, the names and numbers of guitar stores) and the container which stores that value. Perl's type system deals with value types and container types. While a variable's container type (scalar, array, or hash) cannot change, Perl is flexible about a variable's value type. You may store a string in a variable in one line, append to that variable a number on the next, and reassign a reference to a function (Function References) on the third ... but you'll confuse yourself if you do all of that.

Performing an operation on a variable which imposes a specific value type may cause coercion (Coercion) from the variable's existing value type.

For example, the documented way to determine the number of entries in an array is to evaluate that array in scalar context (Context). Because a scalar variable can only ever contain a scalar, assigning an array to a scalar imposes scalar context on the operation, and an array evaluated in scalar context produces the number of elements in the array:

    my $count = @items;
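
The following sketch contrasts scalar and list context when assigning from the same array. The contents of @items are made up for the example.

```perl
use strict;
use warnings;
use feature 'say';

my @items = qw( bread milk eggs );

# Scalar assignment imposes scalar context: the element count.
my $count = @items;

# The scalar builtin makes the same request explicit.
my $explicit = scalar @items;

# Parentheses on the left impose list context instead,
# so this copies the first element rather than the count.
my ($first) = @items;

say $count;      # 3
say $explicit;   # 3
say $first;      # bread
```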

Values

As you gain experience, you'll discover that the structure of your programs will depend on the way you model your data with variables.

Variables allow the abstract manipulation of data while the values they hold make programs concrete and useful. The more accurate your values, the better your programs. These values are your aunt's name and address, the distance between your office and a golf course on the moon, or the weight of all of the cookies you've eaten in the past year. Within your program, the rules regarding the format of that data are often strict.

Effective programs need effective (simple, fast, efficient, easy to use) ways of representing their data.

Strings

A string is a piece of textual or binary data with no particular formatting or contents. It could be your name, the contents of an image file, or the source code of the program itself. A string has meaning in the program only when you give it meaning.

To represent a literal string in your program, surround it with a pair of quoting characters. The most common string delimiters are single and double quotes:

    my $name    = 'Donner Odinson, Bringer of Despair';
    my $address = "Room 539, Bilskirnir, Valhalla";

Characters in a single-quoted string are exactly and only ever what they appear to be, with two exceptions. To include a single quote inside a single-quoted string, escape it with a leading backslash:

    my $reminder = 'Don\'t forget to escape '
                 . 'the single quote!';

If you want a backslash at the end of the string, you'll have to escape it as well, to avoid making Perl think you're trying to escape the closing delimiter (programming language design is full of corner cases like this):

    my $exception = 'This string ends with a '
                  . 'backslash, not a quote: \\';

Any other backslash will be part of the string as it appears, unless you have two adjacent backslashes, in which case Perl will believe that you intended to escape the second:

    is('Modern \ Perl', 'Modern \\ Perl',
        'single quotes backslash escaping');

A double-quoted string gives you more options. For example, you may encode otherwise invisible whitespace characters in the string:

    my $tab       = "\t";
    my $newline   = "\n";
    my $carriage  = "\r";
    my $formfeed  = "\f";
    my $backspace = "\b";

This demonstrates a useful principle: there are multiple possible representations of the same string. You can include a tab within a string by typing the \t escape sequence or by hitting the Tab key on your keyboard. Within Perl's purview, both strings behave the same way, even though the representation of the string may differ in the source code.

A string declaration may cross (and include) newlines, so these two declarations are equivalent:

    my $escaped = "two\nlines";
    my $literal = "two
    lines";
    is $escaped, $literal, 'equivalent \n and newline';

With that said, the escape sequences are often much easier to read than their literal equivalents.

As you manipulate and modify strings, Perl will change their sizes as appropriate; these strings have variable lengths. For example, you can combine multiple strings into a larger string with the concatenation operator .:

    my $kitten = 'Choco' . ' ' . 'Spidermonkey';

... though this is effectively the same as if you'd initialized the string all at once.

You may also interpolate the value of a scalar variable or the values of an array within a double-quoted string, such that the current contents of the variable become part of the string as if you'd concatenated them:

    my $factoid = "$name lives at $address!";

    # equivalent to
    my $factoid = $name . ' lives at ' . $address . '!';

Include a literal double-quote inside a double-quoted string by escaping it with a leading backslash:

    my $quote = "\"Ouch,\", he cried.  \"That hurt!\"";

When repeated backslashing becomes unwieldy, use a quoting operator, which allows you to choose an alternate string delimiter. The q operator indicates single quoting (no interpolation), while the qq operator provides double quoting behavior (interpolation). The character immediately following the operator determines the characters used to delimit the strings. If the character is the opening character of a balanced pair—such as opening and closing braces—the closing character will be the final delimiter. Otherwise, the character itself will be both the starting and ending delimiter.

    my $quote     = qq{"Ouch", he said.  "That hurt!"};
    my $reminder  =  q^Don't escape the single quote!^;
    my $complaint =  q{It's too early to be awake.};

When declaring a complex string with a series of embedded escapes is tedious, use the heredoc syntax to assign multiple lines to a string:

    my $blurb =<<'END_BLURB';

    He looked up. "Change is the constant on which they all
    can agree.  We instead, born out of time, remain perfect
    and perfectly self-aware. We only suffer change as we
    pursue it. It is against our nature. We rebel against
    that change. Shall we consider them greater for it?"
    END_BLURB

The <<'END_BLURB' syntax has three parts. The double angle-brackets introduce the heredoc. The quotes determine whether the heredoc follows single- or double-quoted behavior. (The default behavior is double-quoted.) END_BLURB is an arbitrary identifier which the Perl parser uses as the ending delimiter.
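
As a brief sketch of how the quotes change heredoc behavior, the two heredocs below share the same body. The variable $name and the END_NOTE delimiter are arbitrary choices for this example.

```perl
use strict;
use warnings;

my $name = 'Bob';

# Double-quoted heredoc: $name is interpolated.
my $interpolated = <<"END_NOTE";
Hello, $name!
END_NOTE

# Single-quoted heredoc: the text stays exactly as written.
my $literal = <<'END_NOTE';
Hello, $name!
END_NOTE

print $interpolated;   # Hello, Bob!
print $literal;        # Hello, $name!
```

Each heredoc includes the trailing newline of its last body line, which is why print needs no explicit "\n" here.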

Regardless of the indentation of the heredoc declaration itself, the ending delimiter must start at the beginning of the line:

    sub some_function {
        my $ingredients =<<'END_INGREDIENTS';
        Two eggs
        One cup flour
        Two ounces butter
        One-quarter teaspoon salt
        One cup milk
        One drop vanilla
        Season to taste
    END_INGREDIENTS
    }

Using a string in a non-string context will induce coercion (Coercion).

Unicode and Strings

Unicode is a system for representing the characters of the world's written languages. While most English text uses a character set of only 128 characters (which requires seven bits of storage and fits nicely into eight-bit bytes), it's naïve to believe that you won't someday need an umlaut.

Perl strings can represent either of two separate but related data types:

Sequences of Unicode characters

Each character has a codepoint, a unique number which identifies it in the Unicode character set.

Sequences of octets

Binary data is a sequence of octets: 8-bit numbers, each of which can represent a number between 0 and 255.

Words Matter

Why octet and not byte? Assuming that one character fits in one byte will cause you no end of Unicode grief. Separate the idea of memory storage from character representation. Forget that you ever heard of bytes.

Unicode strings and binary strings look superficially similar. Each has a length(). Each supports standard string operations such as concatenation, splicing, and regular expression processing (Regular Expressions and Matching). Any string which is not purely binary data is textual data, and thus should be a sequence of Unicode characters.

However, because of how your operating system represents data on disk or from users or over the network—as sequences of octets—Perl can't know if the data you read is an image file or a text document or anything else. By default, Perl treats all incoming data as sequences of octets. It's up to you to add a specific meaning to that data.

Character Encodings

A Unicode string is a sequence of octets which represents a sequence of characters. A Unicode encoding maps octet sequences to characters. Some encodings, such as UTF-8, can encode all of the characters in the Unicode character set. Other encodings represent only a subset of Unicode characters. For example, ASCII encodes plain English text with no accented characters, while Latin-1 can represent text in most languages which use the Latin alphabet.

An Evolving Standard

Perl 5.14 supports the Unicode 6.0 standard, 5.16 the 6.1 standard, and 5.18 the 6.2 standard. See http://unicode.org/versions/.

To avoid most Unicode problems, always decode from the appropriate encoding at the inputs of your program and encode to the appropriate encoding at its outputs.

Unicode in Your Filehandles

When you tell Perl that a specific filehandle (Files) should handle data with a specific Unicode encoding, Perl will use an IO layer to convert between octets and characters. The mode operand of the open builtin allows you to request an IO layer by name. For example, the :utf8 layer decodes UTF-8 data:

    open my $fh, '<:utf8', $textfile;

    my $unicode_string = <$fh>;

Use binmode to apply an IO layer to an existing filehandle:

    binmode $fh, ':utf8';
    my $unicode_string = <$fh>;

    binmode STDOUT, ':utf8';
    say $unicode_string;

Without the utf8 mode, printing certain Unicode strings to a filehandle will result in a warning (Wide character in %s), because files contain octets, not Unicode characters.

Enable UTF-8 Everywhere

The utf8::all module enables UTF-8 IO layers on all filehandles throughout your program and enables all sorts of other Unicode features. It's very handy, but it's no substitute for (eventually) figuring out what your program needs.

Unicode in Your Data

The core module Encode provides a function named decode() to convert a scalar containing octets to Perl's internal version of Unicode strings. The corresponding encode() function converts from Perl's internal encoding to the desired encoding:

    my $from_utf8 = decode('utf8', $data);
    my $to_latin1 = encode('iso-8859-1', $string);
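
A self-contained round trip may make the distinction between octets and characters clearer. This sketch assumes the incoming data is UTF-8 and spells the encoding name as 'UTF-8' (the strict variant that Encode also accepts) rather than the 'utf8' used above.

```perl
use strict;
use warnings;
use feature 'say';
use Encode qw( decode encode );

# Five octets: the UTF-8 encoding of "José".
my $octets = "Jos\xC3\xA9";

# Decode the octets into a string of four Unicode characters.
my $string = decode('UTF-8', $octets);

# Encode those characters as Latin-1 for output: four octets.
my $latin1 = encode('iso-8859-1', $string);

say length $octets;   # 5
say length $string;   # 4
say length $latin1;   # 4
```

The length() results differ because the first value counts octets, the second counts characters, and the third counts octets again in a different encoding.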

To handle Unicode properly, you must always decode incoming data via a known encoding and encode outgoing data to a known encoding. Yes, this means you have to know what kind of data you expect to give and receive, but you should know this anyway. Being specific will help you avoid all kinds of trouble.

Unicode in Your Programs

You may include Unicode characters in your programs in three ways. The easiest is to use the utf8 pragma (Pragmas), which tells the Perl parser to interpret the rest of the source code file with the UTF-8 encoding. This allows you to use Unicode characters in strings and identifiers:

    use utf8;

    sub £_to_¥ { ... }

    my $yen = £_to_¥('1000£');

To write this code, your text editor must understand UTF-8 and you must save the file with the appropriate encoding. (Again, any two programs which communicate with Unicode data must agree on the encoding of that data.)

Within double-quoted strings, you may use the Unicode escape sequence to represent characters. The syntax \x{} represents a single character; place the hex form of the character's Unicode number within the curly brackets (see http://unicode.org/charts/ for an exhaustive list):

    my $escaped_thorn = "\x{00FE}";

Some Unicode characters have names, and these names are often clearer to read than their numbers even though they're much longer. Use the charnames pragma to enable them and the \N{} escape to refer to them:

    use charnames ':full';
    use Test::More tests => 1;

    my $escaped_thorn = "\x{00FE}";
    my $named_thorn   = "\N{LATIN SMALL LETTER THORN}";

    is $escaped_thorn, $named_thorn,
        'Thorn equivalence check';

You may use the \x{} and \N{} forms within regular expressions as well as anywhere else you may legitimately use a string or a character.
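
For instance, both escape forms can appear inside a pattern. This is only a sketch; the test string is invented for the example.

```perl
use strict;
use warnings;
use feature 'say';
use charnames ':full';

# "þorn", written with an escape so the source stays ASCII.
my $word = "\x{00FE}orn";

# Match the first character by number and by name.
my $by_number = $word =~ /^\x{00FE}/                     ? 1 : 0;
my $by_name   = $word =~ /^\N{LATIN SMALL LETTER THORN}/ ? 1 : 0;

say $by_number;   # 1
say $by_name;     # 1
```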

Implicit Conversion

Most Unicode problems in Perl arise from the fact that a string could be either a sequence of octets or a sequence of characters. Perl allows you to combine these types through the use of implicit conversions. When these conversions are wrong, they're rarely obviously wrong and they're often spectacularly wrong in ways that are difficult to debug.

When Perl concatenates a sequence of octets with a sequence of Unicode characters, it implicitly decodes the octet sequence using the Latin-1 encoding. The resulting string will contain Unicode characters. When you print Unicode characters, Perl will encode the string using UTF-8, because Latin-1 cannot represent the entire set of Unicode characters; the Latin-1 repertoire is only a small subset of Unicode.

The asymmetry between encodings and octets can lead to Unicode strings encoded as UTF-8 for output and decoded as Latin-1 from input. Worse yet, when the text contains only English characters with no accents, the bug stays hidden, because both encodings use the same representation for every one of those characters.

    my $hello    = "Hello, ";
    my $greeting = $hello . $name;

If $name contains an English name such as Alice you will never notice any problem, because the Latin-1 representation is the same as the UTF-8 representation. If $name contains a name such as José, $name can contain several possible values:

$name contains four Unicode characters.
$name contains four Latin-1 octets representing four Unicode characters.
$name contains five UTF-8 octets representing four Unicode characters.

The string literal has several possible scenarios:

It is an ASCII string literal and contains octets.
```
    my $hello = "Hello, ";
```
It is a Latin-1 string literal with no explicit encoding and contains octets.
```
    my $hello = "¡Hola, ";
```
It is a non-ASCII string literal with the utf8 or encoding pragma in effect and contains Unicode characters.
```
    use utf8;
    my $hello = "Kuirabá, ";
```

If both $hello and $name are Unicode strings, the concatenation will produce another Unicode string.

If both strings are octet streams, Perl will concatenate them into a new octet string. If both values are octets of the same encoding (both Latin-1, for example), the concatenation will work correctly. If the octets do not share an encoding (for example, a concatenation appending UTF-8 data to Latin-1 data), then the resulting sequence of octets makes sense in neither encoding. This could happen if the user entered a name as UTF-8 data and the greeting were a Latin-1 string literal, but the program decoded neither.

If only one of the values is a Unicode string, Perl will decode the other as Latin-1 data. If this is not the correct encoding, the resulting Unicode characters will be wrong. For example, if the user input were UTF-8 data and the string literal were a Unicode string, the name would be incorrectly decoded into five Unicode characters to form JosÃ© (sic) instead of José because the UTF-8 data means something else when decoded as Latin-1 data.

If your head is spinning, you're not alone. Always decode on input and encode on output.
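
The mixed case described above can be reproduced in a few lines. In this sketch, $input stands in for undecoded user input and $hello for a Unicode string literal; both values are assumptions chosen to trigger the implicit Latin-1 decoding.

```perl
use strict;
use warnings;
use feature 'say';
use Encode qw( decode );

# UTF-8 octets for "José", as they might arrive undecoded.
my $input = "Jos\xC3\xA9";

# A Unicode string: WHITE SMILING FACE followed by a space.
my $hello = "\x{263A} ";

# Bug: Perl treats each octet of $input as a Latin-1 character,
# so the name becomes five characters ("JosÃ©").
my $wrong = $hello . $input;

# Fix: decode the input first, so the name is four characters.
my $right = $hello . decode('UTF-8', $input);

say length $wrong;   # 7
say length $right;   # 6
```

Neither concatenation raises an error or a warning, which is why decoding at the point of input is the safer habit.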

See perldoc perluniintro for a far more detailed explanation of Unicode, encodings, and how to manage incoming and outgoing data in a Unicode world. For far more detail about managing Unicode effectively throughout your programs, see Tom Christiansen's answer to "Why does Modern Perl avoid UTF-8 by default?" (http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default/6163129#6163129) and his "Perl Unicode Cookbook" series on Perl.com (http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html).

Numbers

Perl supports numbers as both integers and floating-point values. You may represent them with scientific notation as well as in binary, octal, and hexadecimal forms:

    my $integer   = 42;
    my $float     = 0.007;
    my $sci_float = 1.02e14;
    my $binary    = 0b101010;
    my $octal     = 052;
    my $hex       = 0x20;

The 0b, 0, and 0x prefixes indicate binary, octal, and hex notation respectively. Be aware that a leading zero on an integer always indicates octal mode.

When 1.99 + 1.99 is 4

Even though you can write floating-point values explicitly with perfect accuracy, Perl—like most programming languages—represents them internally in a binary format. This representation is sometimes imprecise in specific ways; consult perldoc perlnumber for more details.
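
A common way to see this imprecision, sketched here under the assumption of a typical Perl build that stores floating-point values as IEEE 754 doubles, is to print a sum with more digits than usual and to compare with a tolerance instead of ==. The $epsilon value is an arbitrary choice for this example.

```perl
use strict;
use warnings;
use feature 'say';

my $sum = 0.1 + 0.2;

# Default stringification rounds the value for display.
say $sum;

# Asking for more digits exposes the stored approximation.
printf "%.17f\n", $sum;

# Exact comparison may fail where a tolerance succeeds.
say $sum == 0.3 ? 'exactly equal' : 'not exactly equal';

my $epsilon = 1e-9;   # arbitrary tolerance for this sketch
say abs($sum - 0.3) < $epsilon ? 'close enough' : 'different';
```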

You may not use commas to separate thousands in numeric literals, lest the parser interpret the commas as the comma operator. Instead, use underscores within the number. The parser will treat them as invisible characters. Thus all of these are equivalent, though the second might be the most readable:

    my $billion = 1000000000;
    my $billion = 1_000_000_000;
    my $billion = 10_0_00_00_0_0_0;

Because of coercion (Coercion), Perl programmers rarely have to worry about converting data from outside the program to numbers. Perl will treat anything which looks like a number as a number when evaluated in a numeric context. In the rare circumstances where you need to know if something looks like a number without evaluating it in a numeric context, use the looks_like_number function from the core module Scalar::Util. This function returns a true value if Perl will consider the given argument numeric.
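
As a quick sketch, looks_like_number can classify a handful of strings. The candidate strings are chosen for illustration; note that a hex-looking string is not considered numeric, because Perl does not convert such strings when it coerces them.

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util 'looks_like_number';

my @candidates = ( '42', '0.007', '1.02e14', '0x20', 'forty-two' );

# Record a plain 1 or 0 for each candidate string.
my %numeric = map { $_ => ( looks_like_number($_) ? 1 : 0 ) } @candidates;

for my $candidate (@candidates) {
    say "'$candidate': ", $numeric{$candidate} ? 'numeric' : 'not numeric';
}
```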

The Regexp::Common module from the CPAN provides several well-tested regular expressions to identify more specific types of numeric values such as whole numbers, integers, and floating-point values.

Undef

Perl's undef value represents an unassigned, undefined, and unknown value. Declared but undefined scalar variables contain undef:

    my $name = undef;   # unnecessary assignment
    my $rank;           # also contains undef

undef evaluates to false in a boolean context. Evaluating undef in a string context, such as interpolating it into a string:

    my $undefined;
    my $defined = $undefined . '... and so forth';

... produces an uninitialized value warning:

    Use of uninitialized value $undefined in
    concatenation (.) or string...

The defined builtin returns a true value if its operand evaluates to a defined value (that is, anything other than undef):

    my $status = 'suffering from a cold';

    say defined $status;  # 1, which is a true value
    say defined undef;    # empty string; a false value

The Empty List

When used on the right-hand side of an assignment, the () construct represents an empty list. In scalar context, this evaluates to undef. In list context, it is an empty list. When used on the left-hand side of an assignment, the () construct imposes list context. Why would you ever do this? To count the number of elements returned from an expression in list context without using a temporary variable, use the idiom (Idioms):

    my $count = () = get_clown_hats();

Because of the right associativity (Associativity) of the assignment operator, Perl first evaluates the second assignment by calling get_clown_hats() in list context. This produces a list.

Assignment to the empty list throws away all of the values of the list, but that assignment takes place in scalar context, which evaluates to the number of items on the right hand side of the assignment. As a result, $count contains the number of elements in the list returned from get_clown_hats().
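
Here is a runnable sketch of the idiom next to its temporary-variable equivalent. The body of get_clown_hats() is a hypothetical stand-in that returns three made-up values.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for the function named in the text.
sub get_clown_hats { return qw( bowler fez top_hat ) }

# The idiom: list assignment evaluated in scalar context.
my $count = () = get_clown_hats();

# The longer equivalent, using a temporary array.
my @hats = get_clown_hats();
my $same = @hats;

say $count;   # 3
say $same;    # 3
```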

Sound complicated? It can confuse new programmers, but with practice, you'll see how Perl's fundamental design features fit together.

Lists

A list is a comma-separated group of one or more expressions. Lists may occur verbatim in source code as values:

    my @first_fibs = (1, 1, 2, 3, 5, 8, 13, 21);

... as targets of assignments:

    my ($package, $filename, $line) = caller();

... or as lists of expressions:

    say name(), ' => ', age();

Parentheses do not create lists. The comma operator creates lists. Where present, the parentheses in these examples group expressions to change their precedence (Precedence).

Use the range operator to create lists of literals in a compact form (lists, but no parentheses):

    my @chars = 'a' .. 'z';
    my @count = 13 .. 27;

Use the qw() operator to split a literal string on whitespace to produce a list of strings (this example uses parentheses, but you could use any delimiter, such as qw!!):

    my @stooges = qw( Larry Curly Moe Shemp Joey Kenny );

No Comment Please

Perl will emit a warning if a qw() contains a comma or the comment character (#), because not only are such characters rare in a qw(), their presence is often a mistake.

Lists can (and often do) occur as the results of expressions, but these lists do not appear literally in source code.

Lists and arrays are not interchangeable in Perl. You may store a list in an array and you may coerce an array to a list, but lists and arrays are separate concepts. Lists are values. Arrays are containers. For example, indexing into a list always occurs in list context. Indexing into an array can occur in scalar context (for a single element) or list context (for a slice):

    # don't worry about the details right now
    sub context
    {
        my $context = wantarray();

        say defined $context
             ? $context
                 ? 'list'
                 : 'scalar'
             : 'void';
        return 0;
    }

    my @list_slice  = (1, 2, 3)[context()];
    my @array_slice = @list_slice[context()];
    my $array_index = $array_slice[context()];

    say context(); # list context
    context();     # void context

Control Flow

Perl's basic control flow is straightforward. Program execution starts at the beginning (the first line of the file executed) and continues to the end:

    say 'At start';
    say 'In middle';
    say 'At end';

Perl's control flow directives change the order of execution—that is, what happens next in the program.

Branching Directives

The if directive performs the associated action only when its conditional expression evaluates to a true value:

    say 'Hello, Bob!' if $name eq 'Bob';

This postfix form is useful for simple expressions. Its block form groups multiple expressions into a unit which evaluates to a single boolean value:

    if ($name eq 'Bob')
    {
        say 'Hello, Bob!';
        found_bob();
    }

While the block form requires parentheses around its condition, the postfix form does not.

The conditional expression may consist of multiple subexpressions, as long as it evaluates to something which can be coerced to a boolean value:

    if ($name eq 'Bob' && not greeted_bob())
    {
        say 'Hello, Bob!';
        found_bob();
    }

In the postfix form, adding parentheses can clarify the intent of the code at the expense of visual cleanliness:

    greet_bob() if ($name eq 'Bob' && not greeted_bob());

The unless directive is the negated form of if. Perl will perform the action when the conditional expression evaluates to a false value:

    say "You're not Bob!" unless $name eq 'Bob';

Like if, unless also has a block form, though many programmers avoid it due to its potential for confusion:

    unless (is_leap_year() and is_full_moon())
    {
        frolic();
        gambol();
    }

unless works very well for postfix conditionals, especially parameter validation in functions (Postfix Parameter Validation):

    sub frolic
    {
        # do nothing without parameters
        return unless @_;

        for my $chant (@_) { ... }
    }

The block forms of if and unless both support the else directive, which provides code to run when the conditional expression does not evaluate to the appropriate true or false value:

    if ($name eq 'Bob')
    {
        say 'Hi, Bob!';
        greet_user();
    }
    else
    {
        say "I don't know you.";
        shun_user();
    }

else blocks allow you to rewrite if and unless conditionals in terms of each other:

    unless ($name eq 'Bob')
    {
        say "I don't know you.";
        shun_user();
    }
    else
    {
        say 'Hi, Bob!';
        greet_user();
    }

However, the implied double negative of using unless with an else block can be confusing. This example may be the only place you ever see it.

Just as Perl provides both if and unless to allow you to phrase your conditionals in the most readable way, Perl has both positive and negative conditional operators:

    if ($name ne 'Bob')
    {
        say "I don't know you.";
        shun_user();
    }
    else
    {
        say 'Hi, Bob!';
        greet_user();
    }

... though the double negative implied by the presence of the else block may be difficult to read.

If you have lots of conditions to check—and if they're mutually exclusive—use one or more elsif directives:

    if ($name eq 'Bob')
    {
        say 'Hi, Bob!';
        greet_user();
    }
    elsif ($name eq 'Jim')
    {
        say 'Hi, Jim!';
        greet_user();
    }
    else
    {
        say "You're not my uncle.";
        shun_user();
    }

An unless chain may also use an elsif block (good luck deciphering that!). There is no elseunless.

Writing else if is a syntax error (Larry prefers elsif for aesthetic reasons, as well as the prior art of the Ada programming language):

    if ($name eq 'Rick')
    {
        say 'Hi, cousin!';
    }

    # warning; syntax error
    else if ($name eq 'Kristen')
    {
        say 'Hi, cousin-in-law!';
    }

The Ternary Conditional Operator

The ternary conditional operator evaluates a conditional expression and evaluates to one of two alternatives:

    my $time_suffix = after_noon($time)
                    ? 'afternoon'
                    : 'morning';

The conditional expression precedes the question mark character (?) and the colon character (:) separates the alternatives. The alternatives are expressions of arbitrary complexity—including other ternary conditional expressions.

An interesting, though obscure, idiom is to use the ternary conditional to select between alternative variables, not only values:

    push @{ rand() > 0.5 ? \@red_team : \@blue_team },
        Player->new;

Again, weigh the benefits of clarity versus the benefits of conciseness.
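
As a rough sketch of the nested form mentioned above, a chain of ternary conditionals can read like a small lookup table. The function name part_of_day() and the hour boundaries are assumptions made up for this illustration:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: map an hour (0-23) to a rough part of the day.
sub part_of_day
{
    my $hour = shift;

    # Each alternative after a colon is itself a ternary expression.
    return $hour < 12 ? 'morning'
         : $hour < 18 ? 'afternoon'
         :              'evening';
}

say part_of_day(9);
say part_of_day(15);
say part_of_day(21);
```

Aligning the question marks and colons keeps the chain readable; beyond three or four alternatives, a hash lookup is probably clearer.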

Short Circuiting

Perl exhibits short-circuiting behavior when it encounters complex conditional expressions. When Perl can determine that a complex expression would succeed or fail as a whole without evaluating every subexpression, it will not evaluate subsequent subexpressions. This is most obvious with an example:

    say 'Both true!' if ok( 1, 'subexpression one' )
                     && ok( 1, 'subexpression two' );

    done_testing();

The return value of ok() (Testing) is the boolean value produced by the first argument, so the example prints:

    ok 1 - subexpression one
    ok 2 - subexpression two
    Both true!

When the first subexpression—the first call to ok—evaluates to a true value, Perl must evaluate the second subexpression. If the first subexpression had evaluated to a false value, there would be no need to check subsequent subexpressions, as the entire expression could not succeed:

    say 'Both true!' if ok( 0, 'subexpression one' )
                     && ok( 1, 'subexpression two' );

This example prints:

    not ok 1 - subexpression one

Even though the second subexpression would obviously succeed, Perl never evaluates it. The same short-circuiting behavior is evident for logical-or operations:

    say 'Either true!' if ok( 1, 'subexpression one' )
                       || ok( 1, 'subexpression two' );

This example prints:

    ok 1 - subexpression one
    Either true!

With the success of the first subexpression, Perl can avoid evaluating the second subexpression. If the first subexpression were false, the result of evaluating the second subexpression would dictate the result of evaluating the entire expression.

Besides allowing you to avoid potentially expensive computations, short circuiting can help you to avoid errors and warnings, as in the case where using an undefined value might raise a warning:

    my $bbq;
    if (defined $bbq and $bbq eq 'brisket') { ... }
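
To watch short circuiting happen without a test library, one option is to record which subexpressions actually run. This is only a sketch; check() is a hypothetical helper invented for the example:

```perl
use strict;
use warnings;
use feature 'say';

my @evaluated;

# Hypothetical helper: record that it ran, then return the given value.
sub check
{
    my ($name, $result) = @_;
    push @evaluated, $name;
    return $result;
}

# The first operand is false, so && never calls check( second => ... ).
my $both   = check( first => 0 ) && check( second => 1 );

# The first operand is true, so || never calls check( fourth => ... ).
my $either = check( third => 1 ) || check( fourth => 1 );

say "Evaluated: @evaluated";
```

Only first and third appear in @evaluated, because Perl skipped the second operand of each expression.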

Context for Conditional Directives

The conditional directives (if, unless, and the ternary conditional operator) all evaluate an expression in boolean context (Context). Comparison operators such as eq, ==, ne, and != all produce boolean results when evaluated. Perl coerces the results of other expressions, including variables and values, into boolean forms.

Perl has no single true value nor a single false value. Any number which evaluates to 0 is false. This includes 0, 0.0, 0e0, 0x0, and so on. The empty string ('') and '0' evaluate to a false value, but the strings '0.0', '0e0', and so on do not. The idiom '0 but true' evaluates to 0 in numeric context but true in boolean context, thanks to its string contents.

Both the empty list and undef evaluate to a false value. Empty arrays and hashes return the number 0 in scalar context, so they evaluate to a false value in boolean context. An array which contains a single element—even undef—evaluates to true in boolean context. A hash which contains any elements—even a key and a value of undef—evaluates to a true value in boolean context.
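
The rules above are easy to check empirically. The following sketch evaluates a handful of values in boolean context and records the result; the labels and variable names are only for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# Pair a readable label with each value to test.
my @cases = (
    [ 'number 0',            0            ],
    [ 'number 0.0',          0.0          ],
    [ 'empty string',        ''           ],
    [ "string '0'",          '0'          ],
    [ "string '0.0'",        '0.0'        ],
    [ "string '0e0'",        '0e0'        ],
    [ "string '0 but true'", '0 but true' ],
    [ 'undef',               undef        ],
);

my %is_true;

for my $case (@cases)
{
    my ($label, $value) = @$case;
    $is_true{$label}    = $value ? 1 : 0;
    printf "%-22s %s\n", $label, $is_true{$label} ? 'true' : 'false';
}

# Aggregates: emptiness decides, not the contents.
my @empty_array;
my @one_undef = (undef);
my %one_pair  = ( key => undef );

say 'empty array:         ', ( @empty_array ? 'true' : 'false' );
say 'array of one undef:  ', ( @one_undef   ? 'true' : 'false' );
say 'hash with undef val: ', ( %one_pair    ? 'true' : 'false' );
```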

Greater Control Over Context

The Want module from the CPAN allows you to detect boolean context within your own functions. The core overloading pragma (Overloading) allows you to specify what your own data types produce when evaluated in various contexts.

Looping Directives

Perl provides several directives for looping and iteration. The foreach-style loop evaluates an expression which produces a list and executes a statement or block until it has exhausted that list:

    foreach (1 .. 10)
    {
        say "$_ * $_ = ", $_ * $_;
    }

This example uses the range operator to produce a list of integers from one to ten inclusive. The foreach directive loops over them, setting the topic variable $_ (The Default Scalar Variable) to each in turn. Perl executes the block for each integer and, as a result, prints the squares of the integers.

foreach versus for

Many Perl programmers refer to iteration as foreach loops, but Perl treats the names foreach and for interchangeably. The parenthesized expression determines the type and behavior of the loop; the keyword does not.

Like if and unless, this loop has a postfix form:

    say "$_ * $_ = ", $_ * $_ for 1 .. 10;

A for loop may use a named variable instead of the topic:

    for my $i (1 .. 10)
    {
        say "$i * $i = ", $i * $i;
    }

When a for loop uses an iterator variable, the variable scope is within the loop. Perl will set this lexical to the value of each item in the iteration. Perl will not modify the topic variable ($_). If you have declared a lexical $i in an outer scope, its value will persist outside the loop:

    my $i = 'cow';

    for my $i (1 .. 10)
    {
        say "$i * $i = ", $i * $i;
    }

    is( $i, 'cow', 'Value preserved in outer scope' );

This localization occurs even if you do not redeclare the iteration variable as a lexical (but do declare your iteration variables as lexicals to reduce their scope):

    my $i = 'horse';

    for $i (1 .. 10)
    {
        say "$i * $i = ", $i * $i;
    }

    is( $i, 'horse', 'Value preserved in outer scope' );

Iteration and Aliasing

The for loop aliases the iterator variable to the values in the iteration such that any modifications to the value of the iterator modify the iterated value in place:

    my @nums = 1 .. 10;

    $_ **= 2 for @nums;

    is( $nums[0], 1, '1 * 1 is 1' );
    is( $nums[1], 4, '2 * 2 is 4' );

    ...

    is( $nums[9], 100, '10 * 10 is 100' );

This aliasing also works with the block style for loop:

    for my $num (@nums)
    {
        $num **= 2;
    }

... as well as iteration with the topic variable:

    for (@nums)
    {
        $_ **= 2;
    }

You cannot use aliasing to modify constant values, however. Perl will produce an exception about modification of read-only values.

    $_++ and say for qw( Huex Dewex Louid );
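
One way around the exception, sketched here under the assumption that you want the modified values, is to copy the constants into an array first. The variable names are invented for this example, and eval is used only to show that the first loop throws:

```perl
use strict;
use warnings;
use feature 'say';

# Modifying the aliased constants is expected to throw; eval traps it.
my $survived = eval { $_++ for qw( Huex Dewex Louid ); 1 };
my $error    = $@;
say 'Caught: ', $error unless $survived;

# An array holds writable copies, so aliasing can modify them in place.
my @nephews = qw( Huex Dewex Louid );
$_++ for @nephews;

say "@nephews";
```

The second loop relies on the string increment described later in this chapter to turn the final x or d of each name into the next letter.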

You may occasionally see the use of for with a single scalar variable:

    for ($user_input)
    {
        s/\A\s+//;      # trim leading whitespace
        s/\s+\z//;      # trim trailing whitespace

        $_ = quotemeta; # escape non-word characters
    }

This idiom (Idioms) uses the iteration operator for its side effect of aliasing $_. Usually it's clearer to operate on the named variable itself.

Iteration and Scoping

The topic variable's iterator scoping has a subtle gotcha. Consider a function topic_mangler() which modifies $_ on purpose. If code iterating over a list called topic_mangler() without protecting $_, you'd have to spend some time debugging the effects:

    for (@values)
    {
        topic_mangler();
    }

    sub topic_mangler
    {
        s/foo/bar/;
    }

Yes, the substitution in topic_mangler() will modify elements of @values in place. If you must use $_ rather than a named variable, use the topic aliasing behavior of for:

    sub topic_mangler
    {
        # was $_ = shift;
        for (shift)
        {
            s/foo/bar/;
            s/baz/quux/;
            return $_;
        }
    }

Alternately, use a named iteration variable in the for loop. That's almost always the right advice.

The C-Style For Loop

The C-style for loop requires you to manage the conditions of iteration:

    for (my $i = 0; $i <= 10; $i += 2)
    {
        say "$i * $i = ", $i * $i;
    }

You must explicitly assign to an iteration variable in the looping construct, as this loop performs neither aliasing nor assignment to the topic variable. While any variable declared in the loop construct is scoped to the lexical block of the loop, Perl will not limit the lexical scope of a variable declared outside of the loop construct:

    my $i = 'pig';

    for ($i = 0; $i <= 10; $i += 2)
    {
        say "$i * $i = ", $i * $i;
    }

    isnt( $i, 'pig', '$i overwritten with a number' );

The looping construct may have three subexpressions. The first subexpression—the initialization section—executes only once, before the loop body executes. Perl evaluates the second subexpression—the conditional comparison—before each iteration of the loop body. When this evaluates to a true value, iteration proceeds. When it evaluates to a false value, iteration stops. The final subexpression executes after each iteration of the loop body.

    for (
        # loop initialization subexpression
        say 'Initializing', my $i = 0;

        # conditional comparison subexpression
        say "Iteration: $i" and $i < 10;

        # iteration ending subexpression
        say 'Incrementing ' . $i++
    )
    {
        say "$i * $i = ", $i * $i;
    }

Note the lack of a semicolon after the final subexpression as well as the use of the comma operator and low-precedence and; this syntax is surprisingly finicky. When possible, prefer the foreach-style loop to the C-style for loop.

All three subexpressions are optional. One infinite for loop is:

    for (;;) { ... }

While and Until

A while loop continues until the loop conditional expression evaluates to a boolean false value. An idiomatic infinite loop is:

    while (1) { ... }

Unlike the iteration foreach-style loop, the while loop's condition has no side effects. If @values has one or more elements, this code is also an infinite loop, because every iteration will evaluate @values in scalar context to a non-zero value and iteration will continue:

    while (@values)
    {
        say $values[0];
    }

To prevent such an infinite while loop, use a destructive update of the @values array by modifying the array within each iteration:

    while (@values)
    {
        my $value = shift @values;
        say $value;
    }

Modifying @values inside of the while condition check also works, but it has some subtleties related to the truthiness of each value.

    while (my $value = shift @values)
    {
        say $value;
    }

This loop will exit as soon as it reaches an element that evaluates to a false value, not necessarily when it has exhausted the array. That may be the desired behavior, but it probably deserves a comment to explain why.
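
A short sketch makes the difference concrete. The list contents here are arbitrary; the only important detail is the 0 in the middle:

```perl
use strict;
use warnings;
use feature 'say';

my @values = ( 3, 2, 0, 1 );
my @seen;

# The condition tests each shifted value, so the 0 ends the loop early.
while (my $value = shift @values)
{
    push @seen, $value;
}

say "Value test saw: @seen; left over: @values";

@values = ( 3, 2, 0, 1 );
my @all;

# Testing the array itself visits every element, false or not.
while (@values)
{
    my $value = shift @values;
    push @all, $value;
}

say "Array test saw: @all";
```

The first loop removes the 0 from the array but never processes it, and the trailing 1 stays behind.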

The until loop reverses the sense of the test of the while loop. Iteration continues while the loop conditional expression evaluates to a false value:

    until ($finished_running)
    {
        ...
    }

The canonical use of the while loop is to iterate over input from a filehandle:

    while (<$fh>)
    {
        # remove newlines
        chomp;
        ...
    }

Perl interprets this while loop as if you had written:

    while (defined($_ = <$fh>))
    {
        # remove newlines
        chomp;
        ...
    }

Without the implicit defined, any line read from the filehandle which evaluated to a false value in a scalar context would end the loop. In practice that means a final line containing only the character 0 with no trailing newline, because a line which still ends in a newline is a non-empty string other than '0' and is therefore true. The readline (<>) operator returns an undefined value only when it has reached the end of the file.

Both while and until have postfix forms, such as the infinite loop 1 while 1;. Any single expression is suitable for a postfix while or until, including the classic "Hello, world!" example from 8-bit computers of the early 1980s:

    print "Hello, world!  " while 1;

Infinite loops are more useful than they seem, especially for event loops in GUI programs, program interpreters, or network servers:

    $server->dispatch_results until $should_shutdown;

Use a do block to group several expressions into a single unit:

    do
    {
        say 'What is your name?';
        my $name = <>;
        chomp $name;
        say "Hello, $name!" if $name;
    } until (eof);

A do block parses as a single expression which may contain several expressions. Unlike the while loop's block form, the do block with a postfix while or until will execute its body at least once. This construct is less common than the other loop forms, but no less powerful.
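
To see the "at least once" behavior in isolation, compare the two forms with a condition that is false from the start. The counter names are invented for this sketch:

```perl
use strict;
use warnings;
use feature 'say';

my $ready = 0;

# The postfix while tests its condition only after the block has run.
my $do_runs = 0;
do
{
    $do_runs++;
} while ($ready);

# The block form tests its condition first, so the body never runs.
my $while_runs = 0;
while ($ready)
{
    $while_runs++;
}

say "do block ran $do_runs time(s); while block ran $while_runs time(s)";
```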

Loops within Loops

You may nest loops within other loops:

    for my $suit (@suits)
    {
        for my $value (@card_values) { ... }
    }

When you do this, declare your iteration variables! The potential for confusion with the topic variable and its scope is too great otherwise.

Novices commonly exhaust filehandles accidentally while nesting foreach and while loops:

    use autodie 'open';
    open my $fh, '<', $some_file;

    for my $prefix (@prefixes)
    {
        # DO NOT USE; buggy code
        while (<$fh>)
        {
            say $prefix, $_;
        }
    }

Opening the filehandle outside of the for loop leaves the file position unchanged between each iteration of the for loop. On its second iteration, the while loop will have nothing to read and will not iterate. You can solve this problem in many ways; re-open the file inside the for loop (wasteful but simple), slurp the entire file into memory (works best with small files), or seek the filehandle back to the beginning of the file for each iteration:

    for my $prefix (@prefixes)
    {
        while (<$fh>)
        {
            say $prefix, $_;
        }

        seek $fh, 0, 0;
    }
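
The slurping alternative mentioned above might look like the following sketch. An in-memory filehandle stands in for $some_file so that the example is self-contained; the sample data and variable names are assumptions:

```perl
use strict;
use warnings;
use feature 'say';

# An in-memory filehandle stands in for a real file in this sketch.
my $contents = "alpha\nbeta\n";
open my $fh, '<', \$contents
    or die "Cannot open in-memory file: $!";

# Read every line once, before the outer loop starts.
chomp( my @lines = <$fh> );
close $fh;

my @prefixes = qw( first second );
my @output;

for my $prefix (@prefixes)
{
    # Iterating over the array never exhausts anything.
    for my $line (@lines)
    {
        push @output, "$prefix: $line";
    }
}

say for @output;
```

This trades memory for simplicity, so it suits small files best.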

Loop Control

Sometimes you need to break out of a loop before you have exhausted the iteration conditions. Perl's standard control mechanisms—exceptions and return—work, but you may also use loop control statements.

The next statement restarts the loop at its next iteration. Use it when you've done all you need to in the current iteration. To loop over lines in a file and skip everything that starts with the comment character #:

    while (<$fh>)
    {
        next if /\A#/;
        ...
    }

Multiple Exits versus Nested Ifs

Compare the use of next with the alternative: wrapping the rest of the body of the block in an if. Now consider what happens if you have multiple conditions which could cause you to skip a line. Loop control modifiers with postfix conditionals can make your code much more readable.

The last statement ends the loop immediately. To finish processing a file once you've seen the ending token, write:

    while (<$fh>)
    {
        next if /\A#/;
        last if /\A__END__/;
        ...
    }

The redo statement restarts the current iteration without evaluating the conditional again. This can be useful in those few cases where you want to modify the line you've read in place, then start processing over from the beginning without clobbering it with another line. To implement a silly file parser that joins lines which end with a backslash:

    while (my $line = <$fh>)
    {
        chomp $line;

        # match backslash at the end of a line
        if ($line =~ s{\\$}{})
        {
            $line .= <$fh>;
            chomp $line;
            redo;
        }

        ...
    }

Using loop control statements in nested loops can be confusing. If you cannot avoid nested loops—by extracting inner loops into named functions—use a loop label to clarify:

    LINE:
    while (<$fh>)
    {
        chomp;

        PREFIX:
        for my $prefix (@prefixes)
        {
            next LINE unless $prefix;
            say "$prefix: $_";
            # next PREFIX is implicit here
        }
    }

Continue

The continue construct behaves like the third subexpression of a for loop; Perl executes any continue block before subsequent iterations of a loop, whether due to normal loop repetition or premature re-iteration from next (the Perl equivalent to C's continue is next). You may use it with a while, until, or foreach-style for loop. Examples of continue are rare, but it's useful any time you want to guarantee that something occurs with every iteration of the loop, regardless of how that iteration ends:

    while ($i < 10 )
    {
        next unless $i % 2;
        say $i;
    }
    continue
    {
        say 'Continuing...';
        $i++;
    }

Be aware that a continue block does not execute when control flow leaves a loop due to last or redo.
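
The following sketch records what runs on each pass, to show next reaching the continue block while last bypasses it. The @events log is an invention for this example:

```perl
use strict;
use warnings;
use feature 'say';

my @events;
my $i = 0;

while ($i < 5)
{
    last if $i == 3;        # leaves the loop; the continue block is skipped
    next unless $i % 2;     # jumps straight to the continue block
    push @events, "body $i";
}
continue
{
    push @events, "continue $i";
    $i++;
}

say for @events;
```

The log contains an entry for continue 0, 1, and 2, but none for 3, and $i is still 3 after the loop.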

Switch Statements

Perl 5.10 introduced a new construct named given as a Perlish switch statement. It didn't quite work out; given is still experimental, but it's less buggy in 5.18 than it was in any previous version of Perl. That's a nice way of saying "don't use it unless you know what you're doing."

If you need a switch statement, use for to alias the topic variable ($_) and when to match it against simple expressions with smart match (Smart Matching) semantics. To write the Rock, Paper, Scissors game (adding Spock and Lizard is an exercise for the reader):

    my @options  = ( \&rock, \&paper, \&scissors );
    my $confused = "I don't understand your move.";

    do
    {
        say "Rock, Paper, Scissors!  Pick one: ";
        chomp( my $user = <STDIN> );
        my $computer_match = $options[ rand @options ];
        $computer_match->( lc( $user ) );
    } until (eof);

    sub rock
    {
        print "I chose rock.  ";

        for (shift)
        {
            when (/paper/)    { say 'You win!' };
            when (/rock/)     { say 'We tie!'  };
            when (/scissors/) { say 'I win!'   };
            default           { say $confused  };
        }
    }

    sub paper
    {
        print "I chose paper.  ";

        for (shift)
        {
            when (/paper/)    { say 'We tie!'  };
            when (/rock/)     { say 'I win!'   };
            when (/scissors/) { say 'You win!' };
            default           { say $confused  };
        }
    }

    sub scissors
    {
        print "I chose scissors.  ";

        for (shift)
        {
            when (/paper/)    { say 'I win!'   };
            when (/rock/)     { say 'You win!' };
            when (/scissors/) { say 'We tie!'  };
            default           { say $confused  };
        }
    }

Perl executes the default rule when none of the other conditions match.
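
Because when shares the experimental status of given, you may prefer to avoid both. One alternative, which is a different technique from the one shown above and not a drop-in replacement, is a plain hash lookup. In this sketch the %beats table and the judge() function are hypothetical names, and moves must match exactly rather than by regular expression:

```perl
use strict;
use warnings;
use feature 'say';

# Each move maps to the move it defeats.
my %beats = (
    rock     => 'scissors',
    paper    => 'rock',
    scissors => 'paper',
);

# Hypothetical helper: judge one round from the computer's point of view.
sub judge
{
    my ($computer, $user) = @_;

    return "I don't understand your move." unless exists $beats{$user};
    return 'We tie!'                        if $user eq $computer;
    return $beats{$computer} eq $user ? 'I win!' : 'You win!';
}

say judge( 'rock',     'paper'    );
say judge( 'rock',     'scissors' );
say judge( 'paper',    'paper'    );
say judge( 'scissors', 'lizard'   );
```

Adding a move means adding data to the table rather than writing another function.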

Simplified Dispatch with Multimethods

The CPAN module MooseX::MultiMethods provides another technique to simplify this code.

Tailcalls

A tailcall occurs when the last expression within a function is a call to another function. The inner function's return value becomes the outer function's return value:

    sub log_and_greet_person
    {
        my $name = shift;
        log_message( "Greeting $name" ); # not Perl's builtin log()

        return greet_person( $name );
    }

Returning from greet_person() directly to the caller of log_and_greet_person() is more efficient than returning to log_and_greet_person() and then from log_and_greet_person(). Returning directly from greet_person() to the caller of log_and_greet_person() is a tailcall optimization.

Heavily recursive code (Recursion)—especially mutually recursive code—can consume a lot of memory. Tailcalls reduce the memory needed for internal bookkeeping of control flow and can make expensive algorithms cheaper. Unfortunately, Perl does not automatically perform this optimization, so you have to do it yourself when it's necessary.

The builtin goto operator has a form which calls a function as if the current function were never called, essentially erasing the bookkeeping for the new function call. The ugly syntax confuses people who've heard "Never use goto", but it works:

    sub log_and_greet_person
    {
        my ($name) = @_;
        log_message( "Greeting $name" ); # not Perl's builtin log()

        goto &greet_person;
    }

This example has two important characteristics. First, goto &function_name or goto &$function_reference requires the use of the function sigil (&) so that the parser knows to perform a tailcall instead of jumping to a label. Second, this form of function call passes the contents of @_ implicitly to the called function. You may modify @_ to change the passed arguments if you desire.

This technique is relatively rare; it's most useful when you want to hijack control flow to get out of the way of other functions inspecting caller (such as when you're implementing special logging or some sort of debugging feature), or when using an algorithm which requires a lot of recursion. Remember it if you need it, but feel free not to use it.
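
To see the effect on call-stack bookkeeping, a sketch can count the frames visible through caller. All three function names here are hypothetical, and the absolute numbers depend on how the program itself was invoked; only the difference between them matters:

```perl
use strict;
use warnings;
use feature 'say';

# Count the call frames above the point where this function runs.
sub call_depth
{
    my $depth = 0;
    $depth++ while caller($depth);
    return $depth;
}

# An ordinary call adds a frame for the wrapper.
sub plain_wrapper { return call_depth() }

# goto &function replaces the wrapper's frame instead of adding one.
sub tail_wrapper  { goto &call_depth }

my $plain = plain_wrapper();
my $tail  = tail_wrapper();

say "plain call: $plain frame(s); goto call: $tail frame(s)";
```

The plain wrapper reports one more frame than the goto version, because call_depth() cannot see tail_wrapper() at all.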

Scalars

Perl's fundamental data type is the scalar: a single, discrete value. That value may be a string, an integer, a floating point value, a filehandle, or a reference—but it is always a single value. Scalars may be lexical, package, or global (Global Variables) variables. You may only declare lexical or package variables. The names of scalar variables must conform to standard variable naming guidelines (Names). Scalar variables always use the leading dollar-sign ($) sigil (Variable Sigils).

Variant Sigils and Context

Scalar values and scalar context have a deep connection; assigning to a scalar imposes scalar context. Using the scalar sigil with an aggregate variable imposes scalar context to access a single element of the hash or array.

Scalars and Types

A scalar variable can contain any type of scalar value without special conversions, coercions, or casts. The type of value stored in a scalar variable, once assigned, can change arbitrarily:

    my $value;
    $value = 123.456;
    $value = 77;
    $value = "I am Chuck's big toe.";
    $value = Store::IceCream->new;

Even though this code is legal, changing the type of data stored in a scalar is confusing.

This flexibility of type often leads to value coercion (Coercion). For example, you may treat the contents of a scalar as a string, even if you didn't explicitly assign it a string:

    my $zip_code       = 97123;
    my $city_state_zip = 'Hillsboro, Oregon' . ' ' . $zip_code;

You may also use mathematical operations on strings:

    my $call_sign = 'KBMIU';

    # update sign in place and return new value
    my $next_sign = ++$call_sign;

    # return old value, then update sign
    my $curr_sign = $call_sign++;

    # but does not work as:
    my $new_sign  = $call_sign + 1;

One-Way Increment Magic

This magical string increment behavior has no corresponding magical decrement behavior. You can't restore the previous string value by writing $call_sign--.

This string increment operation turns a into b and z into aa, respecting character set and case. While ZZ9 becomes AAA0, ZZ09 becomes ZZ10—numbers wrap around while there are more significant places to increment, as on a vehicle odometer.
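
A small sketch confirms these rules, along with the lack of a magical decrement. The sample strings are arbitrary, and the numeric warning is disabled only around the decrement that would otherwise trigger it:

```perl
use strict;
use warnings;
use feature 'say';

my %next;

for my $start (qw( a z Az zz ZZ9 ZZ09 KBMIU ))
{
    # Increment a copy; the loop variable aliases a read-only constant.
    my $copy = $start;
    $copy++;

    $next{$start} = $copy;
    say "$start -> $copy";
}

# Decrement has no magic: Perl treats the string as the number 0 first.
my $sign = 'KBMIU';
{
    no warnings 'numeric';
    $sign--;
}

say "KBMIU-- gives $sign";
```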

Evaluating a reference (References) in string context produces a string. Evaluating a reference in numeric context produces a number. Neither operation modifies the reference in place, but you cannot recreate the reference from either result:

    my $authors     = [qw( Pratchett Vinge Conway )];
    my $stringy_ref = '' . $authors;
    my $numeric_ref =  0 + $authors;

$authors is still useful as a reference, but $stringy_ref is a string with no connection to the reference and $numeric_ref is a number with no connection to the reference.

To allow coercion without data loss, Perl scalars can contain both numeric and string components. The internal data structure which represents a scalar in Perl has a numeric slot and a string slot. Accessing a string in a numeric context produces a scalar with both string and numeric values. The dualvar() function within the core Scalar::Util module allows you to manipulate both values directly within a single scalar.
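
As a brief illustration of dualvar(), the following sketch builds a scalar whose numeric and string components deliberately disagree. The values are arbitrary:

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util 'dualvar';

# First argument fills the numeric slot, the second fills the string slot.
my $answer = dualvar( 42, 'forty-two' );

my $as_number = $answer + 0;    # numeric context reads the numeric slot
my $as_string = "$answer";      # string context reads the string slot

say "As a number: $as_number";
say "As a string: $as_string";
```

Scalars like this are rare in everyday code, but the same mechanism is how the special variable $! can hold both an error number and an error message.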

Scalars do not contain a separate slot for boolean values. In boolean context, the empty string ('') and the string '0' evaluate to false values. All other strings evaluate to true values. In boolean context, numbers which evaluate to zero (0, 0.0, and 0e0) evaluate to false values. All other numbers evaluate to true values.

What is Truth?

Be careful that the strings '0.0' and '0e0' evaluate to true values. This is one place where Perl makes a distinction between what looks like a number and what really is a number.

One other value is always a false value: undef.

Arrays

Perl arrays are first-class data structures—the language supports them as a built-in data type—which store zero or more scalars. You can access individual members of the array by integer indexes, and you can add or remove elements at will. Arrays grow or shrink as you manipulate them.

The @ sigil denotes an array. To declare an array:

    my @items;

Array Elements

Use the scalar sigil to access an individual element of an array. $cats[0] is an unambiguous use of the @cats array, because postfix (Fixity) square brackets ([]) always mean indexed access to an array.

The first element of an array is at the zeroth index:

    # @cats contains a list of Cat objects
    my $first_cat = $cats[0];

The last index of an array depends on the number of elements in the array. An array in scalar context (due to scalar assignment, string concatenation, addition, or boolean context) evaluates to the number of elements in the array:

    # scalar assignment
    my $num_cats = @cats;

    # string concatenation
    say 'I have ' . @cats . ' cats!';

    # addition
    my $num_animals = @cats + @dogs + @fish;

    # boolean context
    say 'Yep, a cat owner!' if @cats;

To get the index of the final element of an array, subtract one from the number of elements of the array (because array indexes start at 0) or use the unwieldy $#cats syntax:

    my $first_index = 0;
    my $last_index  = @cats - 1;
    # or
    # my $last_index = $#cats;

    say   "My first cat has an index of $first_index, "
        . "and my last cat has an index of $last_index."

When you care more about the relative position of an element in the array, use a negative array index. The last element of an array is available at the index -1. The second to last element of the array is available at index -2, and so on:

    my $last_cat           = $cats[-1];
    my $second_to_last_cat = $cats[-2];

$# has another use: resize an array in place by assigning to $#array. Remember that Perl arrays are mutable. They expand or contract as necessary. When you shrink an array, Perl will discard values which do not fit in the resized array. When you expand an array, Perl will fill the expanded positions with undef.
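
A minimal sketch of resizing through $#array, using an invented list of cats:

```perl
use strict;
use warnings;
use feature 'say';

my @cats = qw( Daisy Petunia Tuxedo Brad Jack Choco );

# Shrink: the last index becomes 2, so only three elements remain.
$#cats = 2;
my $after_shrink = @cats;
say "After shrinking: $after_shrink cats (@cats)";

# Expand: the last index becomes 4, so Perl adds two undef elements.
$#cats = 4;
my $after_expand = @cats;
my $undef_count  = grep { !defined $_ } @cats;
say "After expanding: $after_expand slots, $undef_count of them undef";
```

Brad, Jack, and Choco do not come back when the array grows again; shrinking discarded them.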

Array Assignment

Assign to individual positions in an array directly by index:

    my @cats;
    $cats[3] = 'Jack';
    $cats[2] = 'Tuxedo';
    $cats[0] = 'Daisy';
    $cats[1] = 'Petunia';
    $cats[4] = 'Brad';
    $cats[5] = 'Choco';

If you assign to an index beyond the array's current bound, Perl will extend the array to account for the new size and will fill in all intermediary positions with undef. After the first assignment, the array will contain undef at positions 0, 1, and 2 and Jack at position 3.

As an assignment shortcut, initialize an array from a list:

    my @cats = ( 'Daisy', 'Petunia', 'Tuxedo', ... );

... but remember that these parentheses do not create a list. Without parentheses, this would assign Daisy as the first and only element of the array, due to operator precedence (Precedence). Petunia, Tuxedo, and all of the other cats would be evaluated in void context and Perl would complain (so would the cats, Petunia especially).

You may assign any expression which produces a list in list context to an array:

    my @cats     = get_cat_list();
    my @timeinfo = localtime();
    my @nums     = 1 .. 10;

Assigning to a scalar element of an array imposes scalar context, while assigning to the array as a whole imposes list context.

To clear an array, assign an empty list:

    my @dates = ( 1969, 2001, 2010, 2051, 1787 );
    ...
    @dates    = ();

This is one of the only cases where parentheses do indicate a list; without something to mark a list, Perl and readers of the code would get confused.

Arrays Start Empty

my @items = (); is a longer and noisier version of my @items. Freshly-declared arrays start out empty.

Array Operations

Sometimes an array is more convenient as an ordered, mutable collection of items than as a mapping of indices to values. Perl provides several operations to manipulate array elements without using indices.

The push and pop operators add and remove elements from the tail of an array, respectively:

    my @meals;

    # what is there to eat?
    push @meals, qw( hamburgers pizza lasagna turnip );

    # ... but your nephew hates vegetables
    pop @meals;

You may push a list of values onto an array, but you may only pop one at a time. push returns the new number of elements in the array. pop returns the removed element.

Because push operates on a list, you can easily append the elements of one or more arrays to another with:

    push @meals, @breakfast, @lunch, @dinner;

Similarly, unshift and shift add elements to and remove an element from the start of an array, respectively:

    # expand our culinary horizons
    unshift @meals, qw( tofu spanakopita taquitos );

    # rethink that whole soy idea
    shift @meals;

unshift prepends a list of elements to the start of the array and returns the new number of elements in the array. shift removes and returns the first element of the array.

Few programs use the return values of push and unshift.

The splice operator removes and replaces elements from an array given an offset, a length of a list slice, and replacement elements. Both replacing and removing are optional; you may omit either behavior. The perlfunc description of splice demonstrates its equivalences with push, pop, shift, and unshift. One effective use is removal of two elements from an array:

    my ($winner, $runnerup) = splice @finalists, 0, 2;

    # or
    my $winner              = shift @finalists;
    my $runnerup            = shift @finalists;
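
The removing, replacing, and inserting behaviors of splice can be sketched together. The names in @finalists here are placeholders, not part of the original example:

```perl
use strict;
use warnings;
use feature 'say';

my @finalists = qw( Ann Bob Cid Dee Eve );

# remove two elements starting at offset 1
my @removed = splice @finalists, 1, 2;

# replace one element at offset 0 with two new ones
splice @finalists, 0, 1, qw( Zed Yan );

# insert at offset 1 without removing anything (length 0)
splice @finalists, 1, 0, 'Xia';

say "removed:   @removed";
say "remaining: @finalists";
```

In each case splice returns the removed elements, which is why the first call can populate @removed directly.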

The each operator allows you to iterate over an array by index and value:

    while (my ($index, $value) = each @bookshelf)
    {
        say "#$index: $value";
        ...
    }

Array Slices

The array slice construct allows you to access elements of an array in list context. Unlike scalar access of an array element, this indexing operation takes a list of zero or more indices and uses the array sigil (@):

    my @youngest_cats = @cats[-1, -2];
    my @oldest_cats   = @cats[0 .. 2];
    my @selected_cats = @cats[ @indexes ];

Array slices are useful for assignment:

    @users[ @replace_indices ] = @replace_users;

The only syntactic difference between an array slice of one element and the scalar access of an array element is the leading sigil. The semantic difference is greater: an array slice always imposes list context. An array slice evaluated in scalar context will produce a warning:

    Scalar value @cats[1] better written as $cats[1]...

An array slice imposes list context on the expression used as its index:

    # function called in list context
    my @hungry_cats = @cats[ get_cat_indices() ];

A slice can contain zero or more elements—including one:

    # single-element array slice; list context
    @cats[-1] = get_more_cats();

    # single-element array access; scalar context
    $cats[-1] = get_more_cats();
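
To watch the context change, suppose get_more_cats() reports how it was called. This version of the function is hypothetical and uses wantarray only to make the difference visible:

```perl
use strict;
use warnings;
use feature 'say';

# hypothetical function which answers differently per context
sub get_more_cats
{
    return wantarray ? qw( Mars Grumpy ) : 'one scalar cat';
}

my @cats = qw( Daisy Petunia Tuxedo );

# slice: list context; only the first returned value fits
# (Perl may emit its "better written as" warning here)
@cats[-1] = get_more_cats();
my $from_slice = $cats[-1];

# element access: scalar context
$cats[-1] = get_more_cats();
my $from_element = $cats[-1];

say "slice assignment stored:   $from_slice";
say "element assignment stored: $from_element";
```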

Arrays and Context

In list context, arrays flatten into lists. If you pass multiple arrays to a normal function, they will flatten into a single list:

    my @cats = qw( Daisy Petunia Tuxedo Brad Jack Choco );
    my @dogs = qw( Rodney Lucky Rosie );

    take_pets_to_vet( @cats, @dogs );

    sub take_pets_to_vet
    {
        # BUGGY: do not use!
        my (@cats, @dogs) = @_;
        ...
    }

Within the function, @_ will contain nine elements, not two, because the arrays flatten into a single list. List assignment to arrays is also greedy: an array will consume as many elements from the list as possible. After the assignment, @cats will contain every argument passed to the function. @dogs will be empty (... but Rosie thinks she's a cat, so it's not all bad).

This flattening behavior sometimes confuses novices who attempt to create nested arrays:

    # creates a single array, not an array of arrays
    my @numbers = (1 .. 10, (11 .. 20, (21 .. 30)));

... but this code is effectively the same as either:

    # parentheses do not create lists
    my @numbers = ( 1 .. 10, 11 .. 20, 21 .. 30 );

    # creates a single array, not an array of arrays
    my @numbers = 1 .. 30;

... because parentheses merely group expressions. They do not create lists. To avoid this flattening behavior, use array references (Array References).
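
As a rough sketch of the difference, compare a function which receives flattened arrays with one which receives references. The function names count_flattened() and count_by_reference() are made up for this example:

```perl
use strict;
use warnings;
use feature 'say';

my @cats = qw( Daisy Petunia Tuxedo Brad Jack Choco );
my @dogs = qw( Rodney Lucky Rosie );

# flattened: the first array swallows every argument
sub count_flattened
{
    my (@first, @second) = @_;
    return scalar(@first), scalar(@second);
}

# references: each array arrives as a single scalar
sub count_by_reference
{
    my ($cats_ref, $dogs_ref) = @_;
    return scalar(@$cats_ref), scalar(@$dogs_ref);
}

my @flat   = count_flattened( @cats, @dogs );
my @by_ref = count_by_reference( \@cats, \@dogs );

say "flattened:  @flat";     # 9 0
say "references: @by_ref";   # 6 3
```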

Array Interpolation

Arrays interpolate in strings as lists of the stringifications of each item separated by the current value of the magic global $". The default value of this variable is a single space. Its English.pm mnemonic is $LIST_SEPARATOR. Thus:

    my @alphabet = 'a' .. 'z';
    say "[@alphabet]";
    [a b c d e f g h i j k l m
     n o p q r s t u v w x y z]

Localize $" with a delimiter to ease your debugging (credit goes to Mark Jason Dominus for this technique):

    # what's in this array again?
    local $" = ')(';
    say "(@sweet_treats)";
    (pie)(cake)(doughnuts)(cookies)(cinnamon roll)

Hashes

A hash is a first-class Perl data structure which associates string keys with scalar values. Just as the name of a variable corresponds to something which holds a value, so does a hash key refer to something which contains a value. Think of a hash like a contact list: use the names of your friends to look up their phone numbers. Other languages call hashes tables, associative arrays, dictionaries, or maps.

Hashes have two important properties: they store one scalar per unique key and they provide no specific ordering of keys. Keep that latter property in mind. Though it has always been true in Perl, it is especially visible as of Perl 5.18, where the order of hash keys can differ from one run of a program to the next.

Declaring Hashes

Hashes use the % sigil. Declare a lexical hash with:

    my %favorite_flavors;

A hash starts out empty. You could write my %favorite_flavors = ();, but that's redundant.

Hashes use the scalar sigil $ when accessing individual elements and curly braces { } for keyed access:

    my %favorite_flavors;
    $favorite_flavors{Gabi}    = 'Dark chocolate raspberry';
    $favorite_flavors{Annette} = 'French vanilla';

Assign a list of keys and values to a hash in a single expression:

    my %favorite_flavors = (
        'Gabi',    'Dark chocolate raspberry',
        'Annette', 'French vanilla',
    );

Hashes store pairs of keys and values. Perl will warn you if you assign an odd number of elements to a hash. Idiomatic Perl often uses the fat comma operator (=>) to associate values with keys, as it makes the pairing more visible:

    my %favorite_flavors = (
        Gabi    => 'Dark chocolate raspberry',
        Annette => 'French vanilla',
    );

The fat comma operator acts like the regular comma and also automatically quotes the previous bareword (Barewords). The strict pragma will not warn about such a bareword—and if you have a function with the same name as a hash key, the fat comma will not call the function:

    sub name { 'Leonardo' }

    my %address = (
        name => '1123 Fib Place'
    );

The key of this hash will be name and not Leonardo. To call the function, make the function call explicit:

    my %address = (
        name() => '1123 Fib Place'
    );

Assign an empty list to empty a hash (you may occasionally see undef %hash, but that's a little ugly):

    %favorite_flavors = ();

Hash Indexing

To access an individual hash value, use a key (a keyed access operation):

    my $address = $addresses{$name};

In this example, $name contains a string which is also a key of the hash. As with accessing an individual element of an array, the hash's sigil has changed from % to $ to indicate keyed access to a scalar value.

You may also use string literals as hash keys. Perl quotes barewords automatically according to the same rules as fat commas:

    # auto-quoted
    my $address = $addresses{Victor};

    # needs quoting; not a valid bareword
    my $address = $addresses{'Sue-Linn'};

    # function call needs disambiguation
    my $address = $addresses{get_name()};

Don't Quote Me

Novices often quote every string literal hash key, but experienced developers elide the quotes whenever possible. If you code this way, you can use the rare presence of quotes to indicate that you're doing something different.

Even Perl builtins get the autoquoting treatment:

    my %addresses =
    (
        Leonardo => '1123 Fib Place',
        Utako    => 'Cantor Hotel, Room 1',
    );

    sub get_address_from_name
    {
        return $addresses{+shift};
    }

The unary plus (Unary Coercions) turns what would be a bareword (shift) subject to autoquoting rules into an expression. As this implies, you can use an arbitrary expression—not only a function call—as the key of a hash:

    # don't actually do this though
    my $address = $addresses{reverse 'odranoeL'};

    # interpolation is fine
    my $address = $addresses{"$first_name $last_name"};

    # so are method calls
    my $address = $addresses{ $user->name };

Hash keys can only be strings. Anything that evaluates to a string is an acceptable hash key. Perl will go so far as to coerce (Coercion) any non-string into a string. For example, if you use an object as a hash key, you'll get the stringified version of that object instead of the object itself:

    for my $isbn (@isbns)
    {
        my $book = Book->fetch_by_isbn( $isbn );

        # unlikely to do what you want
        $books{$book} = $book->price;
    }
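
A small sketch makes the stringification visible. A plain hash reference stands in for the Book object here, and the title and price are invented:

```perl
use strict;
use warnings;
use feature 'say';

# a plain hash reference stands in for a Book object
my $book = { title => 'Example Title', price => 10 };

my %books;
$books{$book} = $book->{price};

my ($key) = keys %books;

# the key is a string such as HASH(0x...), not a reference
say "key: $key";
say 'key is a plain string' unless ref $key;

# keying on a unique string attribute is one alternative
my %prices = ( $book->{title} => $book->{price} );
say "price: $prices{'Example Title'}";
```

Because the key is only a string, you cannot get the original object back from it.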

Hash Key Existence

The exists operator returns a boolean value to indicate whether a hash contains the given key:

    my %addresses =
    (
        Leonardo => '1123 Fib Place',
        Utako    => 'Cantor Hotel, Room 1',
    );

    say "Have Leonardo's address"
        if exists $addresses{Leonardo};
    say "Have Warnie's address"
        if exists $addresses{Warnie};

Using exists instead of accessing the hash key directly avoids two problems. First, it does not check the boolean nature of the hash value; a hash key may exist with a value even if that value evaluates to a boolean false (including undef):

    my %false_key_value = ( 0 => '' );
    ok( exists $false_key_value{0},
        'key with a false value should still exist' );
    ok( ! $false_key_value{0},
        '... even though the value itself is false' );

Second, exists avoids autovivification (Autovivification) within nested data structures (Nested Data Structures).

If a hash key exists, its value may be undef. Check that with defined:

    $addresses{Leibniz} = undef;

    say "Gottfried lives at $addresses{Leibniz}"
        if exists  $addresses{Leibniz}
        && defined $addresses{Leibniz};
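
The distinction between a missing key, an undefined value, and a false value can be sketched as follows. The key_status() helper and the Nemo entry are hypothetical additions for illustration:

```perl
use strict;
use warnings;
use feature 'say';

my %addresses =
(
    Leonardo => '1123 Fib Place',   # exists, defined, true
    Nemo     => '',                 # exists, defined, false
    Leibniz  => undef,              # exists, undefined
);

# classify a key from the weakest test to the strongest
sub key_status
{
    my $name = shift;
    return 'missing'   unless exists  $addresses{$name};
    return 'undefined' unless defined $addresses{$name};
    return 'false'     unless         $addresses{$name};
    return 'true';
}

say "$_: ", key_status($_)
    for qw( Leonardo Nemo Leibniz Warnie );
```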

Accessing Hash Keys and Values

Hashes are aggregate variables, but their pairwise nature is unique. Perl allows you to iterate over the keys of a hash, over the values of a hash, or over pairs of keys and values. The keys operator produces a list of hash keys:

    for my $addressee (keys %addresses)
    {
        say "Found an address for $addressee!";
    }

The values operator produces a list of hash values:

    for my $address (values %addresses)
    {
        say "Someone lives at $address";
    }

The each operator produces a list of two-element lists of the key and the value:

    while (my ($addressee, $address) = each %addresses)
    {
        say "$addressee lives at $address";
    }

Unlike arrays, there is no obvious ordering to these lists. The ordering depends on the internal implementation of the hash, the particular version of Perl you are using, the size of the hash, and a random factor. Even so, the order of hash items is consistent between keys, values, and each. Modifying the hash may change the order, but you can rely on that order if the hash remains the same. However, even if two hashes have the same keys and values, you cannot rely on the iteration order between those hashes being the same. They may have been constructed differently or have had elements removed. In Perl 5.18, even if they were constructed the same way, you cannot depend on the same iteration order between them.
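
When output order matters, a common approach is to sort the keys yourself. The sketch below also checks that keys and values line up for an unmodified hash; the third address is invented for the example:

```perl
use strict;
use warnings;
use feature 'say';

my %addresses =
(
    Leonardo => '1123 Fib Place',
    Utako    => 'Cantor Hotel, Room 1',
    Warnie   => '42 Example Street',
);

# keys and values of an unmodified hash line up pairwise
my @names  = keys   %addresses;
my @places = values %addresses;
my $paired = grep { $addresses{ $names[$_] } eq $places[$_] }
             0 .. $#names;

# sort the keys when the output order matters
my @sorted = sort keys %addresses;
say "$_ lives at $addresses{$_}" for @sorted;
```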

Each hash has only a single iterator for the each operator. You cannot reliably run more than one each iteration over the same hash at a time; if you begin a new iteration while another is in progress, the former will end prematurely and the latter will begin partway through the hash. During such iteration, take care not to call any function which may itself try to iterate over the hash with each.

In practice this occurs rarely. Reset a hash's iterator with keys or values in void context when you need it:

    # reset hash iterator
    keys %addresses;

    while (my ($addressee, $address) = each %addresses)
    {
        ...
    }
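
To see the shared iterator at work, consider this sketch. The count_remaining() helper is hypothetical, and the hash values are placeholders:

```perl
use strict;
use warnings;
use feature 'say';

my %addresses = ( Leonardo => 1, Utako => 2, Warnie => 3 );

# count the pairs a complete each loop visits, starting
# from the iterator's current position
sub count_remaining
{
    my $count = 0;
    while (my ($name, $value) = each %addresses) { $count++ }
    return $count;
}

# abandon an iteration after the first pair
my ($first_key) = each %addresses;

# the next loop resumes partway through the hash
my $without_reset = count_remaining();

# abandon another iteration, then reset before looping
($first_key) = each %addresses;
keys %addresses;
my $with_reset = count_remaining();

say "without reset: $without_reset";   # 2
say "with reset:    $with_reset";      # 3
```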

Hash Slices

A hash slice is a list of keys or values of a hash indexed in a single operation. To initialize multiple elements of a hash at once:

    # %cats already contains elements
    @cats{qw( Jack Brad Mars Grumpy )} = (1) x 4;

This is equivalent to the initialization:

    my %cats = map { $_ => 1 }
               qw( Jack Brad Mars Grumpy );

... except that the hash slice initialization does not replace the existing contents of the hash.

Hash slices also allow you to retrieve multiple values from a hash in a single operation. As with array slices, the sigil of the hash changes to @ to indicate list context. The use of the curly braces indicates keyed access and makes the fact that you're working with a hash unambiguous:

    my @buyer_addresses = @addresses{ @buyers };

Hash slices make it easy to merge two hashes:

    my %addresses        = ( ... );
    my %canada_addresses = ( ... );

    @addresses{ keys   %canada_addresses }
              = values %canada_addresses;

This is equivalent to looping over the contents of %canada_addresses manually, but is much shorter. Note that this relies on the iteration order of the hash remaining consistent between keys and values. Perl guarantees this, but only because these operations occur on the same hash and because nothing modifies the hash between the keys and values operations.

What if the same key occurs in both hashes? The hash slice approach always overwrites existing key/value pairs in %addresses. If you want other behavior, looping is more appropriate.
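
For instance, a loop can keep the existing entries and copy only the new ones. This is one possible policy among several, and the addresses are made up:

```perl
use strict;
use warnings;
use feature 'say';

my %addresses        = ( Leonardo => '1123 Fib Place',
                         Utako    => 'Cantor Hotel, Room 1' );
my %canada_addresses = ( Utako    => '1 Maple Way',
                         Gabi     => '22 Lakeshore Road' );

# copy only the entries which %addresses lacks
for my $name (keys %canada_addresses)
{
    next if exists $addresses{$name};
    $addresses{$name} = $canada_addresses{$name};
}

say "$_: $addresses{$_}" for sort keys %addresses;
```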

The Empty Hash

An empty hash contains no keys or values. It evaluates to a false value in a boolean context. A hash which contains at least one key/value pair evaluates to a true value in boolean context even if all of the keys or all of the values or both would themselves evaluate to boolean false values.

    use Test::More;

    my %empty;
    ok( ! %empty, 'empty hash should evaluate false' );

    my %false_key = ( 0 => 'true value' );
    ok( %false_key, 'hash containing false key
                     should evaluate to true' );

    my %false_value = ( 'true key' => 0 );
    ok( %false_value, 'hash containing false value
                       should evaluate to true' );

    done_testing();

In scalar context, a hash evaluates to a string which represents the ratio of full buckets in the hash. These are internal details about the hash implementation that you can safely ignore. (In a boolean scalar context, this value is false for an empty hash and true otherwise, so remember that instead of the ratio details. Perl 5.26 and later return the number of keys in the hash instead of the ratio.)

In list context, a hash evaluates to a list of key/value pairs similar to the list produced by the each operator. However, you cannot iterate over this list the same way you can iterate over the list produced by each. This loop will never terminate:

    # infinite loop for non-empty hashes
    while (my ($key, $value) = %hash)
    {
        ...
    }

You can loop over the list of keys and values with a for loop, but the iterator variable will get a key on one iteration and its value on the next, because Perl will flatten the hash into a single list of interleaved keys and values.
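
A short sketch of that interleaving, with a made-up %sundae hash. Walking the flattened list two elements at a time recovers the pairs:

```perl
use strict;
use warnings;
use feature 'say';

my %sundae = ( flavor => 'Vanilla', topping => 'fudge' );

# list context: key, value, key, value, ...
my @flat = %sundae;

# step through the flattened list two elements at a time
my @pairs;
for (my $i = 0; $i < @flat; $i += 2)
{
    push @pairs, "$flat[$i]=$flat[$i + 1]";
}

say for sort @pairs;
```

In most code, iterating over keys and looking up each value reads more clearly.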

Hash Idioms

Because each key exists only once in a hash, assigning the same key to a hash multiple times stores only the most recent value associated with that key. This behavior has advantages! For example, to find unique elements of a list:

    my %uniq;
    undef @uniq{ @items };
    my @uniques = keys %uniq;

Using undef with a hash slice sets the values of the hash to undef. This idiom is the cheapest way to perform set operations with a hash.
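
Two related sketches follow: one keeps unique elements in their original order, and one computes a set intersection. The list contents are invented, and the lookup hash uses assignment of an empty list to a slice, a variation which likewise leaves every value undef:

```perl
use strict;
use warnings;
use feature 'say';

my @items = qw( pie cake pie tart cake pie );

# unique elements in order of first appearance
my %seen;
my @uniques = grep { !$seen{$_}++ } @items;

# set intersection: build a lookup from one list,
# then filter the other list against it
my @wanted = qw( tart pie eclair );
my %in_items;
@in_items{ @items } = ();
my @common = grep { exists $in_items{$_} } @wanted;

say "unique: @uniques";
say "common: @common";
```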

Hashes are also useful for counting elements, such as IP addresses in a log file:

    my %ip_addresses;

    while (my $line = <$logfile>)
    {
        chomp $line;
        my ($ip, $resource) = analyze_line( $line );
        $ip_addresses{$ip}++;
        ...
    }

The initial value of a hash value is undef. The postincrement operator (++) treats that as zero. This in-place modification of the value increments an existing value for that key. If no value exists for that key, Perl creates a value (undef) and immediately increments it to one, as the numification of undef produces the value 0.

This strategy provides a useful caching mechanism to store the result of an expensive operation with little overhead:

    {
        my %user_cache;

        sub fetch_user
        {
            my $id = shift;
            $user_cache{$id} //= create_user($id);
            return $user_cache{$id};
        }
    }

This orcish maneuver (or-cache, if you like puns spelled out) returns the value from the hash, if it exists. Otherwise, it calculates, caches, and returns the value. The defined-or assignment operator (//=) evaluates its left operand. If that operand is not defined, the operator assigns to the lvalue the value of its right operand. In other words, if there's no value in the hash for the given key, this function will call create_user() with the key and update the hash.

Perl 5.10 introduced the defined-or and defined-or assignment operators. Prior to 5.10, most code used the boolean-or assignment operator (||=) for this purpose. Unfortunately, some valid values evaluate to a false value in boolean context, so evaluating the definedness of values is almost always more accurate. This lazy orcish maneuver tests for the definedness of the cached value, not truthiness. You may still see code with the pre-5.10 behavior. When you do, consider whether the defined-or operator makes more sense.
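
The difference shows up when the cached value is legitimately false. In this sketch, count_unread() is a hypothetical expensive function which returns 0:

```perl
use strict;
use warnings;
use feature 'say';

my $calls = 0;

# hypothetical expensive lookup which legitimately returns 0
sub count_unread { $calls++; return 0 }

my (%or_cache, %defined_or_cache);

# boolean-or: a cached 0 looks false, so every pass recomputes
$or_cache{alice} ||= count_unread() for 1 .. 3;
my $calls_with_or = $calls;

# defined-or: a cached 0 is defined, so only the first pass computes
$calls = 0;
$defined_or_cache{alice} //= count_unread() for 1 .. 3;
my $calls_with_defined_or = $calls;

say "||= called the function $calls_with_or times";
say "//= called the function $calls_with_defined_or time";
```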

If your function takes several arguments, use a slurpy hash (Slurping) to gather key/value pairs into a single hash as named function arguments:

    sub make_sundae
    {
        my %parameters = @_;
        ...
    }

    make_sundae( flavor  => 'Lemon Burst',
                 topping => 'cookie bits' );

This approach allows you to set default values:

    sub make_sundae
    {
        my %parameters           = @_;
        $parameters{flavor}    //= 'Vanilla';
        $parameters{topping}   //= 'fudge';
        $parameters{sprinkles} //= 100;
        ...
    }

... or include them in the hash initialization, as later assignments take precedence over earlier assignments:

    sub make_sundae
    {
        my %parameters =
        (
            flavor    => 'Vanilla',
            topping   => 'fudge',
            sprinkles => 100,
            @_,
        );
        ...
    }

Locking Hashes

As hash keys are barewords, they offer little typo protection compared to the function and variable name protection offered by the strict pragma. The little-used core module Hash::Util can make hashes safer.

To prevent someone from accidentally adding a hash key you did not intend (whether as a typo or from untrusted user input), use the lock_keys() function to restrict the hash to its current set of keys. Any attempt to add a new key to the hash will raise an exception. Similarly you can lock or unlock the existing value for a given key in the hash (lock_value() and unlock_value()) and make or unmake the entire hash read-only with lock_hash() and unlock_hash().

This is lax security; anyone can use the appropriate unlocking functions to work around the locking. Yet it does protect against typos and other accidental behavior.
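
A minimal sketch of lock_keys() and unlock_keys() catching a typo. The %sundae hash and the misspelled key are invented for illustration:

```perl
use strict;
use warnings;
use feature 'say';
use Hash::Util qw( lock_keys unlock_keys );

my %sundae = ( flavor => 'Vanilla', topping => 'fudge' );
lock_keys( %sundae );

# existing keys remain writable
$sundae{flavor} = 'Lemon Burst';

# a misspelled key raises an exception instead of
# silently appearing in the hash
my $ok    = eval { $sundae{toping} = 'cookie bits'; 1 };
my $error = $@;
say "rejected: $error" unless $ok;

# unlocking restores normal behavior
unlock_keys( %sundae );
$sundae{sprinkles} = 100;
```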

Coercion

Throughout its lifetime, a Perl variable may contain values of different types: strings, integers, rational numbers, and more. Rather than attaching type information to variables, Perl relies on the context provided by operators (Numeric, String, and Boolean Context) to determine how to handle values. By design, Perl attempts to do what you mean (called DWIM for do what I mean, or dwimmery), though you must be specific about your intentions. If you treat a variable which happens to contain a number as a string, Perl will do its best to coerce that number into a string.

Boolean Coercion

Boolean coercion occurs when you test the truthiness of a value, such as in an if or while condition. Numeric 0, undef, the empty string, and the string '0' all evaluate as false values. All other values—including strings which may be numerically equal to zero (such as '0.0', '0e', and '0 but true')—evaluate as true values.

When a scalar has both string and numeric components (Dualvars), Perl prefers to check the string component for boolean truth. '0 but true' evaluates to zero numerically, but it is not an empty string, and so it evaluates to a true value in boolean context.
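
These rules are easy to check directly. The sample values beyond those named above (' ', 'false', and -1) are additions for illustration:

```perl
use strict;
use warnings;
use feature 'say';

my @false = ( 0, '', '0', undef );
my @true  = ( '0.0', '0e', '0 but true', ' ', 'false', -1 );

# grep in scalar context counts the matching elements
my $false_count = grep { !$_ } @false;
my $true_count  = grep {  $_ } @true;

say "$false_count of ", scalar(@false), ' values are false';
say "$true_count of ",  scalar(@true),  ' values are true';
```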

String Coercion

String coercion occurs when using string operators such as comparisons (eq and cmp), concatenation, split, substr, and regular expressions, as well as when using a value or an expression as a hash key. The undefined value stringifies to an empty string, but produces a "Use of uninitialized value" warning. Numbers stringify to strings containing their values, so the value 10 stringifies to the string '10'. You can even split a number into individual digits with:

    my @digits = split '', 1234567890;

Numeric Coercion

Numeric coercion occurs when using numeric comparison operators (such as == and <=>), when performing mathematical operations, and when using a value or expression as an array or list index. The undefined value numifies to zero and produces a "Use of uninitialized value" warning. Strings which do not begin with numeric portions also numify to zero and produce an "Argument isn't numeric" warning. Strings which begin with characters allowed in numeric literals numify to those values, such that '10 leptons leaping' numifies to 10 and '6.022e23 moles marauding' numifies to 6.022e23. When warnings are enabled, the trailing non-numeric text still produces an "Argument isn't numeric" warning; only strings which are entirely numeric numify silently.

The core module Scalar::Util contains a looks_like_number() function which uses Perl's own numification rules to report whether a value looks like a number.
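
As a brief sketch, looks_like_number() returns a true value only when the whole string is numeric. The sample strings are chosen for illustration:

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util 'looks_like_number';

my %verdict;

for my $value ( '10', '6.022e23', '10 leptons leaping', 'pie' )
{
    $verdict{$value} = looks_like_number($value)
                     ? 'numeric'
                     : 'not numeric';
    say "'$value' is $verdict{$value}";
}
```

Checking input this way before doing arithmetic on it lets you reject bad values explicitly.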

Mathematicians Rejoice

The strings Inf and Infinity represent the infinite value and behave as numbers. The string NaN represents the concept "not a number". Numifying them produces no "Argument isn't numeric" warning. Beware that Perl's ideas of infinity and not a number may not match your platform's ideas; these notions aren't always portable across operating systems. Perl is consistent even if the rest of the universe isn't.

Reference Coercion

Using a dereferencing operation on an undefined value turns that value into a reference of the appropriate type. This process of autovivification (Autovivification) is handy when manipulating nested data structures (Nested Data Structures):

    my %users;

    $users{Brad}{id} = 228;
    $users{Jack}{id} = 229;

Although the hash never contained values for Brad and Jack, Perl helpfully created hash references for them, then assigned each a key/value pair keyed on id.

Cached Coercions

Perl's internal representation of values stores both string and numeric values. Stringifying a numeric value does not replace the numeric value. Instead, it adds a stringified value to the internal representation, which then contains both components. Similarly, numifying a string value populates the numeric component while leaving the string component untouched.

Certain Perl operations prefer to use one component of a value over another; boolean checks prefer strings, for example. If a value has a cached representation in a form you do not expect, relying on an implicit conversion may produce surprising results. You almost never need to be explicit about what you expect (your author can recall doing so twice in fifteen years of programming Perl), but knowing that this caching occurs may someday help you diagnose an odd situation.

Dualvars

The multi-component nature of Perl values is available to users in the form of dualvars. The core module Scalar::Util provides a function dualvar() which allows you to bypass Perl coercion and manipulate the string and numeric components of a value separately:

    use Scalar::Util 'dualvar';
    my $false_name = dualvar 0, 'Sparkles & Blue';

    say 'Boolean true!'  if        !! $false_name;
    say 'Numeric false!' unless  0  + $false_name;
    say 'String true!'   if     ''  . $false_name;

Packages

A Perl namespace associates and encapsulates various named entities within a named category. It's like your family name or a brand name. Unlike a real-world name, a namespace implies no direct relationship between entities. Such relationships may exist, but they do not have to.

A package in Perl is a collection of code in a single namespace. The distinction is subtle: the package represents the source code and the namespace represents the entity created when Perl parses that code.

The package builtin declares a package and a namespace:

    package MyCode;

    our @boxes;

    sub add_box { ... }

All global variables and functions declared or referred to after the package declaration refer to symbols within the MyCode namespace. You can refer to the @boxes variable from the main namespace only by its fully qualified name of @MyCode::boxes. A fully qualified name includes a complete package name, so you can call the add_box() function only by MyCode::add_box().

The scope of a package continues until the next package declaration or the end of the file, whichever comes first. As of Perl 5.14, package also accepts a block which explicitly delineates the scope of the declaration:

    package Pinball::Wizard
    {
        our $VERSION = 1969;
    }

The default package is the main package. Without a package declaration, the current package is main. This rule applies to one-liners, standalone programs, and even .pm files.

Besides a name, a package has a version and three implicit methods, import() (Importing), unimport(), and VERSION(). VERSION() returns the package's version number. This number is a series of numbers contained in a package global named $VERSION. By rough convention, versions are a series of integers separated by dots, as in 1.23 or 1.1.10.

Perl includes a stricter syntax for version numbers, as documented in perldoc version::Internals. These version numbers must have a leading v character and at least three integer components separated by periods:

    package MyCode v1.2.1;

Combined with the block form of a package declaration, you can write:

    package Pinball::Wizard v1969.3.7 { ... }

This syntax is still rare, though. You're more likely to see the pre-5.14 version:

    package MyCode;

    our $VERSION = 1.21;

Every package inherits a VERSION() method from the UNIVERSAL base class. You may override VERSION(), though there are few reasons to do so. This method returns the value of $VERSION:

    my $version = Some::Plugin->VERSION;

If you provide a version number as an argument, this method will throw an exception unless the version of the module is equal to or greater than the argument:

    # require at least 2.1
    Some::Plugin->VERSION( 2.1 );

    die "Your plugin $version is too old"
        unless $version > 2;
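
Both behaviors of VERSION() fit in one self-contained sketch. The My::Plugin package and its version number are hypothetical, declared inline only for illustration, and the block form of package assumes Perl 5.14 or newer:

```perl
use strict;
use warnings;
use feature 'say';

# hypothetical plugin package, declared inline for the example
package My::Plugin
{
    our $VERSION = '1.5';
}

# without an argument, VERSION() returns $VERSION
my $version = My::Plugin->VERSION;

# satisfied requirement: 1.5 is at least 1.2
my $new_enough = eval { My::Plugin->VERSION( 1.2 ); 1 };

# unsatisfied requirement: VERSION() throws an exception
my $too_old = !eval { My::Plugin->VERSION( 2.1 ); 1 };
my $error   = $@;

say "version: $version";
say "error:   $error" if $too_old;
```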

Packages and Namespaces

Every package declaration creates a new namespace, if necessary, and causes the parser to put all subsequent package global symbols (global variables and functions) into that namespace.

Perl has open namespaces. You can add functions or variables to a namespace at any point, either with a new package declaration:

    package Pack
    {
        sub first_sub { ... }
    }

    Pack::first_sub();

    package Pack
    {
        sub second_sub { ... }
    }

    Pack::second_sub();

... or by fully qualifying function names at the point of declaration:

    # implicit
    package main;

    sub Pack::third_sub { ... }

You can add to a package at any point during compilation or runtime, regardless of the current file, though building up a package from multiple separate declarations (in multiple files!) can make code difficult to spelunk.

Namespaces can have as many levels as your organizational scheme requires, though namespaces are not hierarchical. The only relationship between packages is semantic, not technical. Many projects and businesses create their own top-level namespaces. This reduces the possibility of global conflicts and helps to organize code on disk. For example:

StrangeMonkey is the project name
StrangeMonkey::UI organizes user interface code
StrangeMonkey::Persistence organizes data management code
StrangeMonkey::Test organizes testing code for the project

... and so on. This is a convention, but it's a useful one.

References

Perl usually does what you expect, even if what you expect is subtle. Consider what happens when you pass values to functions:

    sub reverse_greeting
    {
        my $name = reverse shift;
        return "Hello, $name!";
    }

    my $name = 'Chuck';
    say reverse_greeting( $name );
    say $name;

Outside of the function, $name contains Chuck, even though the value passed into the function gets reversed into kcuhC. You probably expected that. The value of $name outside the function is separate from the $name inside the function. Modifying one has no effect on the other.

Consider the alternative. If you had to make copies of every value before anything could possibly change them out from under you, you'd have to write lots of extra defensive code.

Sometimes it's useful to modify values in place. If you want to pass a hash full of data to a function to modify it, creating and returning a new hash for each change could be tedious (to say nothing of inefficient).

Perl provides a mechanism by which to refer to a value without making a copy. Any changes made to that reference will update the value in place, such that all references to that value see the modified value. A reference is a first-class scalar data type which refers to another first-class data type.

Scalar References

The reference operator is the backslash (\). In scalar context, it creates a single reference which refers to another value. In list context, it creates a list of references. To take a reference to $name:

    my $name     = 'Larry';
    my $name_ref = \$name;

You must dereference a reference to evaluate the value to which it refers. Dereferencing requires you to add an extra sigil for each level of dereferencing:

    sub reverse_in_place
    {
        my $name_ref = shift;
        $$name_ref   = reverse $$name_ref;
    }

    my $name = 'Blabby';
    reverse_in_place( \$name );
    say $name;

The double scalar sigil ($$) dereferences a scalar reference.

While in @_, parameters behave as aliases to caller variables (remember that for loops produce a similar aliasing behavior; see Iteration and Aliasing), so you can modify them in place:

    sub reverse_value_in_place
    {
        $_[0] = reverse $_[0];
    }

    my $name = 'allizocohC';
    reverse_value_in_place( $name );
    say $name;

You usually don't want to modify values this way—callers rarely expect it, for example. Assigning parameters to lexicals within your functions removes this aliasing behavior.

Saving Memory with References

Modifying a value in place or returning a reference to a scalar can save memory. Because Perl copies values on assignment, you could end up with multiple copies of a large string. Passing around references means that Perl will only copy the references, a far cheaper operation. Before you modify your code to pass only references, however, measure to see if this will make a difference.

Complex references may require a curly-brace block to disambiguate portions of the expression. You may always use this syntax, though sometimes it clarifies and other times it obscures:

    sub reverse_in_place
    {
        my $name_ref   = shift;
        ${ $name_ref } = reverse ${ $name_ref };
    }

If you forget to dereference a scalar reference, Perl will likely coerce the reference into a string value of the form SCALAR(0x93339e8) or a numeric value like 0x93339e8. This value indicates the type of reference (in this case, SCALAR) and the location in memory of the reference (not that that is useful for anything beyond distinguishing between references).
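
Here is a short sketch of both coercions. The variable names are invented, and refaddr() comes from the core module Scalar::Util:

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util 'refaddr';

my $name     = 'Larry';
my $name_ref = \$name;
my $same_ref = \$name;

# forgetting to dereference yields the stringified reference
my $oops = "Hello, $name_ref!";
say $oops;

# numeric context yields the address as a plain number
my $as_number = $name_ref + 0;

# two references to the same variable compare equal
say 'same referent' if $name_ref == $same_ref;

# the intended output requires dereferencing
say "Hello, $$name_ref!";
```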

References Aren't Pointers

Perl does not offer native access to memory locations. The address of the reference is a value used as an identifier. Unlike pointers in a language such as C, you cannot modify the address of a reference or treat it as an address into memory. These addresses are only mostly unique because Perl may reuse storage locations as it reclaims unused memory.

Array References

Array references are useful in several circumstances:

To pass and return arrays from functions without list flattening
To create multi-dimensional data structures
To avoid unnecessary array copying
To hold anonymous data structures

Use the reference operator to create a reference to a declared array:

    my @cards     = qw( K Q J 10 9 8 7 6 5 4 3 2 A );
    my $cards_ref = \@cards;

Any modifications made through $cards_ref will modify @cards and vice versa. You may access the entire array as a whole with the @ sigil, whether to flatten the array into a list (list context) or count its elements (scalar context):

    my $card_count = @$cards_ref;
    my @card_copy  = @$cards_ref;

Access individual elements by using the dereferencing arrow (->):

    my $first_card = $cards_ref->[0];
    my $last_card  = $cards_ref->[-1];

The arrow is necessary to distinguish between a scalar named $cards_ref and an array named @cards_ref. Note the use of the scalar sigil (Variable Sigils) to access a single element.

Doubling Sigils

An alternate syntax prepends another scalar sigil to the array reference, as in my $first_card = $$cards_ref[0]; which is shorter to write but uglier to read.

Use the curly-brace dereferencing syntax to slice (Array Slices) an array reference:

    my @high_cards = @{ $cards_ref }[0 .. 2, -1];

You may omit the curly braces, but their grouping often improves readability.

To create an anonymous array—without using a declared array—surround a list of values or a list-producing expression with square brackets:

    my $suits_ref = [qw( Monkeys Robots Dinos Cheese )];

This array reference behaves the same as named array references, except that the anonymous array brackets always create a new reference. Taking a reference to a named array in its scope always refers to the same array. For example:

    my @meals      = qw( soup sandwiches pizza );
    my $sunday_ref = \@meals;
    my $monday_ref = \@meals;

    push @meals, 'ice cream sundae';

... both $sunday_ref and $monday_ref now contain a dessert, while:

    my @meals      = qw( soup sandwiches pizza );
    my $sunday_ref = [ @meals ];
    my $monday_ref = [ @meals ];

    push @meals, 'berry pie';

... neither $sunday_ref nor $monday_ref contains a dessert. Within the square brackets used to create the anonymous array, list context flattens the @meals array into a list unconnected to @meals.
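To see both behaviors side by side, this sketch takes one reference to the named array and one anonymous copy. The names $shared_ref and $copy_ref are hypothetical:

```perl
use strict;
use warnings;
use feature 'say';

my @meals      = qw( soup sandwiches pizza );
my $shared_ref = \@meals;      # refers to @meals itself
my $copy_ref   = [ @meals ];   # refers to a new array holding copies

push @meals, 'berry pie';

my $shared_count = @$shared_ref;   # 4: sees the dessert
my $copy_count   = @$copy_ref;     # 3: unaffected

# references to the same array compare as equal
say 'same array' if \@meals == $shared_ref;
say "$shared_count versus $copy_count";
```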

Hash References

Use the reference operator on a named hash to create a hash reference:

    my %colors = (
        blue   => 'azul',
        gold   => 'dorado',
        red    => 'rojo',
        yellow => 'amarillo',
        purple => 'morado',
    );

    my $colors_ref = \%colors;

Access the keys or values of the hash by prepending the reference with the hash sigil %:

    my @english_colors = keys   %$colors_ref;
    my @spanish_colors = values %$colors_ref;

Access individual values of the hash (to store, delete, check the existence of, or retrieve) by using the dereferencing arrow or double sigils:

    sub translate_to_spanish
    {
        my $color = shift;
        return $colors_ref->{$color};
        # or return $$colors_ref{$color};
    }
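The other operations mentioned above (storing, checking existence, and deleting) work the same way through the arrow. This is a minimal sketch; the 'green' entry is an addition for illustration:

```perl
use strict;
use warnings;
use feature 'say';

my %colors     = ( blue => 'azul', red => 'rojo' );
my $colors_ref = \%colors;

$colors_ref->{green} = 'verde';                # store
my $has_red = exists $colors_ref->{red};       # check existence
my $removed = delete $colors_ref->{blue};      # delete returns the old value
my $spanish = $colors_ref->{green};            # retrieve

say "removed $removed, kept $spanish";
```

Every change made through $colors_ref is visible in %colors, because both names refer to the same hash.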

Use the array sigil (@) and disambiguation braces to slice a hash reference:

    my @colors  = qw( red blue green );
    my @colores = @{ $colors_ref }{@colors};

Create anonymous hashes in place with curly braces:

    my $food_ref = {
        'birthday cake' => 'la torta de cumpleaños',
        candy           => 'dulces',
        cupcake         => 'bizcochito',
        'ice cream'     => 'helado',
    };

As with anonymous arrays, anonymous hashes create a new anonymous hash on every execution.

Watch Those Braces!

The common novice error of assigning an anonymous hash to a standard hash produces a warning that Perl found a reference where it expected an even-sized list. Use parentheses for a named hash and curly brackets for an anonymous hash.
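The following sketch shows the mistake next to the two correct forms. It collects warnings in an array (an assumption made so the effect is easy to inspect) instead of printing them:

```perl
use strict;
use warnings;
use feature 'say';

# collect warnings rather than printing them
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# wrong: the braces create ONE scalar, a hash reference, which Perl
# then tries to treat as a key/value list
my %wrong = { flavor => 'vanilla' };

# right: parentheses for a named hash, curly braces for a reference
my %right     = ( flavor => 'vanilla' );
my $right_ref = { flavor => 'vanilla' };

say 'warnings seen: ', scalar @warnings;
say 'wrong hash has a flavor key? ', exists $wrong{flavor} ? 'yes' : 'no';
```

The mistaken hash ends up with a single key, the stringified reference, and no value.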

Function References

Perl supports first-class functions in that a function is a data type just as is an array or hash. In other words, Perl supports function references. This enables many advanced features (Closures). Create a function reference by using the reference operator and the function sigil (&) on the name of a function:

    sub bake_cake { say 'Baking a wonderful cake!' };

    my $cake_ref = \&bake_cake;

Without the function sigil (&), you will take a reference to the function's return value or values.

Create anonymous functions with the bare sub keyword:

    my $pie_ref = sub { say 'Making a delicious pie!' };

The use of the sub builtin without a name compiles the function but does not install it in the current namespace. The only way to access this function is via the reference returned from sub. Invoke the function reference with the dereferencing arrow:

    $cake_ref->();
    $pie_ref->();

Perl 4 Function Calls

An alternate invocation syntax for function references uses the function sigil (&) instead of the dereferencing arrow. Avoid this syntax; it has subtle implications for parsing and argument passing.

Think of the empty parentheses as denoting an invocation dereferencing operation in the same way that square brackets indicate an indexed (array) lookup and curly brackets a keyed (hash) lookup. Pass arguments to the function within the parentheses:

    $bake_something_ref->( 'cupcakes' );
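The original snippet does not show how $bake_something_ref is defined. One possible definition, offered here only as an assumption, is an anonymous function which reads its argument from @_:

```perl
use strict;
use warnings;
use feature 'say';

# hypothetical definition; the name comes from the example above
my $bake_something_ref = sub {
    my $item = shift;
    return "Baking some $item!";
};

my $message = $bake_something_ref->( 'cupcakes' );
say $message;
```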

You may also use function references as methods with objects (Moose). This is useful when you've already looked up the method (Reflection):

    my $clean = $robot_maid->can( 'cleanup' );
    $robot_maid->$clean( $kitchen );
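A self-contained sketch of this pattern follows. The RobotMaid class and its methods are hypothetical stand-ins written with plain bless rather than Moose, to keep the example free of non-core dependencies:

```perl
use strict;
use warnings;
use feature 'say';

{
    package RobotMaid;   # hypothetical class for illustration

    sub new { my $class = shift; return bless { cleaned => [] }, $class }

    sub cleanup
    {
        my ($self, $room) = @_;
        push @{ $self->{cleaned} }, $room;
        return "Cleaned the $room";
    }
}

my $robot_maid = RobotMaid->new;
my $result;

# can() returns a function reference if the method exists, false otherwise
if (my $clean = $robot_maid->can( 'cleanup' ))
{
    $result = $robot_maid->$clean( 'kitchen' );
}

my $cook = $robot_maid->can( 'cook' );   # no such method

say $result;
```

Checking the return value of can() before invoking it avoids a runtime error when the object does not support the method.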

Filehandle References

When you use open's (and opendir's) lexical filehandle form, you deal with filehandle references. Internally, these filehandles are objects of the class IO::File. You can call methods on them directly:

    use autodie 'open';

    open my $out_fh, '>', 'output_file.txt';
    $out_fh->say( 'Have some text!' );

Old code might load IO::Handle explicitly (use IO::Handle;). Older code may take references to typeglobs:

    local *FH;
    open FH, "> $file" or die "Can't write '$file': $!";
    my $fh = \*FH;

This idiom predates lexical filehandles (introduced with Perl 5.6.0 in March 2000, so this code is stuck in the previous millennium). You may still use the reference operator on typeglobs to take references to package-global filehandles such as STDIN, STDOUT, STDERR, or DATA, but these are all global names anyhow.

Prefer lexical filehandles when possible. With the benefit of explicit scoping, lexical filehandles allow you to manage the lifespan of filehandles as a feature of Perl's memory management.

Reference Counts

Perl uses a memory management technique known as reference counting. Every Perl value has an attached counter. Perl increases this counter every time something takes a reference to the value, whether implicitly or explicitly. Perl decreases that counter every time a reference goes away. When the counter reaches zero, Perl knows it can safely recycle that value. Consider the filehandle opened in this inner scope:

    say 'file not open';

    {
        open my $fh, '>', 'inner_scope.txt';
        $fh->say( 'file open here' );
    }

    say 'file closed here';

Within the inner block in the example, there's one $fh. (Multiple lines in the source code mention it, but there's only one variable, the one named $fh.) $fh is only in scope in the block. Its value never leaves the block. When execution reaches the end of the block, Perl recycles the variable $fh and decreases the reference count of the filehandle referred to by $fh. The filehandle's reference count reaches zero, so Perl recycles it to reclaim memory, and calls close() implicitly.
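One way to observe the implicit close is to read the file back after the block ends. This sketch writes to a temporary directory (an assumption, to avoid leaving files behind) and relies on autodie for error handling:

```perl
use strict;
use warnings;
use feature 'say';
use autodie;
use File::Temp 'tempdir';

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/inner_scope.txt";

{
    open my $fh, '>', $file;
    print {$fh} "file open here\n";
}

# $fh no longer exists, so Perl has flushed and closed the filehandle;
# the text is on disk even though the code never called close()
open my $in_fh, '<', $file;
my $line = <$in_fh>;
close $in_fh;

print "read back: $line";
```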

You don't have to understand the details of how all of this works. You only need to understand that your actions in taking references and passing them around affect how Perl manages memory (see Circular References).

References and Functions

When you use references as arguments to functions, document your intent carefully. Modifying the values of a reference from within a function may surprise the calling code, which never expected anything else to modify its data. To modify the contents of a reference without affecting the caller's data, copy its values to a new variable:

    my @new_array = @{ $array_ref };
    my %new_hash  = %{ $hash_ref  };

This is only necessary in a few cases, but explicit cloning helps avoid nasty surprises for the calling code. If you use nested data structures or other complex references, consider the use of the core module Storable and its dclone (deep cloning) function.
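The difference between a one-level copy and a deep clone matters as soon as the data contains further references. A minimal sketch, with a hypothetical $menu_ref structure:

```perl
use strict;
use warnings;
use feature 'say';
use Storable 'dclone';

my $menu_ref = { lunch => [ 'panini', 'apple' ] };

my %shallow = %{ $menu_ref };       # copies only the top level
my $deep    = dclone( $menu_ref );  # copies every level

# the shallow copy still shares the inner array with the original
push @{ $shallow{lunch} }, 'cookie';

# the deep clone has its own inner array
push @{ $deep->{lunch} }, 'soup';

say "original: @{ $menu_ref->{lunch} }";
say "clone:    @{ $deep->{lunch} }";
```

Only the push through the shallow copy shows up in the original structure.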

Nested Data Structures

Perl's aggregate data types—arrays and hashes—allow you to store scalars indexed by integer or string keys. Note the word scalar. If you try to store an array in an array, Perl's automatic list flattening will make everything into a single array:

    my @counts = qw( eenie miney moe   );
    my @ducks  = qw( huey  dewey louie );
    my @game   = qw( duck  duck  goose );

    my @famous_triplets = (
        @counts, @ducks, @game
    );
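Counting the elements confirms the flattening. This sketch repeats the declarations so that it runs on its own:

```perl
use strict;
use warnings;
use feature 'say';

my @counts = qw( eenie miney moe   );
my @ducks  = qw( huey  dewey louie );
my @game   = qw( duck  duck  goose );

my @famous_triplets = ( @counts, @ducks, @game );

say scalar @famous_triplets;   # 9 elements, not 3 arrays
say $famous_triplets[3];       # huey: the boundaries between arrays are gone
```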

Perl's solution to this is references (References), which are special scalars that can refer to other variables (scalars, arrays, and hashes). Nested data structures in Perl, such as an array of arrays or a hash of hashes, are possible through the use of references. References are useful and you need to understand them, but you don't have to like their syntax—they're one of Perl's uglier features.

Use the reference operator, \, to produce a reference to a named variable:

    my @famous_triplets = (
        \@counts, \@ducks, \@game
    );

... or the anonymous reference declaration syntax to avoid the use of named variables:

    my @famous_triplets = (
        [qw( eenie miney moe   )],
        [qw( huey  dewey louie )],
        [qw( duck  duck  goose )],
    );

    my %meals = (
        breakfast => { entree => 'eggs',
                       side   => 'hash browns'   },
        lunch     => { entree => 'panini',
                       side   => 'apple'         },
        dinner    => { entree => 'steak',
                       side   => 'avocado salad' },
    );

Commas are Free

Perl allows an optional trailing comma after the last element of a list. This makes it easy to add more elements in the future.

Use Perl's reference syntax to access elements in nested data structures. The sigil denotes the amount of data to retrieve. The dereferencing arrow indicates that the value of one portion of the data structure is a reference:

    my $last_nephew = $famous_triplets[1]->[2];
    my $meal_side   = $meals{breakfast}->{side};

The only way to nest a multi-level data structure is through references, so the arrow between subscripts in the previous examples is superfluous. You may omit it for clarity, except when invoking function references:

    my $nephew = $famous_triplets[1][2];
    my $meal   = $meals{breakfast}{side};

    $actions{generous}{buy_food}->( $nephew, $meal );
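The %actions hash is not defined in the example above. As an assumption, it could be a nested hash whose innermost values are anonymous functions, as in this sketch:

```perl
use strict;
use warnings;
use feature 'say';

# hypothetical definition of %actions: a hash of hashes of functions
my %actions = (
    generous => {
        buy_food => sub {
            my ($who, $what) = @_;
            return "Buying $what for $who";
        },
    },
);

# arrows between subscripts are optional; the arrow before the
# parentheses is the function invocation and must stay
my $result = $actions{generous}{buy_food}->( 'louie', 'hash browns' );
say $result;
```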

Use disambiguation blocks to access components of nested data structures as if they were first-class arrays or hashes:

    my $nephew_count   = @{ $famous_triplets[1] };
    my $dinner_courses = keys %{ $meals{dinner} };

... or to slice a nested data structure:

    my ($entree, $side) =
     @{ $meals{breakfast} }{ qw( entree side ) };

Whitespace helps, but does not entirely eliminate the noise of this construct. Sometimes a temporary variable provides more clarity:

    my $meal_ref        = $meals{breakfast};
    my ($entree, $side) = @$meal_ref{qw( entree side )};

... or use for's implicit aliasing to avoid the use of an intermediate reference (though note the lack of my):

    ($entree, $side) = @{ $_ }{qw( entree side )}
                       for $meals{breakfast};

perldoc perldsc, the data structures cookbook, gives copious examples of how to use Perl's various data structures.

Autovivification

When you attempt to write to a component of a nested data structure, Perl will create the path through the data structure to the destination as necessary:

    my @aoaoaoa;
    $aoaoaoa[0][0][0][0] = 'nested deeply';

After the second line of code, this array of arrays of arrays of arrays contains an array reference in an array reference in an array reference in an array reference. Each array reference contains one element.

Similarly, when you ask Perl to treat an undefined value as if it were a hash reference, Perl will turn that undefined value into a hash reference:

    my %hohoh;
    $hohoh{Robot}{Santa} = 'mostly harmful';

This behavior is autovivification. While it reduces the initialization code of nested data structures, it cannot distinguish between the honest intent to create missing elements in nested data structures and an accidental typo.
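Autovivification of intermediate levels can happen even when the code only reads. In this sketch the misspelled key Robit is a deliberate typo:

```perl
use strict;
use warnings;
use feature 'say';

my %hohoh;
$hohoh{Robot}{Santa} = 'mostly harmful';

# a typo in a check that looks read-only: Perl must create
# $hohoh{Robit} as a hash reference before it can look inside it
my $found = exists $hohoh{Robit}{Santa};

my @keys = sort keys %hohoh;
say "@keys";   # Robit Robot
```

The exists check returns false, yet the hash now has a second top-level key.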

You may wonder at the contradiction between taking advantage of autovivification while enabling strictures. The question is one of balance. Is it more convenient to catch errors which change the behavior of your program at the expense of disabling error checks for a few well-encapsulated symbolic references? Is it more convenient to allow data structures to grow or safer to require a fixed size and an allowed set of keys?

Controlling Autovivification

The autovivification pragma (Pragmas) from the CPAN lets you disable autovivification in a lexical scope for specific types of operations.

The answers depend on your project. During early development, allow yourself the freedom to experiment. While testing and deploying, consider an increase of strictness to prevent unwanted side effects. Thanks to the lexical scoping of the strict and autovivification pragmas, you can enable these behaviors where and as necessary.

You can verify your expectations before dereferencing each level of a complex data structure, but the resulting code is often lengthy and tedious. It's better to avoid deeply nested data structures by revising your data model to provide better encapsulation.
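To give a sense of how tedious such checks become, here is a sketch of a defensive lookup for a two-level hash. The function name fetch_side and the sample data are hypothetical:

```perl
use strict;
use warnings;
use feature 'say';

# return the side dish for a meal, or nothing if any level is missing
sub fetch_side
{
    my ($meals_ref, $meal) = @_;

    return unless ref( $meals_ref ) eq 'HASH';
    return unless exists $meals_ref->{$meal};
    return unless ref( $meals_ref->{$meal} ) eq 'HASH';

    return $meals_ref->{$meal}{side};
}

my %meals = (
    lunch => { entree => 'panini', side => 'apple' },
);

my $lunch_side  = fetch_side( \%meals, 'lunch'  );
my $brunch_side = fetch_side( \%meals, 'brunch' );

say $lunch_side;
say defined $brunch_side ? $brunch_side : 'no brunch today';
```

Because every level is checked before it is dereferenced, the failed lookup does not autovivify a brunch entry.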

Debugging Nested Data Structures

The complexity of Perl's dereferencing syntax combined with the potential for confusion with multiple levels of references can make debugging nested data structures difficult. Two good visualization tools exist.

The core module Data::Dumper converts values of arbitrary complexity into strings of Perl code:

    use Data::Dumper;

    print Dumper( $my_complex_structure );

Use this when you need to figure out what a data structure contains, what you should access, and what you accessed instead. Data::Dumper can dump objects as well as function references (if you set $Data::Dumper::Deparse to a true value).
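A fuller sketch with a concrete structure follows. The sample data is hypothetical; $Data::Dumper::Sortkeys and $Data::Dumper::Indent are configuration variables of the module, set here to make the output stable and compact:

```perl
use strict;
use warnings;
use Data::Dumper;

# sort hash keys and use a lighter indentation style
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;

# hypothetical sample data
my $my_complex_structure = {
    lunch  => { entree => 'panini', side => 'apple' },
    guests => [ 'huey', 'dewey', 'louie' ],
};

my $dump = Dumper( $my_complex_structure );
print $dump;
```

Sorting the keys makes it practical to compare two dumps of the same structure with a text diff.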

While Data::Dumper is a core module and prints Perl code, its output is verbose. Some developers prefer the use of the YAML::XS or JSON modules for debugging. They do not produce Perl code, but their outputs can be much clearer to read and to understand.
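As one possible alternative, this sketch uses the core module JSON::PP in place of the CPAN modules named above. That substitution is an assumption made so that the example has no external dependencies; the sample data is hypothetical:

```perl
use strict;
use warnings;
use JSON::PP;

# hypothetical sample data
my $data = {
    lunch  => { entree => 'panini', side => 'apple' },
    guests => [ 'huey', 'dewey', 'louie' ],
};

# pretty() adds indentation; canonical() sorts hash keys
my $json = JSON::PP->new->pretty->canonical->encode( $data );
print $json;
```

Note that JSON cannot represent function references or blessed objects without extra configuration, so this approach suits plain data best.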

Circular References

Perl's memory management system of reference counting (Reference Counts) has one drawback. Two references which point to each other (directly or indirectly) form a circular reference that Perl cannot destroy on its own. Consider a biological model, where each entity has two parents and zero or more children:

    my $alice  = { mother => '',     father => ''     };
    my $robin  = { mother => '',     father => ''     };
    my $cianne = { mother => $alice, father => $robin };

    push @{ $alice->{children} }, $cianne;
    push @{ $robin->{children} }, $cianne;

Both $alice and $robin contain an array reference which contains $cianne. Because $cianne is a hash reference which contains $alice and $robin, Perl will never decrease the reference count of any of these three people to zero. It doesn't recognize that these circular references exist, and it can't manage the lifespan of these entities.

Either break the reference cycle yourself (by clearing the children of $alice and $robin or the parents of $cianne), or use weak references. A weak reference is a reference which does not increase the reference count of its referent. Use the core module Scalar::Util's weaken() function to weaken a reference:

    use Scalar::Util 'weaken';

    my $alice  = { mother => '',     father => ''     };
    my $robin  = { mother => '',     father => ''     };
    my $cianne = { mother => $alice, father => $robin };

    push @{ $alice->{children} }, $cianne;
    push @{ $robin->{children} }, $cianne;

    weaken( $cianne->{mother} );
    weaken( $cianne->{father} );

$cianne will retain usable references to $alice and $robin, but those weak references do not count toward the number of remaining references to the parents. If the reference count of $alice reaches zero, Perl's garbage collector will reclaim her record, even though $cianne has a weak reference to $alice. Be aware that, when $alice gets reclaimed, $cianne's reference to $alice will be set to undef.
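The following sketch shows that last point in action. It uses a smaller family than the example above and an inner block (both assumptions for brevity) so that $alice goes out of scope while $cianne survives:

```perl
use strict;
use warnings;
use feature 'say';
use Scalar::Util qw( weaken isweak );

my $cianne;
my $was_weak;

{
    my $alice = { mother => '', father => '' };
    $cianne   = { mother => $alice, father => '' };

    push @{ $alice->{children} }, $cianne;

    weaken( $cianne->{mother} );

    # isweak() reports whether a reference has been weakened
    $was_weak = isweak( $cianne->{mother} );
}

# $alice is out of scope and the only remaining reference to her
# record was weak, so Perl reclaimed it and cleared the weak reference
say defined $cianne->{mother} ? 'mother still present' : 'mother reclaimed';
```

Code which follows a weak reference should check that it is still defined before dereferencing it.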

Most data structures do not need weak references, but when they're necessary, they're invaluable.

Alternatives to Nested Data Structures

While Perl is content to process data structures nested as deeply as you can imagine, the human cost of understanding these data structures and their relationships—to say nothing of the complex syntax—is high. Beyond two or three levels of nesting, consider whether modeling various components of your system with classes and objects (Moose) will allow for clearer code.