strictperl


As I mentioned in Why Corehackers Matters, the ability to fork and modify your own version of bleadperl -- and perhaps get it merged back into Chip's staging tree -- opens up a lot of room for experimentation.

I alluded to a minor feature branch I've worked on for a couple of days: unilaterally enabling strict for all code not run through -e. This is available from my strict_by_default bleadperl tree on GitHub. You're welcome to download it, play with it, fork it, submit patches, or do whatever you want.

If Perl is a Shinto shrine, forking is an act of love... provided there's a merge sometime in the future.

Playing with strictperl

To build strictperl, first clone my bleadperl tree from GitHub. Check out the strict_by_default branch:

$ git clone git://github.com/chromatic/perl.git
$ cd perl
$ git checkout origin/strict_by_default

Then configure and build Perl as normal:

$ sh Configure -de -Dusedevel
$ make

This will build the familiar perl binary. Now build strictperl:

$ make strictperl

This will build a separate binary named strictperl. If I've written the code (and especially the Makefile rules) correctly, these will be two completely separate binaries with different behaviors:

$ ./perl       -e 'print $c' # no error
$ ./strictperl -e 'print $c' # no error

$ echo 'print $c' > printc.pl
$ ./perl       printc.pl  # no error
$ ./strictperl printc.pl
Global symbol "$c" requires explicit package name at printc.pl line 1.
Execution of printc.pl aborted due to compilation errors.

You can use strictperl in place of regular perl any place you like... except that several core modules are not strict safe. In particular, Exporter and vars are the first two problematic core libraries.

Similarly, any module which does not use strict may have strictness errors when running under strictperl.

I don't think that's a bad thing, however; think of it as an opportunity to make lots of code strict safe even if it doesn't use strict right now. (You could ask "Why in the world would you ever want to touch all of that code for no benefit?" You could just as well ask why you'd want to make your C code lint-safe, or run Perl::Critic on a codebase.) These "errors" may not be errors in practice, but if we evaluate each one, we can note declaratively in our source code that we've considered it carefully, and so avoid further maintenance problems. For now, strictperl is an experiment and a tool to help us identify these situations.

Patches and pull requests that help make the core modules strict safe are very welcome.

How it Works

strictperl works by changing the default hintset of nextstate nodes in the Perl 5 optree.

Don't be scared. The implementation is slightly ugly (thanks to the way strict itself works), but it's much less invasive or difficult than rewriting optrees as something like Devel::Declare must do.

If you look in the strict pragma, you'll see several auspicious lines:

my %bitmask = (
    refs => 0x00000002,
    subs => 0x00000200,
    vars => 0x00000400
);

# ...

sub import {
    shift;
    $^H |= @_ ? bits(@_) : $default_bits;
}

This code ORs together the bits for all of the strict features you've requested (or all three, if you requested none) and sets them in the magic $^H pseudo-global variable. These values correspond to three constants #defined in perl.h:

#define HINT_STRICT_REFS    0x00000002 /* strict pragma */
/* ... */
#define HINT_STRICT_SUBS    0x00000200 /* strict pragma */
#define HINT_STRICT_VARS    0x00000400 /* strict pragma */
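Because each stricture owns a distinct bit, enabling features is plain bitwise OR. The following standalone C sketch copies the three perl.h values above; strict_bit and strict_import are hypothetical analogues of strict.pm's lookup and import, not part of perl's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from the perl.h excerpt above. */
#define HINT_STRICT_REFS 0x00000002
#define HINT_STRICT_SUBS 0x00000200
#define HINT_STRICT_VARS 0x00000400
#define STRICT_ALL (HINT_STRICT_REFS | HINT_STRICT_SUBS | HINT_STRICT_VARS)

/* Hypothetical analogue of strict.pm's %bitmask lookup.  Unknown
   names map to 0 here; strict.pm itself croaks on them. */
static unsigned strict_bit(const char *name)
{
    if (strcmp(name, "refs") == 0) return HINT_STRICT_REFS;
    if (strcmp(name, "subs") == 0) return HINT_STRICT_SUBS;
    if (strcmp(name, "vars") == 0) return HINT_STRICT_VARS;
    return 0;
}

/* Hypothetical analogue of strict::import: OR the requested bits
   (or all three, when nothing was requested) into the existing
   hints, leaving any unrelated bits untouched. */
static unsigned strict_import(unsigned hints, const char *const *names,
                              size_t count)
{
    unsigned bits = 0;

    if (count == 0)
        return hints | STRICT_ALL;
    for (size_t i = 0; i < count; i++)
        bits |= strict_bit(names[i]);
    return hints | bits;
}
```

So `use strict 'refs';` followed later by `use strict 'vars';` just accumulates bits, and plain `use strict;` yields 0x602.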

These hints are part of a particular type of node in the optree called a COP (control op, I presume). In ordinary code these are nodes of type nextstate (or dbstate when running under the debugger); you see them often when you use B::Concise, for example:

$ perl -MO=Concise
print "Hello, world!"
6  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -:1) v:{ ->3
5     <@> print vK ->6
3        <0> pushmark s ->4
4        <$> const[PV "Hello, world!"] s ->5
- syntax OK

Each COP contains information about the package and line number of the Perl code the next ops represent, as well as hint information such as which strict pragma features are in effect. (They contain more information as well.)

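When you modify the hints through $^H, which strict's import does at compile time, you modify the interpreter's current compile-time hints; Perl copies those into every nextstate op it builds from then on. (If you're very curious, see the cop_hints member of the cop struct in cop.h.)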

There's a complicating factor: nextstate hints nest the same way lexical scopes nest. If you enable strict in an outer scope, its effect remains in place in inner scopes unless they explicitly disable it.

That's actually fortunate, in this case.
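That nesting can be illustrated with a toy C model. It is a sketch, not perl's code: every name in it (toy_hints, toy_block_start, toy_block_end, toy_new_stateop) is hypothetical. It assumes, loosely as perl does with its compile-time hints, that entering a block saves the current hints and leaving it restores them, while each new statement snapshots whatever hints are current.

```c
#include <assert.h>
#include <stddef.h>

#define HINT_STRICT_REFS 0x00000002
#define HINT_STRICT_SUBS 0x00000200
#define HINT_STRICT_VARS 0x00000400
#define STRICT_ALL (HINT_STRICT_REFS | HINT_STRICT_SUBS | HINT_STRICT_VARS)
#define MAX_DEPTH 64

/* Toy stand-in for a COP: only the fields this sketch needs. */
struct toy_cop {
    unsigned line;
    unsigned hints;
};

/* Toy compile-time state: the current hints (playing PL_hints)
   plus a stack of the enclosing scopes' hints. */
static unsigned toy_hints;
static unsigned saved_hints[MAX_DEPTH];
static size_t depth;

/* Entering a block remembers the enclosing scope's hints... */
static void toy_block_start(void)
{
    assert(depth < MAX_DEPTH);
    saved_hints[depth++] = toy_hints;
}

/* ...and leaving it restores them, so changes made inside the
   block (such as `no strict 'refs'`) do not leak outward. */
static void toy_block_end(void)
{
    assert(depth > 0);
    toy_hints = saved_hints[--depth];
}

/* Each new statement snapshots the hints in effect right now. */
static struct toy_cop toy_new_stateop(unsigned line)
{
    struct toy_cop cop = { line, toy_hints };
    return cop;
}
```

Set toy_hints to STRICT_ALL at file scope, open a block, clear HINT_STRICT_REFS inside it, and close it again: statements inside the block lose only refs strictness, and statements after the block get all three strictures back.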

I knew that enabling strict meant setting the appropriate hint flags when building COP nodes in the optree. That meant modifying Perl's parser. My original approach was to modify the function used to create new COP nodes, a function named newSTATEOP. That's where I discovered the pseudo-inheritance scheme which allows strict nesting. (I admit that I don't understand all of its implications.)

After a couple of blind alleys, I realized that the only way to enable strict pervasively was to find the creation point of the outermost COP node in the optree and set these hint flags there.

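Perl 5's grammar lives in perly.y, which a yacc-style parser generator turns into a bottom-up (LALR) parser. The grammar itself still reads top-down: it starts with the most general rule and descends into subrules that together describe a whole program. The topmost rule is prog; a program matches the progstart and lineseq rules. progstart is simple: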

progstart:  { PL_parser->expect = XSTATE; $$ = block_start(TRUE); };

You can ignore the contents of this rule. The important point is that this is the first rule matched in a program -- a file, actually.

There's one more piece of the puzzle. If you look at the implementation of the newSTATEOP function, you'll see that it uses a globalish (interpreter-local, anyhow) variable, PL_hints, to set the hint flags on the newly created COP:

    CopHINTS_set(cop, PL_hints);

Thus my patch is very simple; progstart now reads:

progstart:
        {
            PL_hints |= PL_e_script ? DEFAULT_CLI_HINTS : DEFAULT_PROGRAM_HINTS;
            PL_parser->expect = XSTATE; $$ = block_start(TRUE);
        }
    ;
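Why does the change belong in progstart rather than later? The toy C model below (hypothetical names, not perl's API) compiles a small file, giving each statement a COP that snapshots the current hints the way newSTATEOP copies PL_hints. It assumes the hints start out empty for the file, and it compares ORing in the defaults before the first statement, as the progstart action does, with ORing them in one statement too late.

```c
#include <assert.h>
#include <stddef.h>

#define HINT_STRICT_REFS 0x00000002
#define HINT_STRICT_SUBS 0x00000200
#define HINT_STRICT_VARS 0x00000400
#define STRICT_ALL (HINT_STRICT_REFS | HINT_STRICT_SUBS | HINT_STRICT_VARS)

/* Toy COP: just the line number and the hints snapshot. */
struct toy_cop {
    unsigned line;
    unsigned hints;
};

/* Toy model of `prog: progstart lineseq`.  Hints start empty (an
   assumption of this sketch); `defaults` are ORed in just before
   statement number `apply_before` (0 means in progstart, before
   any statement); each statement then snapshots the hints. */
static void toy_compile(struct toy_cop *cops, size_t nstatements,
                        unsigned defaults, size_t apply_before)
{
    unsigned hints = 0;

    for (size_t i = 0; i < nstatements; i++) {
        if (i == apply_before)
            hints |= defaults;
        cops[i].line = (unsigned)(i + 1);
        cops[i].hints = hints;
    }
}
```

Only the progstart placement (apply_before = 0) covers every statement; applying the defaults after statement 1 leaves that first COP without strictures.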

PL_e_script is another interpreter-local variable which holds the text of code run with -e. It's a null pointer unless the invoking command line used the -e flag, so the conditional above picks DEFAULT_CLI_HINTS only for -e code. DEFAULT_CLI_HINTS and DEFAULT_PROGRAM_HINTS are new constants I added to perl.h:

/* which hints are in $^H by default */
#define DEFAULT_CLI_HINTS 0
#ifdef STRICTPERL
#   define DEFAULT_PROGRAM_HINTS \
               (HINT_STRICT_REFS | HINT_STRICT_VARS | HINT_STRICT_SUBS)
#else
#   define DEFAULT_PROGRAM_HINTS 0
#endif

I made them conditional on the STRICTPERL symbol for one specific reason: the Makefile rules I added to build strictperl define STRICTPERL (with -DSTRICTPERL) and rebuild the Perl 5 parser. Thus DEFAULT_PROGRAM_HINTS enables all strictures only when building strictperl.

(Yes, cautious Makefile hackers, those rules clean up after themselves: the relevant files always get rebuilt when building strictperl, and they get removed afterward so that subsequent non-strictperl builds don't use object files compiled with the wrong constants.)

The hardest part of this whole process was getting the Makefile rules right. I'm not sure they're portable enough across platforms, but they work in my testing.
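The constants and the -e check fit together as in this standalone C sketch. It copies the perl.h additions, with the DEFAULT_PROGRAM_HINTS expansion wrapped in parentheses so it acts as a single value in any expression. progstart_hints is a hypothetical helper standing in for the progstart action, and its e_script parameter stands in for PL_e_script.

```c
#include <assert.h>
#include <stddef.h>

#define HINT_STRICT_REFS 0x00000002
#define HINT_STRICT_SUBS 0x00000200
#define HINT_STRICT_VARS 0x00000400

/* Mirrors the perl.h additions: the program default is fixed when
   this translation unit is compiled, with or without -DSTRICTPERL. */
#define DEFAULT_CLI_HINTS 0
#ifdef STRICTPERL
#  define DEFAULT_PROGRAM_HINTS \
       (HINT_STRICT_REFS | HINT_STRICT_VARS | HINT_STRICT_SUBS)
#else
#  define DEFAULT_PROGRAM_HINTS 0
#endif

/* Hypothetical stand-in for the new progstart action.  e_script
   plays PL_e_script: a null pointer unless the code came from -e. */
static unsigned progstart_hints(unsigned hints, const void *e_script)
{
    return hints | (e_script ? DEFAULT_CLI_HINTS : DEFAULT_PROGRAM_HINTS);
}
```

Compiling this file once plainly and once with -DSTRICTPERL produces two objects with different behavior from identical source, which is why stale object files from one build must never leak into the other.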

The Value of strictperl

Was this process worthwhile? It was entertaining. It gave me the chance to write code to implement a feature I believe is worth considering. It helped me understand the optree in a bit more detail. It gave me a good opportunity to explain some of that here.

Perhaps the best result of this process is that we now have a Perl with strictures enabled by default. We can experiment with it to see what writing code is like in that case. Admittedly, there's a lot of work necessary to make core libraries play nicely with strictperl, but we can do that in pieces, because strictperl is an optional binary you have to build explicitly, one which does not interfere with regular perl.

Those are the kinds of experiments I want to encourage.

6 Comments

FWIW, in blead 'strict' is now enabled when you 'use 5.011;' or 'use feature ":5.11"'.

Which I think is a good compromise for those who want to preserve backwards compatibility.

@moritz, thanks for mentioning that. I meant to do so, but I forgot. I still have mixed feelings about the feature pragma, but I'm glad to see that enhancement.

I think that strict on by default is a sane way going forward.

Just to set the record straight on one point: 'but it's much less invasive or difficult than rewriting optrees as something like Devel::Declare must do.'

Devel::Declare does nothing to the optree. It simply(!) installs a PL_check handler and has worked out a way to change the about-to-be-parsed line string that perl sees, and supplies a few hooks into perl to change this string.

Does 'no strict' work with this strictperl? That would seriously lower the cost of updating code that's too hairy to get strict safe easily. And then the 'no strict' wart would be an incentive to examine the code in detail when there's time.

Thanks for setting the record straight, Ash. I remember working on something recently and manipulating the optree, but it must not have been with Devel::Declare.

@careytilden, no strict works exactly as you expect. Your strategy is sound.


About this Entry

This page contains a single entry by chromatic published on July 9, 2009 4:11 PM.
