What the Perl 5 Compiler Modules Could Have Been

Once in a while, someone asks "How can I compile my Perl program to a binary?" Once in a while, someone answers "Use B::CC," at which point many someones shudder and reply "No, please never suggest such a thing, you horrible person."

Set aside that thought for a second.

You may have heard of Devel::Declare, which allows you to bend, fold, spindle, and mangle Perl syntax in a way that's safer than source filters but which allows nicer code such as signatures to work without making some poor fool like me patch the Perl parser. Unfortunately, D::D works by hijacking parts of the parsing phase to inject bits and pieces of alternate Perl code in place of non-Perl code.

The good news is that it's fairly well encapsulated and respects lexical scope. The bad news is that you're using Perl to generate Perl, which has many of the same drawbacks as when you use eval. (The good news is that you don't have to parse all of Perl. Make that great news.)

What you can't do easily is manipulate code that's already been parsed or compiled. Sure, you can manipulate the symbol table and examine things, if you know the relationships between and representations of Perl's internal data structures, but you're at the mercy of binary representations written in C, which can vary between major releases.

The B:: family of modules is not the answer because those modules exist at the wrong level of representation. It's not their fault (they do the best they can with what they can access), but they're doomed to hacks and workarounds and incompletenesses because of other incorrect decisions.

I've released Pod::PseudoPod::DOM on GitHub (it needs documentation and more work on XHTML output before it's ready for the CPAN) as part of my work on two Onyx Neon books, Liftoff and the upcoming second edition of Modern Perl: the book. I've written about the reasons why I revised the internals of the PseudoPod parser so heavily (everything is a compiler).

The same reasoning applies to the Perl parsing and compilation process.

If Perl had an intermediate layer between lexing/parsing and producing the optrees which the runtime uses to execute code, and if that intermediate form were a sufficient representation of a program, and if that intermediate form were accessible from C as well as Perl itself, we could solve a lot of problems.
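
For a sense of what such a layer looks like in practice, here is a minimal sketch in Python rather than Perl, because Python's standard ast module exposes this kind of intermediate form between parsing and bytecode generation. Treat it as an analogy, not a proposal for a Perl API; the source string and the variable names are made up for illustration.

```python
import ast

# Illustrative input only; any small program would do.
source = "total = price * quantity + shipping"

# Parse the source text into a tree. Nothing has been compiled or run yet.
tree = ast.parse(source)

# The tree is ordinary data, so a program can inspect it...
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})

# ...and it is a sufficient representation: it compiles to runnable code.
code = compile(tree, filename="<sketch>", mode="exec")
namespace = {"price": 3, "quantity": 4, "shipping": 5}
exec(code, namespace)

print(names)
print(namespace["total"])
```

The point is the shape of the pipeline: text becomes a tree, the tree is available to ordinary programs, and the same tree feeds the code generator.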

(I've used B::Generate productively. It's difficult to do so. You get to dodge segfaults. You have to become an expert on the internals of the versions of perl you want to use. Note the plurals. Whee.)

In particular, a good macro system (one which is not "Run these substitutions over that code") would be possible. It might also be possible to translate certain classes of Perl code to other languages with substantially more ease, or to identify error patterns, or to perform better syntax highlighting, or to canonicalize the formatting and idioms of code in one fell swoop.
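
To make the difference between textual substitution and a tree-based macro concrete, here is a small sketch in Python, using its standard ast module as a stand-in for the intermediate form Perl lacks. The square macro, the ExpandSquare class, and the sample source are all hypothetical names invented for this example.

```python
import ast
import copy
import re

source = "result = square(a + 1)"

# "Run these substitutions over that code": the argument's grouping is lost.
textual = re.sub(r"square\((.*?)\)", r"\1 * \1", source)


class ExpandSquare(ast.NodeTransformer):
    """Rewrite calls to the hypothetical macro square(expr) as expr * expr."""

    def visit_Call(self, node):
        self.generic_visit(node)  # expand any nested uses first
        is_macro = (
            isinstance(node.func, ast.Name)
            and node.func.id == "square"
            and len(node.args) == 1
            and not node.keywords
        )
        if not is_macro:
            return node
        arg = node.args[0]
        # The argument is a subtree, so it stays a single unit on both sides.
        return ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))


tree = ExpandSquare().visit(ast.parse(source))
ast.fix_missing_locations(tree)

text_ns = {"a": 2}
exec(textual, text_ns)  # runs: result = a + 1 * a + 1

tree_ns = {"a": 2}
exec(compile(tree, "<macro-sketch>", "exec"), tree_ns)  # runs: (a + 1) * (a + 1)

print(textual, text_ns["result"], tree_ns["result"])
```

The tree version still evaluates its argument twice, so this is far from a complete macro system; it only shows why operating on structure avoids the precedence bugs that plague substitution over text.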

(You still have to deal with XS modules and the BEGIN problem, but you can embrace some ambiguity in the grammar and the abstract representation and still produce a valid and parsed representation even if you have to coalesce two alternatives into a single representation with out-of-band knowledge. It's not impossible to get 90% of all programs represented perfectly, and another 5% shouldn't be too much more work.)

Unfortunately, a proof of concept would likely take a good hacker a month of work. A solid demo is likely six months of work. The entire project probably represents two years of work.

It's still a pleasant daydream though.

6 Comments

https://me.yahoo.com/a/eWaljkEXsutgnNBICriFMtXpPuhy#76632 | November 10, 2011 1:21 AM

Search the perl5-porters list archives for messages from around 2006 with the word "macro" in their titles and you will find some ideas (and patches) exploring that area.

jnareb.myopenid.com | November 10, 2011 4:04 PM

About compiling Perl to binary: what about pp from PAR - Perl Archive Toolkit?

About macros, Devel::Declare, etc.: Isn't that how Perl 6 macros are supposed to work?

Darren Duncan | November 12, 2011 7:34 PM

Quoth the article: "If Perl 5 had an intermediate layer between lexing/parsing and producing the optrees which the runtime uses to execute code, and if that intermediate form were a sufficient representation of a program, and if that intermediate form were accessible from C as well as Perl 5 itself, we could solve a lot of problems."

I consider such a feature useful myself. So, I have it as a core feature in my heavily-Perl-inspired Muldis D language for databases. In Muldis D, the string-based source code that users normally program in gets parsed into a collection of "system catalog" data structures that can be read or manipulated like Perl hashes or arrays, or code can be written natively in such structures by other code, and such structures are what is actually compiled into the optrees that are executed. The catalog structures are quite descriptive and maintain all the syntactical details that programmers care about, such that typical string-based code can be generated from them that closely matches the original, but they are easy to manipulate with code as well. One can also more easily generate equivalent code in other languages (such as Perl 5 or 6 or SQL) if they want to, or make arbitrarily different new syntaxes as they like, if they want to. Perl could stand to learn from this effort itself, where that would be useful.

chromatic replied to comment from Darren Duncan | November 14, 2011 12:02 PM

Exactly. Any general purpose programming language designed in the 21st century needs this feature.

rurban.myopenid.com | November 16, 2011 3:52 AM

I don't really understand why someone would shudder at plain B::C, compared to B::CC.
If I rewrote the compiler from scratch, it would still look the same as it does now.

For your hook requests:
Besides B at CHECK-time, there's also MAD and -u.
If you are more comfortable with XML than Perl, MAD is for you.

If you want to skip optree generation and optimization entirely,
you can override the new*OP functions, but this is *much* more work than with B,
because a lot of Perl logic is not in the parser but in our op generation. And you still have
the bigger problem with the data: SV, AV, HV, ... Ops are easy compared to the data.

For a macro system we need parser hooks of course. B is too late.
perl -P on the line level was nice, but limited.
For a better macro system on the statement/expression level we'd need an AST representation, which B certainly is not good at. But Zefram's work is good enough for a start.

chromatic replied to comment from rurban.myopenid.com | November 16, 2011 10:40 AM

I'm somewhat familiar with MAD, but you need to compile it in, and (to my knowledge) you can't manipulate it from a program as you could an AST.

You're right that half the problem is in op generation. The coupling between parsing and op generation is troublesome; I ran into that when I added the class and method keywords.
