I spend a lot of time thinking about how to prevent bugs in Parrot. My first contribution to the project was a patch in late 2001 to make an essential Perl 5 program used in the build compatible with Perl 5.004. (My, how times change.) I've spent countless hours in the intervening seven and a half years helping the project become correct, complete, viable, and competitive.
Many of my opinions about the maintainability and sustainability of software projects come from experiences with Parrot (sometimes to the chagrin of people who don't know the other projects I can't talk about which have similar characteristics).
Fiddly Bits of Parrot Not Always Easy to Write Correctly
Parrot pervasively uses a data structure called a PMC, a PolyMorphic Container (or Parrot Magic Cookie). A PMC represents anything that's not a primitive value -- anything more complex than an integer, a floating point value, or a string. In Perl 5 terms, a PMC resembles an SV. Don't take that line of thinking too far; PMCs take the good parts of SVs and avoid the scary, complex parts of SVs.
Because Parrot hasn't quite managed to get rid of C entirely yet (see the Lorito plan for more about that), we have several dozen core PMCs written in C.
A PMC has several well-defined behaviors which form the vtable interface. These are common operations that any PMC should be able to perform: get a scalar value, set an integer value, access a nested PMC, invoke the PMC as a callable function. Not every PMC implements every vtable function, but calling an unimplemented one raises a Parrot exception rather than crashing the interpreter.
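To make the vtable idea concrete, here is a heavily simplified sketch in C. It assumes a vtable is a struct of function pointers, one per operation, and that entries a PMC type doesn't implement point at a default that reports an error instead of crashing. The types, `throw_unimplemented`, and `last_error` are hypothetical stand-ins, not Parrot's real headers or exception machinery; only the operation names `get_integer` and `set_integer_native` echo Parrot's vtable naming.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of a vtable; not Parrot's real headers. */
typedef struct PMC PMC;

typedef struct VTABLE {
    const char *name;                               /* type name for messages */
    long (*get_integer)(PMC *self);
    void (*set_integer_native)(PMC *self, long value);
} VTABLE;

struct PMC {
    const VTABLE *vtable;   /* every PMC points at its type's vtable */
    long int_val;           /* toy storage for this sketch */
};

/* Stand-in for raising a Parrot exception: record the error, don't crash. */
static char last_error[128];

static void throw_unimplemented(PMC *self, const char *op) {
    snprintf(last_error, sizeof last_error, "%s does not implement %s",
             self->vtable->name, op);
}

/* Defaults used for any operation a PMC type doesn't provide. */
static long default_get_integer(PMC *self) {
    throw_unimplemented(self, "get_integer");
    return 0;
}

static void default_set_integer_native(PMC *self, long value) {
    (void)value;
    throw_unimplemented(self, "set_integer_native");
}

/* An Integer-like PMC type overrides both entries. */
static long integer_get_integer(PMC *self) { return self->int_val; }
static void integer_set_integer_native(PMC *self, long v) { self->int_val = v; }

static const VTABLE integer_vtable = {
    "Integer", integer_get_integer, integer_set_integer_native
};

/* A PMC type that implements neither operation keeps the defaults. */
static const VTABLE opaque_vtable = {
    "Opaque", default_get_integer, default_set_integer_native
};
```

Callers always dispatch through `pmc->vtable->get_integer(pmc)`, so they never need to know which concrete type they hold; an unsupported operation lands in a default that reports the problem.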
Additionally, most PMCs have attributes. Think of a PMC as a class, with instances of that PMC as objects and PMC attributes as instance attributes and vtable functions as instance methods, and you have a conceptual understanding which works at a high level.
Because of our current use of C as the PMC declaration language, PMCs need to understand their memory management characteristics. In other words, if your PMC has two INTVAL attributes and one PMC attribute, the PMC's init vtable function (its initializer, like a constructor in OO terms) needs to allocate enough memory to store these three attributes. Similarly, the PMC's garbage collection mark vtable function needs to be able to mark any PMC stored as an attribute as live. The PMC's destroy vtable function (a destructor, of sorts) needs to release the memory allocated for attribute storage back to the system.
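The three hand-written steps might look roughly like this, sketched in C for the example PMC with two INTVAL attributes and one PMC attribute. The `PMC` struct, its toy `live` bit, `gc_mark_pmc`, and the `example_*` functions are all hypothetical simplifications for illustration, not Parrot's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a PMC before the change. */
typedef long INTVAL;
typedef struct PMC PMC;

struct PMC {
    void *data;   /* attribute storage, managed by hand by each PMC */
    int   live;   /* toy "marked as live" bit standing in for the GC */
};

/* Two INTVAL attributes and one PMC attribute, as in the text. */
typedef struct Example_attributes {
    INTVAL first;
    INTVAL second;
    PMC   *child;
} Example_attributes;

/* Toy stand-in for the collector's marking routine. */
static void gc_mark_pmc(PMC *p) {
    if (p)
        p->live = 1;
}

/* init: the PMC author must allocate attribute storage... */
static void example_init(PMC *self) {
    self->data = calloc(1, sizeof (Example_attributes));
    assert(self->data != NULL);
}

/* mark: ...must mark every GCable attribute as live... */
static void example_mark(PMC *self) {
    Example_attributes *attrs = self->data;
    gc_mark_pmc(attrs->child);
}

/* destroy: ...and must release the storage. Forgetting destroy leaks
   memory; forgetting mark lets the GC reclaim a child still in use. */
static void example_destroy(PMC *self) {
    free(self->data);
    self->data = NULL;
}
```

Every PMC with attributes had to repeat some version of these three functions, which is exactly the duplication that invites copy-and-paste mistakes.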
("Don't you have a garbage collector?" you may ask. That's a good question. We could let the garbage collector manage the lifecycle of all of these pieces of memory, but they're already attached to GCable elements, so we don't need to mark or sweep or trace them. The malloc/free memory model works well enough here, even though we use memory pools to avoid the costs of malloc/free.)
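For readers unfamiliar with memory pools, here is a minimal fixed-size pool in C: one malloc up front, carved into equal slots threaded onto a free list, so each allocation and release is a pointer swap rather than a malloc/free call. This is a generic sketch of the technique under that assumption, not Parrot's allocator; `Pool` and the `pool_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool; not Parrot's actual allocator. */
typedef union { void *p; long long ll; long double ld; } MaxAlign;

typedef struct Pool {
    unsigned char *arena;      /* one malloc'd block carved into slots */
    void          *free_list;  /* singly linked list through free slots */
    size_t         slot_size;
} Pool;

static int pool_init(Pool *pool, size_t object_size, size_t count) {
    size_t unit = sizeof (MaxAlign);
    size_t i;
    if (object_size < sizeof (void *))
        object_size = sizeof (void *);  /* room for the free-list link */
    /* Round up so every slot stays aligned for any common type. */
    pool->slot_size = (object_size + unit - 1) / unit * unit;
    pool->arena = malloc(pool->slot_size * count);
    pool->free_list = NULL;
    if (!pool->arena)
        return 0;
    /* Push every slot onto the free list. */
    for (i = 0; i < count; i++) {
        void *slot = pool->arena + i * pool->slot_size;
        *(void **)slot = pool->free_list;
        pool->free_list = slot;
    }
    return 1;
}

static void *pool_alloc(Pool *pool) {
    void *slot = pool->free_list;
    if (slot)
        pool->free_list = *(void **)slot;  /* pop: no malloc call */
    return slot;                            /* NULL once the pool is empty */
}

static void pool_free(Pool *pool, void *slot) {
    *(void **)slot = pool->free_list;       /* push: no free call */
    pool->free_list = slot;
}

static void pool_destroy(Pool *pool) {
    free(pool->arena);                      /* one free for the whole arena */
    pool->arena = NULL;
    pool->free_list = NULL;
}
```

A real allocator would grow by adding arenas when the free list runs dry; this sketch simply returns NULL to keep the idea visible.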
Why Fiddly Bits are a Problem
Thus, to write a PMC without any garbage collection errors, without any memory leaks, and without any random corruption waiting to happen, you had to remember several steps: allocate attribute storage in init, mark any GCable attributes in mark, and free that storage in destroy. In practice, people writing their own custom PMCs copied and pasted behavior from an existing PMC, then refactored it until it did what they wanted.
I spent a couple of weeks reading every line of every core PMC in Parrot. I fixed a lot of bugs. I can spot GC and memory bugs in patches. The problem is that I don't scale and you can't get the experience I have without going through all of the bugs I've gone through -- and if I never read your patch, you may still have that bug.
Properly Encapsulated Complexity
Julian Albo and Andrew Whitworth (and several other Parrot developers) made an improvement recently in this area.
PMCs with attributes need to declare them. We use a mini-language built around C to define PMCs. For example, the PMC which represents an object in Parrot (the Object PMC) has two attributes: a PMC which represents the class of the object and a PMC which contains the instance variables of the object. The code looks like:
pmclass Object need_ext {
    ATTR PMC *_class;
    ATTR PMC *attrib_store;

    /* vtable entries go here */

    /* PMC methods go here */
}
The PMC-to-C conversion step creates a C struct to hold this PMC attribute data:
/* Object PMC's underlying struct. */
typedef struct Parrot_Object_attributes {
    PMC * _class;
    PMC * attrib_store;
} Parrot_Object_attributes;
Thus at Parrot's compilation time -- when we compile the Parrot virtual machine -- we know how much memory it takes to store the attributes of each PMC. We know which PMCs have attributes (not all do). We know which PMCs need to mark their attributes specially (this one does, as its attributes are GCables and not primitive values).
Julian's idea was to store the size of the attribute structure in the PMC structure. When allocating a new PMC, the PMC initialization code also allocates memory to contain the PMC's attributes and attaches it. Thus all of the bookkeeping code in PMC init vtable functions can go away. When destroying an unused PMC, the PMC destruction code can free this memory, so all of the bookkeeping code in PMC destroy vtable functions can go away as well.
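A rough C sketch of that idea follows: the attribute size travels with the PMC, and one core allocation path and one core destruction path handle the storage for every PMC type. The `attr_size` and `data` fields and the functions `pmc_new_with_attrs` and `pmc_destroy` are hypothetical names chosen for illustration; only `Parrot_Object_attributes` mirrors the generated struct from the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the new scheme; not Parrot's real code. */
typedef struct PMC PMC;

/* The struct generated from the Object PMC's ATTR declarations. */
typedef struct Parrot_Object_attributes {
    PMC *_class;
    PMC *attrib_store;
} Parrot_Object_attributes;

struct PMC {
    size_t attr_size;   /* size of the attribute struct, known at build time */
    void  *data;        /* attribute storage, owned by the core */
};

/* Core code: allocate the PMC and its attribute storage in one place. */
static PMC *pmc_new_with_attrs(size_t attr_size) {
    PMC *pmc = malloc(sizeof *pmc);
    if (!pmc)
        return NULL;
    pmc->attr_size = attr_size;
    pmc->data = NULL;
    if (attr_size > 0) {
        pmc->data = calloc(1, attr_size);   /* zero-filled storage */
        if (!pmc->data) {
            free(pmc);
            return NULL;
        }
    }
    return pmc;
}

/* Core code: one destruction path frees attribute storage for every type. */
static void pmc_destroy(PMC *pmc) {
    free(pmc->data);    /* free(NULL) is a no-op for attribute-less PMCs */
    free(pmc);
}
```

With this in place, creating an Object-like PMC is just `pmc_new_with_attrs(sizeof (Parrot_Object_attributes))`; the PMC author declares the attributes and writes no allocation or release code at all.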
We can even get rid of a special PMC flag value which meant something to the garbage collector but was fiddly to get right, because people often forgot to enable it.
This new code is easy to prove correct: it either works or it doesn't. It's one codepath to examine and patch, not dozens of core PMCs and countless other PMCs existing now or in the future. This reduces both the amount of code people need to write and the amount of code in our system.
We've moved the internal bookkeeping mechanism out of the user-visible portions of Parrot. If you want to hack on the GC, feel free -- but most people shouldn't have to. They shouldn't even have to know how it does what it does. (Knowing won't hurt, but they shouldn't have to know the mechanisms by which it does what it does.)
That's one principle of software development I always encourage: encapsulate confusing or dangerous or difficult code behind a nice interface. Now you don't have to worry about doing the wrong thing, because you no longer write the code where the wrong thing could happen. If you don't write any code at all, Parrot will do the right thing for you.
Yes, we changed the way you define PMCs -- but tell me that this isn't an improvement for everyone. That's a principle of modern Perl I want to encourage.
"PMCs take the good parts of SVs and avoid the scary, complex parts of SVs."
Given that you know enough about SVs to know they're scary and complex, maybe you could make some notes for the perl5 docs or on the corehackers wiki or ... somewhere, anywhere ... that explain the scariness and complexity a bit better for those of us who don't yet? You could even take the opportunity to squeeze in a few notes about why you think PMCs are better, and maybe perl5's guts can learn a few things from parrot's research work just like the userland is learning a few things from perl6's ...