In Features Perl 5 Needs in 2012, I wrote that Perl 5 needs "compact, native-type data structures".
While I generally like the polymorphism by which I can read external data into a data structure and then treat it like the kind of data I want it to be (string, integer, numeric), that flexibility is never free. Sometimes that flexibility comes at a cost.
I have mixed feelings about the Computer Language Benchmarks Game when used to compare languages, but its microbenchmarks are good targets for profiling data. (I used that strategy to good effect when I worked on Parrot. Certainly Parrot could benefit from other performance improvements, but removing low-hanging bottlenecks to maximum performance helped real programs in measurable ways.)
Earlier this year, I profiled the benchmarks game Perl meteor-contest entry. I didn't expect to find any obvious candidates for optimizing Perl 5, nor did I. I did find results that confirmed some of my expectations.
The program spends a lot of its time manipulating SVs. In fact, about 20% of the runtime of the program goes to this. This is almost exclusively upgrading SVs. (An SV is the basic data type in Perl 5. It can store an integer value, a string value, a numeric value, and more.)
Here's the problem. The profile shows that this program performs almost
65,000 array assignment operations (that's separate from a push
or
a splice
). This ends up in the guts of the Perl 5 function
Perl_pp_aassign
in pp_hot.c, which eventually gets to
this code:
while (relem <= lastrelem) { /* gobble up all the rest */
SV **didstore;
assert(*relem);
sv = newSV(0);
sv_setsv(sv, *relem);
*(relem++) = sv;
didstore = av_store(ary,i++,sv);
if (magic) {
if (SvSMAGICAL(sv))
mg_set(sv);
if (!didstore)
sv_2mortal(sv);
}
TAINT_NOT;
}
Essentially, for every element to assign to the array, create an entirely
new SV, copy in the contents of the old SV (remember, copy semantics!), and
move on. That copy operation ends up in Perl_sv_setsv_flags()
in
sv.c which goes through all sorts of gyrations to figure out what type
of SV the destination should be.
Hold on, didn't that SV already get created? Yes it did, but it's not the right type. It might not be big enough to hold the right data from the source. After some 400 lines of heavily macroed code, it does eventually get the right data.
This program profiles at 1.1 million SV copy operations. In almost 450,000
of those, the source is undef
, so nothing has to happen. In about
580,000 of those, the source is a plain integer value.
An integer is just data. It doesn't even need an SV if it's only ever treated as an integer. It doesn't need copying, because it can happily remain an integer value.
(Yes, if you're concerned that integer could ever overflow the 64 bits you're probably using to store integers, you have to figure out a way to upgrade to some sort of arbitrary sided big integer, but that's very doable.)
It turns out that this program spends a lot of time bookkeeping polymorphic internal data structures to store raw integers. That's not counting all of the time spent manipulating the reference counts of raw integers or checking to see if you need to do anything special to get or store the value of and from a raw integer.
If it were possible to add type annotations to Perl 5 such that an integer could only ever be an integer and never silently upgraded or an array could only contain integers, it would be possible to update this program to get a significant amount of speed. A 10% improvement is within the realm of possibility, if not more. (I find the program a little too dense to skim and explain at the same time I've been digging into the Perl 5 source code, so I wave my hands a little bit.)
Sure, PDL is great and I've used it effectively, and I'm fully capable of translating this code into XS where I can perform these manipulations in C where I can specify their types, but crossing the barrier between Perl and PDL can be costly unless you're very clever, and I don't want to write any more C than I have to when Perl could make my job much more convenient.
Unfortunately, adding typed data structures to Perl 5 is a huge
amount of work. It would probably make the already complicated
Perl_sv_setsv_flags()
even more difficult to follow. It would
require rewriting a lot of the internals. It probably depends on attaching
things like special behaviors to SV data structures instead of the bodies of
Perl 5 operations, but that's also a good thing.
On the other hand, using less memory and going faster is a good thing—and optional typing could open the door to further improvements such as type-directed optimizations and JITting.
I know I said "Perl 5 needs this feature in 2012", but given the amount of work necessary to implement it, I have to back off a little bit to say that it would be nice to have. I hope you can see its value, and I hope it's possible to get it in a version of the Perl language sometime in the next couple of years.
Native arrays are pretty easy to implement actually.
The harder part was const and native hashes. As hashes can be optimized to perfect hashes when being const or study'ied. This is a completely different implementation.
Native arrays not, as they just replace the SV slot with a int/number or string SV.
There is a free bit to be used for the SVpad_TYPED check (0x00020000) for typed lexicals, and then a subsequent ptr comparison with the %int:: or %num:: or %string:: stash pointers.
Work is being done at https://github.com/rurban/perl/commits/typed/av but I haven't pushed anything there yet. Just to the typed/ro branch.
Not exactly the types I was thinking perl needs... personally I was thinking more of an Object Type system. But then I think that's easy...
With those kinds of types, you get a couple of possibilities: find errors (hopefully at compilation time) and perform optimizations (both at compilation time and during runtime). You could say what I wrote about here is "avoid unnecessary boxing and unboxing".