Perl 5's type system has flaws. Those flaws are fixable (with a supreme act of will, lots of patience for discussion on p5p, and ... years of waiting for the state of the art in writing Perl 5 code to catch up with the historical baggage of a decade and a half of buggy code).
Are they preventable?
One sign of effective design is when people can use the feature correctly without training. Subtle design cues should encourage them toward proper uses and away from ineffective and dangerous uses. My paper shredder has a feed slot too narrow to contain my fingers, so it's unlikely for me to harm myself with the default operation. Of greater interest is the feature by which it refuses to operate if the top section with the blades has tilted — if I have removed that section to clear a paper jam, I don't want the blades to run. Arguably I should turn the shredder off and unplug it (and I do), but the danger is sufficiently great that the design actively protects my tender fingers even if I have forgotten to do so.
I've argued before that the lack of the right way to inspect capabilities of Perl 5's primitives causes bugs. Several design misfeatures combine to cause these problems, however.
People want to know what they can do with objects and references. The desire may be for defensive coding, or it may be to take advantage of genericity and polymorphism. Both are valid uses.
People can know some of this information through runtime type checking and reflection. Perl 5 offers some possibilities here, but it often answers the wrong questions. Worse, performing these checks safely requires a lot of code with a lot of subtleties to allow a lot of rare cases that are extremely important when they occur.
Consider the unfortunate case of UNIVERSAL::can (the CPAN module, not the method). By now, you should know why I believe that calling methods as functions is a mistake. U::c replaces the default can() method with a custom variant which warns when invoked directly on an invocant which has its own can() method.
That's the intent, anyhow.
The logic is simple: if I've overridden can() and you ignore that by calling UNIVERSAL::can( $instance_of_my_class, 'some_method' ), you've introduced a bug. This is not an academic, ivory tower concern over purity. I have a fairly popular CPAN module which relies on you not writing buggy code to work properly, and I've had way too many false bug reports that my code doesn't work because of this bug.
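The same failure mode is easy to reproduce outside Perl. What follows is only a rough Python analogue, not the Perl mechanism itself: the names `Base`, `Proxy`, `can`, and `fetch` are hypothetical, with `Base.can` standing in for the default UNIVERSAL::can and `Proxy.can` for a class's own override.

```python
class Base:
    def can(self, name):
        # Default capability check: only attributes defined on the class count.
        return callable(getattr(type(self), name, None))


class Proxy(Base):
    """Hypothetical class that answers some methods dynamically."""

    dynamic = {"fetch"}

    def can(self, name):
        # The override knows about capabilities the default cannot see.
        return name in self.dynamic or super().can(name)


proxy = Proxy()

as_method = proxy.can("fetch")           # dispatches to Proxy.can
as_function = Base.can(proxy, "fetch")   # skips the override entirely
```

Called as a method, the check consults the override and reports that `fetch` is available. Called as a function on the base class, it silently gives the other answer, which is the kind of false bug report described above.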
Unfortunately, U::c is unreliable because Perl 5 doesn't give sufficient information to know how control flow eventually wound up in its can() method. The current approach works 80% of the time; if the invocant has an overridden can() and the caller of UNIVERSAL::can() isn't a function or method named can, it's probably a bug.
That is, it's okay for an overridden can() to call UNIVERSAL->can(), because they've probably done so through SUPER::can(). In all other cases, someone's probably called it directly as a function, because if they've called it as a method, they'd have ended up in the overridden can() instead.
This is all a workaround for the fact that it's very difficult to tell how any particular invocation happened in Perl 5. Within pure Perl, I know no way of asking "Did a method call end up here?" or "Was this a function call?" If I could tell that, I wouldn't need this workaround.
I could write code which grabs the calling code, dematerializes it to its opcodes, walks the optree until it reaches the appropriate position of the call, then looks for the op which performs method dispatch, and I know how to do all of that, but that requires lots of internal introspection I don't want to write, introduces a few more heuristics which are tricky to get right, will be substantially slower, completely fails for XS calls, and is a lot more work than I want to perform for this task, especially when I could be doing something much more fun. (Trying to help people not write buggy code when they don't realize it's buggy and they don't want to hear it anyway is much less fun than almost anything else.)
The current heuristic has some awful flaws too. Consider this code, inspired by actual code in autobox:
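    sub gen_override_for_class
    {
        my $class = shift;
        # an anonymous sub: caller() reports its name as __ANON__, not can
        my $can_override = sub
        {
            my $self = shift;
            return $class->SUPER::can( @_ );
        };
        # assign to the glob so the sub lands in the class's can() method slot
        no strict 'refs';
        *{ $class . '::can' } = $can_override;
    }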
autobox creates classes named SCALAR, HASH, ARRAY, and the like. You can call methods on references of those types. The gen_override_for_class() function installs a can() method in those classes which dispatches to the correct package. (If you don't understand the rationale for redispatching, that's fine.)
Unfortunately, the U::c heuristic fails here... because the generated method is an anonymous function without the all-important name of can. Yes, it's in the right slot in the namespace, and it's a proper call of UNIVERSAL->can(), but U::c gives a warning in this case because it can't tell that this is a method call.
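To make the shape of that false positive concrete, here is a small Python sketch of a caller-name heuristic. It is an analogue under stated assumptions, not the U::c implementation: `universal_can`, `Named`, `Generated`, and `warnings_seen` are hypothetical names, and `inspect.stack()` plays the part of Perl's caller().

```python
import inspect

warnings_seen = []


def universal_can(obj, name):
    """Stand-in for the default can(); records a warning on suspicious calls."""
    caller_name = inspect.stack()[1].function
    has_override = "can" in vars(type(obj))
    # Heuristic: an override exists, yet whoever called us is not named "can".
    if has_override and caller_name != "can":
        warnings_seen.append(f"called as a function from {caller_name}")
    return callable(getattr(type(obj), name, None))


class Named:
    def can(self, name):
        # A named override: the heuristic recognizes this caller.
        return universal_can(self, name)


class Generated:
    pass


# Install an anonymous override, as a code generator might.
Generated.can = lambda self, name: universal_can(self, name)

Named().can("can")       # no warning: the caller is named "can"
Generated().can("can")   # warns, although the call is just as legitimate
```

Both calls reach the default through a properly installed override, yet only the anonymous one trips the warning, because the only evidence the heuristic has is a name.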
A correct use of methods in Perl 5 causes a warning because code that tries to detect incorrect uses of methods in Perl 5 can't determine if a particular invocation is a method or a function call. People use methods as functions in Perl 5 in this case because getting the method form right is difficult. People use these functions in Perl 5 because getting the type information for primitives is difficult and subtle.
If you believe in irony, autobox should make all of the introspection easier by allowing you to call methods on primitives, adding genericity and polymorphism where Perl 5 needs it the most.
That's several bugs all jammed together in something I'm not sure I can fix. Perhaps the best approach is to add a warning flag to Test::MockObject to enable U::c and UNIVERSAL::isa, so that they're not on by default and so that people getting weird behavior from buggy code will at least have the option of figuring out that the bugs are in code that uses methods as functions and not in T::MO... but I despair, considering the flood of new bug reports.
Some of this problem comes from Python, which also makes little distinction (syntactic or semantic) between functions and methods:
    class Foo(object):
        def bar(self):
            print self, ': bar'

    def baz(param):
        print param, ': baz'

    Foo.baz = baz
    foo = Foo()
    foo.bar()
    foo.baz()
Yes, I deliberately obfuscated the Python code by naming the parameter to baz param instead of self. (Thus I disprove the claim that it's impossible to write unreadable Python.) Even still, Python does get this behavior more correct, in that grabbing the first-class function from either the class itself or from an instance produces a first-class function that knows it's all objecty:
    quux = Foo.baz
    quux('Not an object')
    TypeError: unbound method baz() must be called with Foo instance as first argument (got str instance instead)

    quux = foo.baz
    quux('Not an object')
    TypeError: baz() takes exactly 1 argument (2 given)
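Those messages come from Python 2. As a hedged sketch of the same experiment under Python 3 (an assumption about the reader's interpreter, not something the examples here target), the instance-bound form still remembers its invocant, while fetching the function from the class yields a plain function that accepts any argument. The variable names below are illustrative only.

```python
import inspect


class Foo:
    def bar(self):
        return f"{type(self).__name__}: bar"


def baz(param):
    return f"{type(param).__name__}: baz"


Foo.baz = baz
foo = Foo()

from_class = Foo.baz       # a plain function in Python 3
from_instance = foo.baz    # a bound method that remembers foo

class_is_method = inspect.ismethod(from_class)
instance_is_method = inspect.ismethod(from_instance)
remembers_invocant = from_instance.__self__ is foo

# The bound method already carries foo, so an extra argument is one too many.
try:
    from_instance("Not an object")
    error = None
except TypeError as exc:
    error = exc
```

The useful part for this discussion is `inspect.ismethod()` and `__self__`: a callable fetched from an instance can report that it is a method and which object it belongs to.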
Compare that to Perl 5, where you can slap any old argument in that unspectacular invocant slot and get... well, you get all of the pieces when it breaks.
Sure, at the lowest levels in a VM or a processor core, the invocation mechanism is "shuffle some args around, keep track of the current location in code, then branch somewhere else" regardless of whether you've invoked a function, a method, a coroutine, a continuation, or an exception. That's fine. Stack those turtles as high as you can.
At the language level, however, they're all very different. A language design should encourage people to treat them differently, even if there's only one stack of turtles, else the apparent consistency may be a foolish and tempting consistency which produces subtle inconsistencies. You can't prevent malicious or incompetent people from doing malicious and incompetent things and you shouldn't prevent clever people from doing clever things.
I believe it's possible (and good!) to encourage the rest of us to do smart and safe things.