One of the satisfying aspects of writing an opinionated book like Modern Perl is writing a section like Avoid Method-Function Equivalence. Explaining to a novice programmer a potential pitfall and how to avoid it always seems to me like reducing the amount of potential misery in the world.
That's satisfying.
I've been revising a proof of concept document categorization system into shape for the past year, by adding tests and refactoring and cleaning things up and even adding features. Every week it gets a little bit better, and it's fascinating to discover the patterns of this style of programming. (It's related to debuggability-driven design.) I've enjoyed the experience of watching code get more general and useful and powerful even as that's meant shuffling around code and concepts far beyond the initial design. While there are still messes (what working code doesn't have a mess somewhere?), the code has a goodness to it.
Just when you get a big head, the universe punishes you for your unwarranted hubris. (Annie Dillard once wrote "I no longer believe in divine playfulness." Sometimes "divine antiauthoritarianism" is more like it.)
Monday night, my business partner found a bug. We have a categorization system and several topics into which these documents could find themselves. We added several new categories last month, and I had to revise the sharing system such that documents in one cluster of categories never appeared in other clusters. (Think of it this way: you have a newspaper and want to group articles about food, television, movies, and books in a Life and Culture section and articles about basketball, lacrosse, and hockey into a Sports section, but you never accidentally want an article about food to show up in the Sports section or an article about the felonious tax evasion of Kenny Mauer to show up in the Life and Culture section.)
One line of filtering that's easy to explain to users is keyword filtering. Any article in this topic (food, television, books) must contain one of these keywords: food, television, cuisine, literature, novel, bestseller, author. You get the picture.
Monday's bug was that documents in a single cluster which obviously belonged to a single topic ("Which Television Shows Won't Be Back Next Season", for a fake example) within a cluster showed up as belonging to the cluster as a whole ("Life and Culture") and not the topic within the cluster ("Television").
Fortunately I had most of the necessary scaffolding to build in debugging support. I expected that the keyword filtering was to blame, whether missing the appropriate keywords or not applying them appropriately. (I wondered if the system used a case-sensitive regular expression match or didn't stem noun phrases for comparison appropriately.)
Turns out it was my silly mistake.
All of this filtering for validity and cross-topic intra-cluster association
is in the single module MyApp::Filter
. This started life as a
couple of functions that didn't belong elsewhere. As I moved more and
more code around and defined the filtering behavior more concretely, it grew
until it made more sense to treat these functions as methods. It's not an
object yet. It may never become an object; it manages no state. Yet I changed
its invocation mechanism from:
=head2 make_bounded_regex
Turns a list of arguments into an optimized, case-insensitive regex which
matches any of them and requires boundaries at their ends.
=cut
sub make_bounded_regex
{
return unless @_;
my @keywords = map { s/\s/./; $_ } @_;
my $ra = Regexp::Assemble->new( flags => 'i' );
my $re = $ra->add( map { '\b' . $_ . '\b' } @keywords )->re;
return qr/$re/;
}
... to:
sub make_bounded_regex
{
my $class = shift;
return unless @_;
...
}
I made all of these functions into methods in one fell refactoring swoop.
(Why not? Be consistent! Do more than the bare minimum! Eat your vegetables!) I missed one place which called make_bounded_regex()
:
sub _build_keyword_filter
{
my $self = shift;
my $kw = $self->keywords;
return unless @$keywords;
return Feedie::Filter::make_bounded_regex( @$keywords );
}
... such that the first keyword (and usually the most important, because
that's what users put in first) becomes the $class
parameter to
the method. Because it's a class method, nothing ever uses $class
,
so there's no error message about wrong package names.
The tests don't catch this either because of the distribution of test data. (Obviously a mistake to rectify.)
Sure, a language with integrated refactoring support (you don't even need an early binding language with a static type system to get this) could have shown me the error right away. That's one thing I do like about Java. Sure, you need that scaffolding to get anything done, but it does occasionally help you not write bugs.)
What bothers me most of all is that Perl itself has no means by which it could even give an optional warning when you treat a method as a function or vice versa. You don't have even a runtime safety net here.
Warnings will never replace the need for programmer caution, but bugs happen. Bugs always happen. I keep the error log as squeaky clean as possible, and warnings have caught a lot of bugs and potential bugs even during testing, sometimes in our deployed software.
In lieu of warnings though, the best I can do is document my mistakes and explain why they make them in the hope that I won't make them again and you'll be more cautious than I was. (At least this one was easy to fix.)
I guess you could make something like
part of a function → class method transition, to be taken out at a later date, as insurance against incomplete refactoring. But that does not exactly “solve” the problem.