February 2012 Archives

Modern Perl 2011-2012 PDFs Available!

| 1 Comment

Update: ePub and Mobi files are available, and you can read Modern Perl: The Book online now!

We've just put letter and A4 sized PDFs of Modern Perl: the Book online. This is the new edition, updated for 5.14 and 2011-2012.

As usual, these electronic versions are free to download. Please do. Please share them with friends, family, colleagues, coworkers, and interested people.

Of course we're always thrilled if you buy a printed copy of Modern Perl: the book. Yet even if you don't, please share a copy with your friends, tell other people about it, and (especially) post kind reviews far and wide.

We'd love to see reviews on places like Slashdot, LWN, any bookseller, and any other popular tech site.

We're working on other formats, such as ePub and Kindle. That's been the delay (along with personal business); the previous edition's Kindle formatting didn't have the quality we wanted, so we're doing it ourselves to get things right. I hope to have those available in the next couple of weeks, but that depends on how much more debugging we have to do.

Thanks, as always, for reading.

The Memoization of Lazy Attributes

| 5 Comments

Great software changes the way you use computers. Great languages, libraries, and tools change the way you code. Moose qualifies as great.

Moose includes a feature known as attribute builders. When you declare the attributes of an object, you can specify default values:

use Moose;

has 'name', is => 'ro', default => 'Binky';

When someone builds a new object of this type, they can provide a name for the object, or get a default name of Binky.

For more dynamic behavior, provide a function reference as the default. For example, if an object needs to know its creation time (and not the time the system started):

use Moose;
use DateTime;

has 'birthtime', is => 'ro', default => sub { DateTime->now };

For even more flexibility, you can use the name of a method as the builder. This allows you to customize the behavior further with a subclass, role, or other mechanism:

use Moose;
use DateTime;

has 'birthtime', is => 'ro', builder => '_build_birthtime';

sub _build_birthtime { DateTime->now };

Overriding this builder is easy (far easier than intercepting all calls to the object's constructor.) As another benefit, you know that because Moose calls this builder during object construction, your object always gets constructed in a consistent, known-good state. You can rely on it being correct throughout the rest of its lifespan. (Obviously you have to avoid poking into it to do the wrong things, but that's up to you.)

This feature in and of itself would be good, but Moose went further to make it great. If you mark the attribute as lazy, Moose will only call the builder the first time someone uses the attribute's accessor:

use Moose;
use DateTime;

has 'first_access_time', is   => 'ro', builder => '_build_firstaccesstime',
                         lazy => 1;

sub _build_firstaccesstime { DateTime->now };

This makes the most sense when calculating an attribute is somehow expensive or the use of an attribute is rare. For example, I have an object which represents the calculation and projection of ten year free cash flow growth of a stock (free cash flow is like earnings, but it measures liquid assets and is less prone to manipulation via accrual accounting methods). This object is responsible for calculating the projected ten year average growth rate in free cash flow, and it produces a graph of ten years of trailing free cash flow, the trendline for those ten years and ten years into the future, and the projected growth curve ten years into the future.

That requires some math.

That math has a lot of interdependencies. Finding a good trendline for earnings which fluctuate means statistical analysis, such as with Statistics::Basic::LeastSquareFit. As it happens, the lazy lsf attribute nicely encapsulates that statistics object. Further, the growth rate is another lazy attribute, as are the sets of points of the trendline and the trend curve.
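
Here's a minimal sketch of what that chain of lazy attributes looks like in spirit. The class name and the growth rate math are placeholders, and I'm assuming Statistics::Basic's documented interface, where lsf() takes x and y array references and query() returns the intercept and slope of the fitted line:

package FreeCashFlowProjection;
# a simplified sketch, not the real analysis class

use Moose;
use Statistics::Basic qw( :all );

has 'fcf_values',  is => 'ro', isa => 'ArrayRef', required => 1;

# each derived value is a lazy attribute; Moose calls its builder on first
# access and caches the result, so every calculation happens at most once
has 'lsf',         is => 'ro', lazy => 1, builder => '_build_lsf';
has 'growth_rate', is => 'ro', lazy => 1, builder => '_build_growth_rate';

sub _build_lsf
{
    my $self   = shift;
    my $values = $self->fcf_values;

    return lsf( [ 0 .. $#{ $values } ], $values );
}

sub _build_growth_rate
{
    my $self = shift;

    # asking for lsf() here either builds it now or reuses the cached object
    my ($intercept, $slope) = $self->lsf->query;

    # placeholder math; the real projection is more involved
    return $slope / $intercept;
}

1;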

While the original proof of concept of this code performed all of the calculations in one large function of a couple of hundred lines, the current code is several methods of 10 - 50 lines apiece, with much better calculation accuracy, and better factoring. This is all thanks to Moose's lazy attributes.

The monolithic code was monolithic because I wanted to calculate everything once and only once: growth rate, number of points on the lines and curves, everything. Variables would stay in scope over tens and hundreds of lines because I'd need to use them later.

By factoring individual calculations into their own lazy builders, I can call the accessors when I need attribute data and everything snaps into place behind the scenes thanks to Moose. Oh, and if I've already calculated the data for the object, it's already there, and I don't have to recalculate it.

Lazy attributes give me memoization for free, and I don't even have to think about the data dependency graph of my calculations. It's an emergent property of the source code. That's one more step in expressing what I want to happen and not how to make it happen. Thanks, Moose!

Nagged by a Test Harness

| 1 Comment

I was serious when I wrote Why Test::Harness Should Not Enable Global Warnings.

Test::Harness's behavior here is still nonsense, and that's the politest word I have for it.

Here's a warning from one of my test suites:

Name "SQL::Translator::Schema::DEBUG" used only once: possible typo at /home/chromatic/.perl5/perls/perl-5.14.2/lib/site_perl/5.14.2/Class/Base.pm line 51, <DATA> line 998.

Tell me that's useful information.

I know what the problem is. The problem is this section of code in Class::Base's new() method:

    no strict 'refs';
    my $debug = defined $config->{ debug }
                      ? $config->{ debug }
              : defined $config->{ DEBUG }
                      ? $config->{ DEBUG }
                      : ( ${"$class\::DEBUG"} || 0 );

... which attempts to access a global variable in the caller's namespace. You can get rid of the warning by patching Class::Base to add no warnings 'once'; (patch submitted to RT, in fact) or by declaring that global variable in SQL::Translator::Schema or by changing the way the latter invokes its parent constructor to pass in a default debug parameter. That's all reasonably easy. It's even possible to rewrite this code so that it doesn't trigger this warning.

Do note that I don't use Class::Base directly. I don't use SQL::Translator::Schema directly. No, I use DBICx::TestDatabase which uses DBIx::Class which uses SQL::Translator.

Do note also that this warning doesn't mark the spot where a bug may lurk. Read the code. The logic is fine. It's sound.

Yet every time I run my test suite through the test harness (not prove, which does the right thing), this warning pops up. It's not an error. It's not a bug. It's a visual blip that interrupts my expected stream of success messages. It's in a dependency of a dependency of a dependency, and writing code to shut up that warning is busy work with little practical value beyond getting that message out of my test output, when all I want to know is "Do my tests still pass after that change?"

Sure, maybe it's naughty, naughty, naughty that Class::Base doesn't use the warnings pragma, and I'm certainly not a fan of OO modules poking into package global variables for their configuration information, especially in dependencies, but for goodness' sake, Test::Harness, your nagging is getting in the way of me getting useful things done.

Nonsense is really the politest word for it.

Fear Not the Subroutines

| 1 Comment

One of my motivations in writing Modern Perl: the book came from many years of watching novices learn to program. I've used the "programming is like following a recipe" metaphor countless times, because it's accessible and easy and tactile. Yet programming is also about symbolic computation and the abstraction thereof, and that's not so tactile.

Programming is also craft work. In my mind, that's the foundation of the software patterns mindset: not finding mathematical descriptions and mechanisms of computability but discovering ways to arrange those mechanisms to achieve an aesthetic result while maintaining essential function.

Novices, of course, tend not to have the sort of good taste refined over time by hard-won experiences.

As a case in point, yesterday I found myself having to write code to traverse a tree structure. (It's the HTML/ePub emitter for the table of contents for the Onyx Neon book rendering pipeline, and it's why the free versions of Modern Perl are taking longer than we promised. That, and I had jury duty.) Anyone who's written this code successfully at least twice knows that the most natural approach is the recursive approach.

For various reasons of encapsulation, I have a document object which represents each input file, and a method on these document objects which can render the table of contents:

sub emit_toc
{
    my $self     = shift;
    my $headings = $self->extract_headings;

    return $self->walk_headings( $headings, filename => $self->filename );
}

That's simple and natural, and I'm sure most of you reading this could write walk_headings() in your sleep, knowing that $headings is an AoAoA, where the leaves are heading objects and the branches are nested array references.
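
The real emitter is tied up in the rest of the pipeline, but a rough sketch of the recursive shape looks something like this; the heading object's emit_link() method is an invention for illustration:

sub walk_headings
{
    my ($self, $headings, %args) = @_;

    my $html = "<ul>\n";

    for my $node (@$headings)
    {
        # branches are nested array references; recurse to emit a sublist
        if (ref $node eq 'ARRAY')
        {
            $html .= $self->walk_headings( $node, %args );
            next;
        }

        # leaves are heading objects which know how to link to themselves
        $html .= '<li>' . $node->emit_link( $args{filename} ) . "</li>\n";
    }

    return $html . "</ul>\n";
}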

For whatever reason, when I first started to write this code, I intended to perform the recursion within emit_toc() itself. I don't know why; it seemed like a good idea at the time. I'd written a couple of lines. When it came time to recurse, I noticed how much extra work I'd have to do to do the right thing in the initial case but avoid doing the wrong thing in the recursive cases.

At that point the natural solution was obvious: I'll introduce a new method which only does the recursion.

Maybe I'm especially stubborn, but I imagined myself as a novice programmer trying to cram everything into one method. Maybe novices see programming as so much heavy-lifting, brute-force busywork to get something—anything—to work that they think putting more characters on the screen is the only visible sign of progress. Maybe those of us who teach novices talk too much about functions "doing one thing and one thing only", focus on the visible tasks those functions accomplish, and neglect to mention that well-factored functions can just as well perform structural and architectural duties that supplement and ease the work of those visible tasks.

Certainly walk_headings() does one and only one thing: it processes the current level's leaves and branches. Yet only emit_toc() and walk_headings() have to know that, and outside of the class, it might as well not exist.

I didn't stop to think about any of this when I realized I needed another method to do the recursion itself. It was a very natural thing, born from experience in solving this problem many times.

That process fascinates me, and not only because I want to understand how I program and design in order to improve but also because I want to help new programmers improve without having to make as many mistakes and messes as I did.

If you allow users to log in to your system, you need to hash their passwords with a cryptographically secure mechanism. That means not MD5 or basic Unix crypt(3) (though credit goes to OpenBSD for allowing the use of Blowfish with their crypt(3)).

It's obvious why hashing passwords is necessary: storing passwords in plain text offers an attack vector by which you can inadvertently expose private user data to the wild world. Even if an attacker somehow gets access to hashed passwords, extracting the plain text password (or its cryptographic equivalent) from the hash is an expensive operation.

Yet the quality of the hash matters. Faster computers can perform brute force attacks against simpler hashing functions. DES and RSA aren't sufficient. MD5 is not sufficient. Even SHA-1—which worked really well until a couple of years ago—isn't sufficient. The modern consensus seems to prefer the Blowfish algorithm as sufficiently secure.

Yet I had deployed code which used SHA-1 to hash sensitive user information.

Upgrading passwords in place is reasonably simple, though. (Here's the part where posting code about a security issue has the potential to expose a really silly bug, in which case thank you for helping me fix my code.)

I have a user model built with DBIx::Class. Its check_password() function (slightly misnamed, but misnamed to integrate with other components) verifies that the given plaintext password hashes to the stored hash value. If so—and if the user has the is_active flag set to a true value—the login can continue. Otherwise, the login fails.

I'd previously extracted the hashing function into a single module which exports a hash() function. This turned out to be a wise approach; it's one and only one place in the code where I can change the underlying hashing mechanism.
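
Here's roughly what the bcrypt-backed version of that module might look like. The package name is made up, and the salt generation is a deliberate simplification; real code should pull its random octets from a cryptographically strong source:

package MyApp::Hash;
# a sketch only; the real module's name and guts differ

use strict;
use warnings;

use Exporter 'import';
use Crypt::Eksblowfish::Bcrypt qw( bcrypt en_base64 );

our @EXPORT_OK = qw( hash );

sub hash
{
    my ($plaintext, $settings) = @_;

    # when verifying, reuse the cost and salt embedded at the front of the
    # stored hash; when creating a new password, generate fresh settings
    $settings ||= '$2a$10$' . en_base64( _random_octets( 16 ) );

    return bcrypt( $plaintext, $settings );
}

sub _random_octets
{
    # simplification: use a cryptographically strong source instead of rand()
    my $count = shift;
    return pack 'C*', map { int rand 256 } 1 .. $count;
}

1;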

That abstraction meant that switching new users to use Blowfish could happen automatically. Upgrading existing users took a little more work:

sub check_password
{
    my ($self, $attempt) = @_;
    my $password         = $self->password;

    # upgrade existing passwords to Bcrypt passwords
    if (old_hash($attempt) eq $password)
    {
        # crypt with blowfish
        $password = hash($attempt);
        $self->update({ password => $password });

        return $self->is_active;
    }

    return unless hash($attempt, $password) eq $password;
    return $self->is_active;
}

When a user attempts to log in, first compare with the old hashing mechanism. If that matches, update the user's password in the database with the new hashing mechanism. Otherwise, use the new hashing mechanism.

This code could be even easier; Crypt::Eksblowfish::Bcrypt returns a hashed password encoded in Base-64 with a special string prepended which contains the password hashing settings. Those magic characters never appear in the encoding mechanism I used for SHA-1 passwords, so only encoded passwords which start with that sequence can use the new hash. Everything else should attempt to match against the old hash.

One drawback of this technique is that it makes what would normally be a read-only operation (compare passwords) perform a write (update passwords), but this is a temporary workaround. When all of the users have logged in (and had their passwords updated transparently), I can remove this workaround. Detecting that is easy: check the first few characters of each password for the special prefix.
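
Here's a sketch of that check, assuming a DBIx::Class schema with a User result source; bcrypt-style strings all start with a '$2' prefix:

# count passwords which still lack a bcrypt-style '$2...' prefix
my $unconverted = $schema->resultset('User')->search(
    { password => { -not_like => '$2%' } }
)->count;

print "$unconverted passwords still use the old hash\n" if $unconverted;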

This technique isn't particularly original or difficult, but it's worked very well for me so far. Better yet, my users haven't noticed at all. They get more security for free with no work on their part. That's a good tradeoff.

Suppose you like Dist::Zilla and cpanminus. Suppose you also need to install the new version of a distribution you're working on locally. (I do this all the time.)

Sure, you can use dzil install, which builds, tests, and installs the distribution. That's too slow though—by the time the normal CPAN shell has loaded its indexes, cpanm has often finished.

I've already spent more time writing this post than it took to write this silly little shell alias:

alias dzinstall='dzil build; cpanm *.tar.gz; dzil clean'

Sure, you have to manage your distro with dzil and you have to have cpanm configured to use the correct Perl and you can't have any other tarballs in your directory, but this works really well for me.

Experts versus Novices

| 1 Comment

Why I Left Perl is 80% silliness and 20% truth. Some of that truth is "Why did it take seventeen years to make strict mode anything close to a default in Perl 5?" and another part of that truth is "Sometimes you get nagged about unrelated things when you ask a question about Perl".

Not that, of course, you can avoid nagging and insults and bad advice on the Internet in any discussion of programming languages. (Maybe if you only ever program Haskell and can call Bryan O'Sullivan and Simon Peyton Jones on the phone, but while I've done that, I'd feel guilty making it a habit.)

Here's the thing about experts. They're experts because they've made a lot of mistakes building real things and they've learned from those mistakes.

Do you know why most Perl 5 experts don't use inside-out objects? They turned out not to work so well in practice.

Do you know why so many good Perl 5 web developers use PSGI and Plack? They solve real problems elegantly.

Do you know why so many good Perl 5 programmers use automated testing built around Test::Builder? Because it works and it makes them more productive.

Do you know why most good Perl 5 tutorials recommend the use of strict and warnings? Because they ask perl to help you find likely errors and dubious constructs to save you time debugging and to help you write better code.

Do you know how many times your average Perl expert has answered the question "Why isn't my program doing anything?" with "Why aren't you checking to see if your open call succeeded?" Yes, that's an argument for changing Perl 5 so that it has better defaults. Yes, it's a usability problem. Yet it's also something that Modern Perl: the book gets around by recommending the use of autodie.

None of that advice excuses recommending things like strict when they won't help, for example. I've responded to several well-meaning posts on Perlmonks that say "I don't know what your problem is, but you need to use strict and warnings" by asking what that would solve. These pragmas aren't magic pony-colored bandages that make you vomit glitter and candy. (I think that's the Ruby DSL generator called Hipstr. Download it from GitHub.)

None of that advice necessitates scolding people for not doing things your way. Way back in the mists of time (the late '80s), people like Larry and Randal posted "Here's how you'd do it in Perl!" to Unix administration and programming fora on Usenet.

The tricky part is when someone's obviously doing something the wrong way, say parsing context-free grammars with simple regular expressions. The expert has to find the balance between never condescending ("What is broken in your head that you would ever consider doing such a thing?") and giving the right answer ("Sure, it takes a few minutes to install and read the documentation and learn how to use this tool, but you'll get the right answer and spend much less time debugging frustrating errors none of us wants to debug.")

The novice, of course, has to meet the expert partway and acknowledge that asking for help means, of course, being willing to receive help. Otherwise you're wasting everyone's time. If you want to do that, at least have the decency to explain that you're doing so. (Posting questions without defensive coding squanders a community resource of altruism and good will.)

The best option I've seen is to answer a question directly, and only then explain how to avoid the problem in the first place. Sometimes that works.

I think one of the hallmarks of an expert versus a novice is that the expert knows full well that we always underestimate the time and effort of debugging, and so we try to optimize to avoid the need to debug, while the novice thinks that all programming is a matter of juggling magic symbols until things seem like they work and you can escape with only a few bumps, bruises, and scrapes. Thus doing what's obviously extra work now seems like a terrible bargain when you don't yet know everything that can and will go wrong.

Then again, of course I'd say that. I find the act of solving problems far more fun than the act of chasing down bugs (especially bugs I could and should have avoided).

The Values and Costs of Automation

| 6 Comments

Reini Urban's ExtUtils::MakeMaker make release post demonstrates how to add a few of the features of Dist::Zilla to the why-does-this-still-exist ExtUtils::MakeMaker. (Spoiler: it still exists because it's not worth rewriting everything that uses it and continues to work.)

While I think everything about MakeMaker's implementation is blepharitic and, quite likely, contagious, you might find it surprising that I agree in spirit with the point both Reini and educated_foo make in the post and its comments. In summary:

Automating a task is only worthwhile when the benefit of automation outweighs the cost of doing so.

That should be staggeringly obvious to everyone with more than six months of serious programming experience, but it's not.

I spent a few days getting my head around dzil and still haven't updated all of my releasable code to use it because the act of conversion takes a few minutes. If I have no plans to release new versions of that code any time soon, there's little value in performing the conversion. Similarly, if I had a good release strategy in place before dzil came about, the cost of switching would have to be less than the cost of keeping things the same for things to work out. (You can make the same argument about switching between technologies such as languages or editors.)

Sometimes a technology is measurably better. For me, a git-based workflow using dzil beats all of the alternatives I've tried. The same goes for Plack for deployment and cpanminus and perlbrew for managing Perl installations. Yet I happily used Subversion until the pain of managing branches and merges and repositories outweighed the pain of figuring out git.

With that said, Aristotle is right (as usual): automating away all of the silly little niggly details that are tedious to remember and get right is almost always worthwhile. Whether that's Reini's EUMM recipe or the dzil ecosystem depends on who has to automate things. That's the same reason make test or dzil test or prove -lr t/ is much better than scanning the output of multiple test files and trying to summarize in your mind.

(Fun fact: in my first few public Perl 5 projects I manually removed all of the CVS directories because I didn't know about CVS export, to say nothing of EUMM. In my defense, this was 1998.)

Sometimes you get lucky and find an automation that lets you share smaller pieces of hard work between lots of other people—there's one advantage of dzil over EUMM. Where a user of EUMM might happily copy Reini's code into every Makefile.PL to add those nice features, a dzil user can install a plugin and use it on every project. (... though I didn't see a NYTProf plugin when I looked yesterday, which surprised me.)

Today's CPAN experiment was 15 minutes with Chart::Clicker. I'd heard great things about this distribution before, but had never tried it.

I've been working lately on financial analysis of publicly traded companies. In particular, I've been analyzing trends in the growth of owner earnings, given a ten year window of SEC reports. (Don't worry if you don't know everything that means yet.) The goal of this work is to find a trendline which smooths out yearly ups and downs and gives a good idea of the company's expected growth.

(That number is particularly important if you want to project the intrinsic value of a company into the future to decide the value of an individual share of that company right now. This is very standard Graham/Dodd/Buffett stuff, but it's also specific domain knowledge interesting only to this post as background information.)

My statistics are a bit rusty, so I wanted to see the resulting information before I trusted my calculations. My first instinct was to copy and paste information into a spreadsheet and create a graph there. Yes, I did that manually a couple of times. Then I remembered I have the full power of Perl available.

Chart::Clicker installed easily. Its documentation is a bit on the thin side, if you need to customize things (and I did), but if you poke around at the various components and their methods, you can make sense of things. (In particular, I wanted to change the underlying grid lines to correspond with the data points on the X axis. They're years, after all.)

My analysis code produces a list of values for free cash flow in thousands of dollars over a ten year range. It also uses the least square fit technique to plot a line representing the change in those values. That line should show the trend in values with as much accuracy as possible. While I have ten points for the free cash flow line, I need only two points for the trend line, because it's a straight line.
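
For the curious, those two endpoints fall straight out of the fitted line. Here's a sketch, again assuming Statistics::Basic's lsf() and query(), with variable names matching the chart code below:

use Statistics::Basic qw( :all );

# fit a line to the ten yearly free cash flow values (x is the year offset)
my $fit = lsf( [ 0 .. $#{ $fcf_values } ], $fcf_values );
my ($intercept, $slope) = $fit->query;

# the trend line needs only its endpoints: y = intercept + slope * x
my $first_y = $intercept;
my $last_y  = $intercept + $slope * $#{ $fcf_values };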

Chart::Clicker makes it really easy to add two datasets with different numbers of points. (I'm fortunate that the first and last X coordinates are the same.) Here's the code:

use Chart::Clicker;
use Chart::Clicker::Data::Series;
use Chart::Clicker::Data::DataSet;

my $chart = Chart::Clicker->new;

my $fcf_line = Chart::Clicker::Data::Series->new(
    keys   => [ 0 .. $#{ $fcf_values } ],
    values => $fcf_values,
    name   => 'Free Cash Flow (thousands)',
);

my $trend_line = Chart::Clicker::Data::Series->new(
    keys   => [ 0, $#{ $fcf_values } ],
    values => [ $first_y, $last_y ],
    name   => 'Free Cash Flow trendline',
);
my $dataset = Chart::Clicker::Data::DataSet->new(
    series => [ $fcf_line, $trend_line ],
);

$chart->add_to_datasets( $dataset );
my $context = $chart->get_context('default');
$context->range_axis->format('$%.0f');
$context->domain_axis->hidden(1);
$context->domain_axis->ticks( $#{ $fcf_values } );
$chart->write_output( "${symbol}.png" );

A chart contains one or more datasets, and a dataset contains one or more series. Each series corresponds to a line. Populating a series is easy, given arrays of data; keys represents the X axis and values represents the Y axis.

Most of the rest of my code customizes the display of the data. I haven't found the right way to display the X axis yet, so I've elided that for now. I also had to customize the underlying graph lines, as mentioned before. That customization was the only tricky part of using Chart::Clicker, and that only because it took a few minutes to figure out how to do it.

The results are attractive. Here's a chart showing the earnings for Coca-Cola (NYSE:KO) over the past decade:

Figure 1. Free Cash Flow and Trendline for NYSE:KO.

Chart::Clicker is fast, too. I added this to my analysis step for the 30 stocks in the Dow Jones Industrial Average, and I can't measure the increase in time required to create these images. (This analysis step has network IO as its bottleneck.)

I'm not often wholly impressed by Perl and the CPAN anymore; I expect things to work. I didn't expect things to work as easily as they did today. The whole experiment demonstrates the best the CPAN has to offer.

After a disastrous attempt to write my own templating language as about the third program I ever wrote in Perl (it was the dot-com boom of the '90s, not that that's any excuse), I moved to Template::Toolkit and have been relatively happy with it ever since.

My first real paid programming job was a little GUI app for a customer service group at HP. Customer service agents who needed to escalate to second line support would click on the button for the printer line about which they had to ask a question, and the program recorded the vote, then printed a nice report at the end of the day. It solved a problem. As far as I know, it was still running when I left HP a couple of years later.

A couple of years later, I took a job where we refactored, maintained, and extended a GUI point of sale system.

Since then, I've avoided most graphical programming. Sure, I put together websites for clients once in a while, but most of my work has been emitting the most basic semantically useful HTML possible, such that a Real Designer can manipulate things with CSS. (CSS being, of course, one of those so-horrible-it's-almost-good things: it's the only way to get things done, but you always want to take a shower after you use it, lest you start to appreciate it for anything other than its efficacy. See also JavaScript and PHP.)

Most of this meant dropping a big blob of content in a Template Toolkit wrapper in the middle of some HTML, or maybe templatizing some repeated HTML element while iterating over a collection in the toolkit.

Then I decided to redesign a site.

Twitter has its problems (I hope never to understand how or why someone would take a perfectly functional website then try to make it work like a buggy phone app in the same way that I never understood why the first Harry Potter movie hewed so closely to the book it was as boring as a Merchant and Ivory movie and it had wizards in it), but Twitter's Bootstrap CSS framework actually made sense to me, and it let me put together a couple of pages that looked good—far better than the Frankenstein's monster I cobbled together from the "Hey, everything's a blog now, right?" OSWD designs I liked.

(Bootstrap has its problems too, but the worst one is that the Less abstraction layer over CSS is inextricably tied to the Lovecraftian bonepile of crazy that is Node.js, because if there's anything I want to do in JavaScript less than write a templating system to perform text substitutions in a cooperative multitasking system, I don't know what it is. It probably involves sharks, skydiving, live volcanoes, and dental work. Yet even only being able to extract repeated colors into named variables and build a static CSS file is a huge improvement, so I put on protective eyewear.)

The experience turned out relatively enjoyable. I had a nice looking wrapper and a decent framework for displaying and managing content.

Then I wanted to change the way I displayed certain elements.

Here's the thing about web programming, or at least the way I'm doing this project. I don't think in terms of pages. I think in terms of components of pages. I have a templates/components/ directory full of reusable Template Toolkit components processed with INCLUDE and PROCESS. The big blurbs of marketing text and instructions and explanations on various pages all live in individual components. Sure, it's a little bit of work to figure out the layout, but this separation of concerns makes editing the site much easier.

It also makes revising the layout more difficult, at least when changing the layout means modifying lots of templates full of now-wrong <div> names and classes and such.

It's possible to write more TT components to abstract away these changes, but the point of diminishing returns appears quickly: TT's syntax and semantics just aren't strong enough to define functions and manage parameters.

Good thing Perl is.

Fewer than 20 minutes after I realized I needed a custom plugin, I had it written:

package MyProject::Template::Plugin::Bootstrap;
# ABSTRACT: basic Bootstrap helpers for the Template system

use Modern::Perl;

use parent 'Template::Plugin';

sub new
{
    my ($class, $context, @params) = @_;

    $class->add_functions( $context );

    return $class->SUPER::new( $context, @params );
}

sub add_functions
{
    my ($class, $context) = @_;
    my $stash             = $context->stash;

    for my $function (qw( row sidebar sideblock maincontent fullcontent span ))
    {
        $stash->set( $function, $class->can( $function ) );
    }

    $stash->set( process => sub { $context->process( @_ ) } );
}

sub row
{
    return <<END_HTML;
<div class="row">
    @_
</div>
END_HTML
}

sub sidebar
{
    return <<END_HTML;
<div class="span4">
    @_
</div>
END_HTML
}

sub sideblock
{
    return <<END_HTML;
<div class="well">
    @_
</div>
END_HTML
}

sub maincontent
{
    return <<END_HTML;
<div class="span8 maincontent">
    <div class="hero-unit">
        @_
    </div>
</div>
END_HTML
}

sub fullcontent
{
    return <<END_HTML;
<div class="maincontent">
    <div class="hero-unit">
        @_
    </div>
</div>
END_HTML
}

sub span
{
    my $cols = shift;
    return <<END_HTML;
<div class="span$cols">
    @_
</div>
END_HTML
}

1;

This turned my templates into:


[% USE Bootstrap %]

[% row(
    maincontent( process( 'components/content/home_text.tt' ) ),
    sidebar(
        sideblock( process( 'components/forms/login_box.tt' )),
        sideblock( process( 'components/boxes/top_recommendation.tt' ) ),
        sideblock( process( 'components/content/newsletter_text.tt' ) ),
    ),
) %]

This reminds me a bit of Generating HTML from Smalltalk's Seaside or Ruby HAML, except it's less "Wow, lots of tags here!" than Seaside and "So cute people hate you for saying how ugly it is!" than HAML.

I haven't convinced myself I have the right abstractions yet: it's not quite to the point of the semantics I like, it produces its own repetitions, and the idea of returning strings of HTML from a template plugin feels a little wrong to me. The big advantage, though, is that it's taken a lot of repetitive, niggly code and turned it into less code that's much more declarative. The repetition is in the structure of the semantics of the site and not in the mechanics of how to produce those semantics.

In other words, this is a step toward thinking in widgets rather than UI primitives.

A Practical Use for Macros in Perl generated several thoughtful comments. While Aristotle Pagaltzis identified the real semantic difficulty with the code I wanted to write (and mentioned the Null Object pattern, which I always keep in mind), Chas. Owens asked perhaps the best philosophical question:

Why not modify add_txn to reject undefs?

The code in review is:

while (my $stock = $stock_rs->next)
{
    my $pe_update = $self->analyze_pe( $stock );
    $stock_txn->add( $pe_update ) if $pe_update;

    my $cash_yield_update = $self->analyze_cash_yield( $stock );
    $analysis_txn->add( $cash_yield_update ) if $cash_yield_update;
}

... and the near duplication obscures (to me) the point of the code. Both Aristotle and Chas. are right—perhaps it's clearer to allow $transaction->add to do nothing when it receives nothing. I write "perhaps" because I see the appeal of that change, but I'm not sure I like it.

As usual with my software, the system has a fundamental design principle: either succeed in full or do nothing. It's fine to skip half of the analysis steps if the data just isn't there. (The project as a whole can succeed if it's only 60% correct; the joy of a margin of error. It's a lot more accurate than that.)

I take this principle to mean that robustness is more important than completeness. Skipping bad data and moving on is perfectly fine. The next run may improve transient errors, and catastrophic errors will require human intervention anyhow.

When these principles translate into design, I prefer to handle errors at the point of detection and not spread error handling throughout the system. All of these analysis methods should return something. When they succeed, they return a hash reference mapping column names to values in a database table. When these methods fail—whether the existing data isn't sufficient to calculate updated values or something else went wrong—they use a bare return;. As you well know, that's an empty list in list context and undef in the scalar context of the example code.

Why add nothing to a transaction when I know there's nothing to add? Yes, add() could check that it has nothing to do and do nothing, and that's fine, but it seems like that expands the behavior of add()'s API to include caller errors. Then again, the add() method must check that each hash reference contains a value for the transaction's bound primary key, or it will generate buggy output.

I suspect that both Aristotle and Chas. have in mind Postel's Law:

Be generous in what you accept and picky about what you emit.

The result might look something like:

while (my $stock = $stock_rs->next)
{
    $stock_txn->add(    $self->analyze_pe( $stock )         );
    $analysis_txn->add( $self->analyze_cash_yield( $stock ) );
}

This change has an advantage: it only necessitates a change in the add() method. All of the analyze_*() methods can continue to work as implemented.
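
Here's a sketch of what that more tolerant add() might look like, assuming the transaction object keeps its pending updates in an updates array attribute (the real transaction class may differ):

sub add
{
    my ($self, $update) = @_;

    # Postel-style: accept a failed analysis quietly and do nothing
    return $self unless defined $update;

    push @{ $self->updates }, $update;
    return $self;
}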

Of course, there's a slight performance penalty to doing this. In my case, it's immaterial, but it wouldn't be present with macros. This is an IO-bound application anyhow, and the transaction manager exists to avoid very real, very measured bottlenecks.

Finally, Aristotle's mention of the null object pattern was about real objects, and not methods which return empty lists or hash references. If that's your style, good for you—but it's not mine in this case. While it's not obvious from the small snippets I've posted so far, the responsibility of the analysis methods is smaller in scope than the responsibility of the transaction objects. Coupling transaction management to the analysis methods—in as much that they have to know about transactions to return the right objects—would turn the design of the system inside out. The result would very likely not be an improvement.

A Practical Use for Macros in Perl

| 10 Comments

People occasionally ask for practical examples of macros when I lament the lack of macros in Perl. While I'm usually pleased at the degree to which Perl lets me design code to get and stay out of my way, sometimes its abstractions just aren't quite enough to remove all of the duplication.

(I've been refactoring one of our business projects in preparation for another round of deployment in the next couple of weeks. We could launch without these improvements, but administrative work took almost two weeks longer than the afternoon I'd planned for it, so I decided it was worth my time to reduce technical friction so that further improvements are easier. More users means more work, so why not accelerate that work while I have the chance? I have another longer technical post to write to praise the use of Moose roles for a plugin system and to show off the stupidly-great task launcher, but that's for later.)

I found myself writing two code couplets that were similar enough they triggered my "Hey, refactor away this duplication!" alert. It's extra sensitive, because I know I'll have a few more couplets like this in the very near future:

while (my $stock = $stock_rs->next)
{
    my $pe_update = $self->analyze_pe( $stock );
    $stock_txn->add( $pe_update ) if $pe_update;

    my $cash_yield_update = $self->analyze_cash_yield( $stock );
    $analysis_txn->add( $cash_yield_update ) if $cash_yield_update;
}

The *_txn variables contain objects representing deferred and scoped SQL updates. I'll talk about that at YAPC::NA 2012 in When Wrong is Better.

The general pattern is this: for every stock in the appropriate resultset, call a method in this plugin. The method will return nothing if it fails (or has nothing to do) or it will return data to be added to the appropriate transaction. I have at least two types of transactions available here at the moment, and may have more later: one transaction updates stock data and the other updates analysis data.

I have several options. I could rework the data model so that this stage always updates only one transaction, in which case the loop body could instead look like:

{
    for my $method (qw( analyze_pe analyze_cash_yield ))
    {
        next unless my $result = $self->$method( $stock );
        $txn->add( $result );
    }
}

This technique of hoisting the variants into an ad hoc data structure and using existing looping techniques works well sometimes. (I use it in other parts of the system.) It's relatively easy to expand, even though it moves interesting information ("I'm calling the analyze_pe method!") to a place where tools have more trouble finding it. (I search for >analyze_pe when I want to find method calls.) You may have used something similar to define several parametric methods at BEGIN time. It's the same type of pattern, and while Perl provides most of the tools necessary to allow this, it doesn't natively express this pattern well.
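
That BEGIN-time pattern looks something like this sketch, where the %analyses mapping and the _analyze_with() helper are inventions for illustration:

BEGIN
{
    my %analyses = (
        analyze_pe         => 'pe_ratio',
        analyze_cash_yield => 'cash_yield',
    );

    for my $method (keys %analyses)
    {
        my $metric = $analyses{$method};

        no strict 'refs';

        # install one method per analysis; each closure remembers its metric
        *{ __PACKAGE__ . '::' . $method } = sub
        {
            my ($self, $stock) = @_;
            return $self->_analyze_with( $metric, $stock );
        };
    }
}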

I could also change the transaction object's add() method to do nothing when it receives an empty list of arguments. I like that in some ways, but I don't like it in others. I've come down on the side of keeping its invariant (it always takes exactly one scalar as an argument) pure for now. If I change it to take a list of updates, that might be the right time to reconsider this.

What I notice in the code as it stands right now is that the individual variables $pe_update and $cash_yield_update are synthetic variables. They only exist to support the code as written; they're not necessary for the algorithm. If I were to modify this code but only this code, I'd really rather write:

{
    ADD_TXN_WITH( $self, analyze_pe,         $stock, $stock_txn    );
    ADD_TXN_WITH( $self, analyze_cash_yield, $stock, $analysis_txn );
}

... though that syntax doesn't thrill me either. The clearest possibility I see right now is:

{
    $stock_txn->add(    SKIP unless $self->analyze_pe( $stock )         );
    $analysis_txn->add( SKIP unless $self->analyze_cash_yield( $stock ) );
}

... where SKIP does some magic to move to the next statement, not the next loop iteration. (I have some ideas how to write XS to make this work, but that creepy yak needs a shave and some mouthwash.)

The second best option right now is adding a function or method as indirection to encapsulate the synthetic code. I'd rather avoid synthetic code, but at least it reduces the possibility of copy and paste bugs.

For now, with only two steps in this analysis, I'm leaving it as it is. Two repetitions of something this similar set off my refactoring alarm, but I resist the urge for refactorings this small until I see three instances of near-duplicate code.

About this Archive

This page is an archive of entries from February 2012 listed from newest to oldest.

January 2012 is the previous archive.

March 2012 is the next archive.

Find recent content on the main index or look in the archives to find all content.

