Simplified HTML Testing with Mojo::DOM and Mech

Scraping the HTML output of a web application to see if your actions produced the right results is messy. It's also the most accurate way I know of to verify that your application behaves correctly from a user-initiated request to the server to a user-visible response.

I've used modules like Test::WWW::Mechanize and Test::WWW::Mechanize::Catalyst with a fair degree of satisfaction. I appreciate how they simplify the business of setting up a local server, making requests, filling out forms, and following links. I'm less satisfied with the methods content_contains() and content_like() for testing the presence of substrings within the HTML output. When the tests pass, these methods work pretty well. When the tests fail, debugging is often tedious. I find myself writing code like:

sub test_index
{
    my $ua = get_ua();
    $ua->get( '/stocks' );

    # temporary debugging aid: dump the whole page and bail out on failure
    exit diag $ua->content
        unless $ua->content_contains( 'Become a Great Investor',
            '/stocks should redirect to main page' );
}

... and then removing those statements before I check in the passing code. That's tedious.

Besides improving the diagnostic messages, I'd like to check my substrings against only a subset of the produced HTML. There's no reason I need to worry about the navigation of the site (which is always the same and tested elsewhere) or the chrome of the particular page (also repeated).

I could cut off the UI layer and test that the values passed into the templates are appropriate, but that couples the tests to the templates and means I have to test the templates on their own anyhow. That's a mess.

I could instrument the application to render only a fragment of the whole template when given a special parameter, but that's extra code in the application I have to maintain and test.

What I'd rather do is give the test method some sort of selector (XPath, CSS) to grab a single HTML element out of the DOM and run the comparison against the contents of that element and its children.

You can accomplish this in multiple ways. I wanted to try this approach out, so I hacked up a little proof of concept. This is not clean. You should probably not do this unless you want to maintain your own code. I might change this API. With that said, I like the results.

I have a small test library which handles the busy work of setting up a SQLite database in memory with DBICx::TestDatabase. It also loads Test::WWW::Mechanize::Catalyst and swaps its schema for the test schema. (I could do this from a separate initialization file, but I haven't done that yet.)

This test library now monkeypatches Test::WWW::Mechanize::Catalyst:

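package Test::WWW::Mechanize::Catalyst;

use Mojo::DOM;

sub dom_id_contains
{
    my ($self, $id, $string, $desc) = @_;

    # parse the most recent response and find the first node matching the
    # CSS selector; at() returns undef when nothing matches
    my $dom  = Mojo::DOM->new( $self->content );
    my $node = $dom->at( $id );

    # content_xml() was current in 2012; recent Mojolicious releases
    # spell it content()
    my $text = defined $node ? $node->content_xml : '';

    # report failures at the caller's line, not this one
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    my $status = Test::WWW::Mechanize::contains_string( $text, $string, $desc );
    ::diag( defined $node ? $text : "no element matches '$id'" )
        unless $status;
    return $status;
}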

It's a little messy, but it works. (In particular, I dislike the ::diag() call, but it's fine for a proof of concept.)

This code takes the current content of the most recent request, creates a Mojo::DOM object, then uses the provided identifier as a CSS selector to find a node within the resulting DOM. It stringifies the contents of that node, then matches the provided substring against that stringified content.

The rest of the code makes this test method conform to the expected interface of other test methods.
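
The selector lookup at the heart of that method is easy to try on its own, away from Mech and Catalyst. Here's a minimal sketch; the HTML is made up for illustration, and it assumes a recent Mojolicious, where the method which stringifies a node's contents is content() rather than content_xml():

```perl
use strict;
use warnings;
use Mojo::DOM;

# made-up page: navigation chrome plus the one element worth testing
my $html = <<'END_HTML';
<html><body>
<div id="nav"><a href="/">Home</a></div>
<div id="textual_pros"><p>making money at a great rate</p></div>
</body></html>
END_HTML

my $dom  = Mojo::DOM->new( $html );

# at() returns the first node matching the CSS selector, or undef
my $node = $dom->at( '#textual_pros' );

# content() stringifies what's inside the node, not the node's own tag
my $text = defined $node ? $node->content : '';

print index( $text, 'making money' ) >= 0 ? "found\n" : "missing\n";
```

The navigation div never reaches the comparison, which is the whole point: a substring that happens to appear in the chrome can't make the test pass by accident.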

Using this code has simplified both my testing and my debugging:


    $stock->update({ fcf_average => 0.20 });
    $ua->get( '/stocks/AA/view' );
    $ua->dom_id_contains( '#textual_pros', 'making money at a great rate',
        '... good results should be in the pros column' );

    $stock->update({ fcf_average => -0.02 });
    $ua->get( '/stocks/AA/view' );
    $ua->dom_id_contains( '#textual_cons', 'not making money',
        '... poor results should be in the cons column' );

The 15 minutes I spent coding this (I first tried XPath, but CSS selectors are so much nicer) were worth it to prove the merit of this idea. What's left is implementation.

This code would be even easier if the contains_string() method provided diagnostics, but I can understand why it doesn't.

Ideally this could be a role on Test::WWW::Mechanize::Catalyst which maintains its own Mojo::DOM DOM and clears that whenever a request occurs. It should also be more aware of Test::Builder to manage diagnostics in a cleaner way.
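
As a rough sketch of that idea, and nothing more, here's a subclass which builds its Mojo::DOM object lazily, throws it away on the next request, and talks to Test::Builder directly instead of calling ::diag(). Everything named My::* is hypothetical. My::FakeUA is a stand-in whose get() takes HTML instead of a URL so the example runs without a Catalyst application. The sketch also assumes a recent Mojolicious, where the method is content().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Mech-like user agent; it only stores content.
package My::FakeUA {
    sub new     { bless { content => '' }, shift }
    sub content { $_[0]{content} }
    sub get     { my ($self, $html) = @_; $self->{content} = $html; return 1 }
}

# Hypothetical subclass which caches one Mojo::DOM object per response.
package My::DOMTestingUA {
    use parent -norequire, 'My::FakeUA';
    use Mojo::DOM;
    use Test::Builder;

    my $Test = Test::Builder->new;

    # build the DOM lazily and reuse it until the next request
    sub dom { $_[0]{_dom} //= Mojo::DOM->new( $_[0]->content ) }

    # any new request invalidates the cached DOM
    sub get
    {
        my $self = shift;
        delete $self->{_dom};
        return $self->SUPER::get( @_ );
    }

    sub dom_contains
    {
        my ($self, $selector, $string, $desc) = @_;

        my $node = $self->dom->at( $selector );
        my $text = defined $node ? $node->content : '';

        # report failures at the caller's line
        local $Test::Builder::Level = $Test::Builder::Level + 1;
        my $ok = $Test->ok( index( $text, $string ) >= 0, $desc );
        $Test->diag( defined $node ? $text : "no element matches '$selector'" )
            unless $ok;
        return $ok;
    }
}

package main;

my $ua = My::DOMTestingUA->new;
$ua->get( '<div id="textual_pros">making money at a great rate</div>' );
$ua->dom_contains( '#textual_pros', 'making money', 'pros column' );

$ua->get( '<div id="textual_cons">not making money</div>' );
$ua->dom_contains( '#textual_cons', 'not making money', 'cons column' );

Test::Builder->new->done_testing;
```

A real user agent makes requests through more than get() (submitting forms and following links, for example), so a real version would have to clear the cache wherever all of those requests funnel through. I haven't worked out where that is.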

The biggest drawback, of course, is monkeypatching. The inheritance relationship which eventually lets Mech and Mech::Catalyst work is incompatible with things like Mech::PSGI. A monkeypatch which works for one of these modules doesn't necessarily work for another.

This is a problem of object orientation and components that needs much more thought to solve well, but for now my tests are simpler and easier to maintain, and I'm comfortable enough with this little bit of mess until I find a better way to clean it up.

About this Entry

This page contains a single entry by chromatic published on August 24, 2012 10:27 AM.
