Scraping the HTML output of a web application to see if your actions produced the right results is messy. It's also the most accurate way I know of to verify that your application behaves correctly from a user-initiated request to the server to a user-visible response.
I've used modules like Test::WWW::Mechanize and Test::WWW::Mechanize::Catalyst with a fair degree of satisfaction. I appreciate how they simplify the business of setting up a local server, making requests, filling out forms, and following links. I'm less satisfied with the methods content_contains() and content_like() for testing the presence of substrings within the HTML output. When the tests pass, these methods work pretty well. When the tests fail, debugging is often tedious. I find myself writing code like:
    sub test_index
    {
        my $ua = get_ua();
        $ua->get( '/stocks' );
        exit diag $ua->content unless
            $ua->content_contains( 'Become a Great Investor',
                '/stocks should redirect to main page' );
    }
... and then removing those statements before I check in the passing code. That's tedious.
Besides improving the diagnostic messages, I'd like to check my substrings against only a subset of the produced HTML. There's no reason I need to worry about the navigation of the site (which is always the same and tested elsewhere) or the chrome of the particular page (also repeated).
I could cut off the UI layer and test that the values passed into the templates are appropriate, but that couples the tests to the templates and means I have to test the templates on their own anyhow. That's a mess.
I could instrument the application to render only a fragment of the whole template when given a special parameter, but that's extra code in the application I have to maintain and test.
What I'd rather do is give the test method some sort of selector (XPath, CSS) to grab a single HTML element out of the DOM and run the comparison against the contents of that element and its children.
You can accomplish this in multiple ways. I wanted to try this approach, so I hacked up a little test. This is not clean. You should probably not do this unless you want to maintain your own code. I might change this API. With that said, I like the results.
I have a small test library which handles the busy work of setting up a SQLite database in memory with DBICx::TestDatabase. It also loads Test::WWW::Mechanize::Catalyst and swaps its schema for the test schema. (I could do this from a separate initialization file, but I haven't done that yet.)
This test library now monkeypatches Test::WWW::Mechanize::Catalyst:
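    package Test::WWW::Mechanize::Catalyst;
    use Mojo::DOM;

    sub dom_id_contains
    {
        my ($self, $id, $string, $desc) = @_;

        # parse the most recent response and find the node matching the selector
        my $dom  = Mojo::DOM->new( $self->content );
        my $node = $dom->at( $id );
        # at() returns undef when nothing matches; compare against an empty
        # string so that the test fails instead of dying
        my $text = defined $node ? $node->content_xml : '';

        # report failures at the caller's line, not this one
        local $Test::Builder::Level = $Test::Builder::Level + 1;
        my $status = Test::WWW::Mechanize::contains_string( $text, $string, $desc );
        ::diag $text unless $status;
        return $status;
    }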
It's a little messy, but it works. (In particular, I dislike the ::diag() call, but it's fine for a proof of concept.)
This code takes the content of the most recent response, creates a Mojo::DOM object, then uses the provided identifier as a CSS selector to find a node within the resulting DOM. It stringifies that node and its contents, then matches the provided substring against that stringified content.
The rest of the code makes this test method conform to the expected interface of other test methods.
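To see the selector step on its own, outside of any test harness, here is a minimal sketch. The HTML and the fragment_for() helper are made up for illustration. It assumes a recent Mojo::DOM, where (as far as I know) content() is the current spelling of the content_xml() call I used above.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical page: only the #textual_pros element matters to the test.
my $html = <<'HTML';
<html><body>
<div id="nav"><a href="/">Home</a> making money elsewhere</div>
<div id="textual_pros"><p>This company is making money at a great rate.</p></div>
</body></html>
HTML

# Hypothetical helper: return the markup inside the first element matching
# a CSS selector, or undef when nothing matches.
sub fragment_for {
    my ($markup, $selector) = @_;
    my $node = Mojo::DOM->new( $markup )->at( $selector );
    return defined $node ? $node->content : undef;
}

my $pros = fragment_for( $html, '#textual_pros' );
print "pros fragment: $pros\n";

# The substring check sees only the selected element, not the navigation.
print "match is scoped to the fragment\n"
    if index( $pros, 'making money at a great rate' ) >= 0
    && index( $pros, 'Home' ) < 0;

# A selector that matches nothing yields undef rather than an object.
print "no such element\n"
    unless defined fragment_for( $html, '#textual_cons' );
```

The navigation div contains the words "making money" too, which is exactly the kind of false positive a whole-page content_contains() can't rule out.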
Using this code has simplified both my testing and my debugging:
    $stock->update({ fcf_average => 0.20 });
    $ua->get( '/stocks/AA/view' );
    $ua->dom_id_contains( '#textual_pros', 'making money at a great rate',
        '... good results should be in the pros column' );

    $stock->update({ fcf_average => -0.02 });
    $ua->get( '/stocks/AA/view' );
    $ua->dom_id_contains( '#textual_cons', 'not making money',
        '... poor results should be in the cons column' );
The 15 minutes I spent coding this (I first tried XPath, but CSS selectors are so much nicer) were worth it to prove the merit of this idea. What's left is implementation.
This code would be even easier if the contains_string() method provided diagnostics, but I can understand why it doesn't.
Ideally this could be a role on Test::WWW::Mechanize::Catalyst which maintains its own Mojo::DOM object and clears it whenever a request occurs. It should also be more aware of Test::Builder to manage diagnostics in a cleaner way.
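A rough sketch of what such a role might look like follows. Everything here is an assumption rather than working library code: the role name, the dom_contains() method, and the My::FakeUA stand-in class are hypothetical, and I've picked Role::Tiny (whose method modifiers need Class::Method::Modifiers installed) only because it keeps the example small. The stand-in only hooks get(); a real Mechanize object makes requests through several other methods, so a real role would need a better hook.

```perl
use v5.14;
use warnings;

# Hypothetical role; Role::Tiny is an assumption, not what my test library uses.
package My::Role::DOMAssertions {
    use Role::Tiny;
    use Mojo::DOM;
    use Test::Builder;

    requires qw( content get );

    # Parse the current page once and reuse the result until the next request.
    sub dom {
        my $self = shift;
        return $self->{_cached_dom} //= Mojo::DOM->new( $self->content );
    }

    # Any new request invalidates the cached DOM.
    after get => sub { delete $_[0]{_cached_dom} };

    sub dom_contains {
        my ($self, $selector, $string, $desc) = @_;
        my $tb   = Test::Builder->new;
        my $node = $self->dom->at( $selector );
        my $text = defined $node ? $node->content : '';

        # ok() is called directly from this method, so the default
        # $Test::Builder::Level already points at the caller.
        my $ok = $tb->ok( index( $text, $string ) >= 0, $desc );
        $tb->diag( "Content of '$selector':\n$text" ) unless $ok;
        return $ok;
    }
}

# Stand-in for a Mechanize-style user agent, only so the sketch runs alone.
package My::FakeUA {
    use Role::Tiny::With;

    my %PAGES = (
        '/good' => '<div id="textual_pros">making money at a great rate</div>',
        '/bad'  => '<div id="textual_cons">not making money</div>',
    );

    sub new     { bless { content => '' }, shift }
    sub content { $_[0]{content} }
    sub get {
        my ($self, $path) = @_;
        $self->{content} = $PAGES{$path} // '';
        return $self;
    }

    with 'My::Role::DOMAssertions';
}

my $ua = My::FakeUA->new;

$ua->get( '/good' );
my $pros_ok = $ua->dom_contains( '#textual_pros',
    'making money at a great rate', '... pros column' );

$ua->get( '/bad' );
my $cons_ok = $ua->dom_contains( '#textual_cons',
    'not making money', '... cons column' );

Test::Builder->new->done_testing;
```

Because the diagnostics go through Test::Builder's own diag(), the role no longer depends on whatever main:: happens to have imported.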
The biggest drawback, of course, is monkeypatching. The inheritance relationship which eventually lets Mech and Mech::Catalyst work is incompatible with things like Mech::PSGI. A monkeypatch written for one of these modules doesn't necessarily work when applied to another.
This is a problem of object orientation and components that needs much more thought to solve well, but for now my tests are simpler and easier to maintain, and I'm comfortable enough with this little bit of mess until I find a better way to clean it up.