Why I Run Tests on Install


Jonathan Swartz makes a polemical statement:

cpanm and perlbrew should not run tests by default.

His points are reasonable, but his complaints are mostly about side effects and not the real problem. (I should clarify: the real problem I encounter.)

If running tests slows down installs, speed up the tests. (Do you want to get the wrong answer faster? Easy: it's 42. No need for a quantum computer to do the calculation in constant time. This algorithm is O(0).)
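Much of that speedup is already within reach, assuming a test suite tolerates parallelism: TAP::Harness can run test files concurrently, and setting HARNESS_OPTIONS=j9 in the environment should have a similar effect during harness-driven installs. A rough sketch:

    use strict;
    use warnings;
    use TAP::Harness;

    # Run a distribution's tests nine at a time, assuming the suite was
    # written with parallelism in mind. The paths and job count are only
    # illustrative defaults.
    my $harness = TAP::Harness->new({
        jobs => 9,
        lib  => [ 'blib/lib', 'blib/arch' ],
    });
    $harness->runtests( glob 't/*.t' );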

If running tests exposes the fragility of the dependency chain, improve the dependency chain.

If dependency test failures prevent the installation of downstream clients... this is a weakness of the CPAN toolchain. A well-written test suite for a downstream client should reveal whether bugs or other sources of test failures in a dependency affect the correctness of the client.

Note the assumptions in that sentence.

Anyone who's experienced the flash of enlightenment that comes from working with well-tested code and who's shared that new zeal with co-workers has undoubtedly heard the hoary old truism that testing cannot prove the complete absence of bugs. It's no less true for its age, though it's also true that good testing only improves our confidence in the correctness and efficacy of our code.

For me, a 95% certainty that my code works and continues to work for the things I've tested it against is more than sufficient. I focus on testing the things I'm most likely to get wrong and the things which need to keep working correctly. (I don't care much about pixel-perfect placement, but I do care that a book's index uses the right escapes for its data and markup.)

Without tests running on the actual machines and in the actual environments where I expect my code to run, I don't have that confidence.

Put another way, I'm either not smart enough or far too lazy to want to attempt to debug code without good tests. That's why I write tests, and that's why I run them obsessively. That's good for me as a developer, and you're getting the unvarnished developer perspective.

I also care about the perspective of mere users. (Without users, we're amusing ourselves, and I can think of better ways to amuse myself than by writing software no one uses.)

Yes, an excellent test suite can help a user help a developer debug a problem. Many (most?) CPAN authors have had the wonderful experience of receiving a bug report with a failing test case. Sometimes this even includes a code patch.
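Such a report often amounts to nothing more than a few lines of Test::More; the module name and expected behavior in this sketch are hypothetical:

    use strict;
    use warnings;
    use Test::More tests => 1;

    # A hypothetical failing test case attached to a bug report; the
    # module and its expected behavior are illustrative only.
    use Text::Example::Parser;

    my $doc = Text::Example::Parser->new->parse('');
    is( $doc->entry_count, 0, 'parsing an empty document yields zero entries' );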

Not all users are developers of that sort, nor should they be.

The CPAN ecosystem has improved greatly at automated testing and dependency tracking, but we can improve further. What if we could identify the severity of test failures? (We have TODO and SKIP, but they don't convey semantic meaning.) What if we could identify buggy or fragile tests? (My current favorite is XML::Feed tests versus DateTime::Format::Atom, because it catches me far too often, it doesn't affect the operation of the code, and it's a stupid fix that's lingered for a few months.) What if the failures are transient (Mechanize relying on your ISP not ruining DNS lookups for you) or specific to your environment (a test suite written without parallelism in mind)?
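For reference, TODO and SKIP only change how the harness counts a result; the explanation attached to each is free-form prose with no machine-readable severity. A sketch, using an illustrative NETWORK_TESTS flag as the precondition:

    use strict;
    use warnings;
    use Test::More;

    # SKIP omits tests when a precondition is missing; the reason is only
    # human-readable text. (The NETWORK_TESTS flag is illustrative.)
    SKIP: {
        skip 'no network access available', 1 unless $ENV{NETWORK_TESTS};
        ok( defined gethostbyname('www.cpan.org'), 'DNS lookups resolve' );
    }

    # TODO runs its tests but reports failures as expected rather than
    # fatal; again, the explanation is prose only.
    TODO: {
        local $TODO = 'date parsing fix not yet applied upstream';
        ok( 0, 'known-broken behavior still fails' );
    }

    done_testing();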

As Jonathan rightly implies, how do you expect an end-user to understand or care about or debug those things?

I'm still reluctant to agree that disabling tests for end-user installations is the right solution. I want to know about failures in the wild, wider world. I want that confidence, and I can't bring myself to trade it away for a little more installation speed.

Yet his point about lingering fragility in the ecosystem is true and important, even if the proposed solution of skipping tests isn't right. Fortunately, improving how we manage, track, use, and test our dependencies can help solve both problems: perhaps to the point where we can run only those tests users most care about and can identify and report material failures in dependencies.

4 Comments

Not running tests on installation has the nasty side effect of an end user encountering failure at some indeterminate time in the future and having very little clue about the nature of the failure. Testing on install at least lets whoever is doing the installation know up-front that there may be problems, and by requiring those tests, a human being has to make the call on whether or not to force the installation. This is a good thing.

Though I wonder: is there some value in making tests an installable component and providing a tool to run those tests? At least then, end users would have another tool to help them diagnose problems they may encounter (like when a BOFH does a force install because he's too lazy to figure out why some tests are failing). I haven't taken this idea very far, but it would look something like this:

    User: I'm having trouble with module XYZ
    Us:   Have you tried running the test suite for that module?  
          Type "test_module XYZ" to do so.
    User: Ah, no.  Let me try that
          ... time passes ...
    User: It fails test 53.
    Us:   Run that test individually with "test_module XYZ 53" to see what's up.
    User: Ok.
    User: Ah, it's failing because an external XML lib isn't installed.
    Us:   So, you need to install that or have your sysadmin install it, 
          then you can run the tests again to make sure.
    User: thanks!

App::Reprove is my solution for testing an already-installed module. You type:

    reprove Foo::Bar

... at the command line, it tries to find the appropriate distribution on CPAN, downloads the MANIFEST, greps it for files matching m{^t/}, downloads those into a temporary directory and points App::Prove in their direction.
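That final step is essentially the App::Prove synopsis pointed at the fetched files; a rough sketch, with an illustrative temporary directory:

    use strict;
    use warnings;
    use App::Prove;

    # Hand the fetched test files to App::Prove; the temporary directory
    # path here is only illustrative.
    my $tempdir = '/tmp/reprove-Foo-Bar';
    my $app     = App::Prove->new;
    $app->process_args( glob "$tempdir/t/*.t" );
    exit( $app->run ? 0 : 1 );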

It's not perfect - some distributions may rely on files outside "t" to pass the tests. And sometimes it may need additional hints to find the correct distribution (e.g. version, cpanid). But it seems to work in many cases.

A solution for that is on my todo list for the QA hackathon, but I can't make any promises. In particular, the problem tobyink pointed out is tricky to solve, and probably requires metafile support or some such.

tobyink, Leon, that sounds nice. I am not sure it needs to be solved for the general case. It could be a strong recommendation that tests not assume any files exist outside of t/, or that tests have a mode to 'test against installed versions'.

Then some CPAN Testers could set up smokers that would test each module that way too and start reporting which ones fail in that case.
