I've been adding a test suite to a young project over the past couple of weeks. (Seems like a good use of my skills.)
Going from zero tests to the basic "does it compile?" test is a huge improvement, just as going from zero tests for a feature to one test for that feature is. You do get more value from each test you add, but novice testers don't always realize that test coverage eventually reaches a point of diminishing returns.
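(In a Python project, for example, that first "does it compile?" test can be as small as a bare import; the package name here is made up, not this project's:)

```python
def test_package_imports():
    # The most basic smoke test: importing the package at all catches
    # syntax errors and missing dependencies before anything else runs.
    import myproject  # hypothetical package name
```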
Like code, tests have a cost. Every additional line of code is code that has to be maintained. If the value of the feature it enables is greater than the cost of maintaining it, you've made a wise investment. Sometimes the opposite is true.
One of the most dramatic examples of cost I've seen in the past couple of weeks is the time spent running the test suite. Every new test file added to the system adds about three seconds to the time required to run the entire suite. That's almost entirely setup time, dominated by disk IO. (The IO-heavy setup means that running the test suite in parallel actually takes longer than running the tests serially.) After two weeks of work, the test suite has gone from a few assertions which run (and fail) in a couple of seconds to over 1600 assertions which run in somewhere between 70 and 80 seconds.
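(Confirming that the setup dominates is just a matter of timing the two phases separately; here's a rough sketch in Python, where load_fixtures and run_assertions are hypothetical stand-ins for the real setup and test bodies:)

```python
import time

def load_fixtures():
    # hypothetical stand-in for the IO-heavy, per-file setup
    time.sleep(3)

def run_assertions():
    # hypothetical stand-in for the actual assertions in one test file
    time.sleep(0.1)

start = time.perf_counter()
load_fixtures()
after_setup = time.perf_counter()
run_assertions()
end = time.perf_counter()

print(f"setup:      {after_setup - start:.2f}s")
print(f"assertions: {end - after_setup:.2f}s")
```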
Coverage is decent, and being able to write and change code without fear of breaking important features is very nice, but the longer the test suite takes to run, the less likely people are to run it before committing. (This has already happened.)
As my colleague Jim Shore wrote, a ten-minute build is the gold standard for deployment, testing, and automation quality. (If I were to add anything, I would suggest that you should be able to check out the project on a fresh machine and produce a deployable, completely tested version within ten minutes.)
Automation is step one. Test quality is another step, and getting the test suite's run time back under 20 seconds is a definite goal.
Sounds like it might be worth reading the data in first, then forking to run each test. Do any of the test architectures support that?
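Something like this, as a rough sketch in Python with os.fork (Unix-only); load_shared_data and run_test_file are made-up names, not anything from the project:

```python
import os
import sys

def load_shared_data():
    # made-up stand-in for the expensive, IO-heavy setup, done exactly once
    return {"fixtures": "..."}

def run_test_file(path, data):
    # made-up stand-in: execute one test file against the preloaded data
    pass

def run_suite(test_files):
    data = load_shared_data()       # pay the setup cost once, in the parent
    for path in test_files:
        pid = os.fork()             # Unix-only; child gets a copy-on-write view of data
        if pid == 0:
            run_test_file(path, data)
            os._exit(0)             # never fall back into the parent's loop
        os.waitpid(pid, 0)          # run test files one at a time, in isolation

if __name__ == "__main__":
    run_suite(sys.argv[1:])
```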
Also, if you haven't seen it, IncPy (http://www.pgbovine.net/incpy.html) is a hugely awesome idea that could hold promise in testing situations. Developers could repeatedly and incrementally run only the tests that need re-running after making a small change, then at the end of a dev cycle, run the whole thing with memoization turned off. I've never tried IncPy, and it looks like it might have gone stale, but what a cool idea.
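IncPy does this automatically inside the interpreter; purely as an illustration of the idea (not IncPy's actual mechanism or API), a hand-rolled persistent memoization of an expensive setup step might look like this:

```python
import functools
import hashlib
import os
import pickle

def disk_memoize(cache_dir=".memo"):
    """Cache a function's results on disk, keyed by its arguments.

    A toy approximation of what IncPy does automatically and safely;
    this version does no dependency tracking, so the cache directory
    has to be cleared by hand before a final, un-memoized full run.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = hashlib.sha1(
                pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            path = os.path.join(cache_dir, key)
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)      # reuse the previous run's result
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)         # remember it for the next run
            return result
        return wrapper
    return decorator

@disk_memoize()
def expensive_setup(config_path):
    # hypothetical stand-in for the slow, IO-heavy setup a test file performs
    return {"loaded_from": config_path}
```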