Per my experiments with parallelism in Perl test suites, I've adopted several patterns. One such pattern allows me to manipulate the filesystem in a parallel-friendly way.
A lot of my code handles batch processing: fetch some data, sort it into various logical buckets, manipulate the contents of each bucket, then produce some sort of output for everything that made it that far. While most of these steps only manipulate active data in the queue, some steps require me to read and write files in the filesystem—see Pod::PseudoPod::Book (soon to be on the CPAN) for example.
As with any parallelism, multiple units of execution contending for a single shared resource is an exercise in conflict, or at least in complicated locking.
For simple needs, File::Tempdir is great. You can create a temporary directory with a lifespan tied to the object representing it. When that object gets destroyed, its destructor removes the temporary directory.
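A minimal sketch of that lifespan behavior follows. It uses the core File::Temp module's newdir as a stand-in for File::Tempdir (a substitution for illustration; the two modules have different interfaces), and the variable names are only illustrative:

```perl
use strict;
use warnings;
use File::Temp;

my $path;

{
    # newdir returns an object; the directory exists for as long as the object does
    my $dir = File::Temp->newdir;
    $path   = $dir->dirname;

    print "inside scope:  ", ( -d $path ? "exists" : "missing" ), "\n";
}

# $dir has gone out of scope, so its destructor has removed the directory
print "outside scope: ", ( -d $path ? "exists" : "missing" ), "\n";
```

Nothing here changes the working directory; the object only manages creation and cleanup.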
I needed something more. I wrote the very silly, very simple Tempdir solely for one project's test suite:
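package Tempdir;

use strict;
use warnings;

use Cwd;
use autodie;
use File::Temp;
use File::Path;
use base 'File::Temp::Dir';

sub new
{
    my ($class, %options) = @_;
    my $self              = File::Temp->newdir;

    # copy constructor options (such as mkdirs) into the object
    @{ $self }{ keys %options } = values %options;
    $self->{original_dir}       = cwd();

    chdir $self->dirname;

    # mkdirs is optional; create any requested subdirectories
    my @mkdirs = @{ $self->{mkdirs} || [] };
    File::Path::make_path( @mkdirs ) if @mkdirs;

    bless $self, $class;
}

sub write_file
{
    my ($self, $name, $contents) = @_;

    # autodie throws an exception if open or close fails
    open my $outfh, '>', $name;
    print {$outfh} $contents;
    close $outfh;
}

sub DESTROY
{
    my $self = shift;

    # leave the temporary directory before the parent class removes it
    chdir delete $self->{original_dir};
    $self->SUPER::DESTROY( @_ );
}

1;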
Like File::Tempdir, creating a new Tempdir object creates a temporary directory. In addition, it saves the current working directory and chdirs to the new temporary directory. Because I'm careful to use only relative paths within my code (business requirement: prefer running multiple related instances of a project on a single machine to separate virtual machines), as long as the necessary relative files and directories are present, everything continues to work correctly. (Also, because this temporary directory manipulation happens at runtime, the test file's connection to the work queue is already in place, so chdir works just fine.)
If you provide the constructor a mkdirs key with an array reference as its value, the object will create, relative to the temporary directory, additional subdirectories of arbitrary depth. I also added a very simple convenience feature to write a file. I haven't needed more than this yet:
# create the storage directories for topics/2
my $tempdir = Tempdir->new(mkdirs => [ 'sites/Bravo', 'sites/Bravo/css' ]);
...
$tempdir->write_file( $css->filepath, $css->contents );
When $tempdir goes out of scope, all of these files and directories go away. Even if I were to run a hundred instances of the same test file simultaneously, they would all run successfully because they do not interfere with each other.
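As a rough illustration of that isolation, here is a sketch that forks a few worker processes, each of which writes to the same relative filename inside its own temporary directory. It uses plain File::Temp rather than the Tempdir class so that it runs on its own; the worker count and the filename output.txt are made up for the example:

```perl
use strict;
use warnings;
use Cwd;
use File::Temp;

my $workers = 3;    # arbitrary number of simulated test processes
my %exit_for;

for my $n ( 1 .. $workers ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;    # parent: go on to start the next worker

    # child: a private temporary directory, same relative filename as its siblings
    my $original = cwd();
    my $dir      = File::Temp->newdir;
    chdir $dir->dirname or die "chdir failed: $!";

    open my $out, '>', 'output.txt' or die "open failed: $!";
    print {$out} "worker $n\n";
    close $out or die "close failed: $!";

    open my $in, '<', 'output.txt' or die "open failed: $!";
    my $line = <$in>;
    close $in;

    # leave the directory before its destructor removes it
    chdir $original or die "chdir failed: $!";
    exit( $line eq "worker $n\n" ? 0 : 1 );
}

# parent: collect every child's exit status
while ( ( my $pid = wait() ) != -1 ) {
    $exit_for{$pid} = $? >> 8;
}

my $failures = grep { $_ != 0 } values %exit_for;
print "$failures of $workers workers read back the wrong data\n";
```

Each child sees only its own output.txt, so no locking is necessary.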
Though I chose an OO interface for this behavior, I prefer a higher-order interface in some ways. I'd like to be able to write:
within_tempdir mkdirs => [qw( some list of directories )]
{
    # do something
    ...
};
... but I haven't convinced myself quite yet that it's an improvement. Certainly it has the potential to be more correct, as nested lexical scoping has a better chance of applying and unapplying chdir calls in the correct order (it behaves more like properly paired pushd/popd calls in bash), but Perl 5's limited ability to parameterize these thunks makes it clunkier than it ought to be. I could experiment with an interface where you specify parameters to import, which produces and exports a partially applied function, but the OO version is good enough for me for now and continues to stay out of my way.
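For comparison, here is one way the higher-order version might look in plain Perl 5. This is only a sketch under assumptions: within_tempdir is a hypothetical function, and because the (&@) prototype requires the block to be the first argument, the options follow the block instead of preceding it as in the syntax I'd prefer:

```perl
use strict;
use warnings;
use Cwd;
use File::Temp;
use File::Path 'make_path';

# Hypothetical helper: runs a block inside a fresh temporary directory
sub within_tempdir(&@) {
    my ( $code, %options ) = @_;

    my $original = cwd();
    my $dir      = File::Temp->newdir;
    chdir $dir->dirname or die "Cannot chdir to temporary directory: $!";

    my @mkdirs = @{ $options{mkdirs} || [] };
    make_path(@mkdirs) if @mkdirs;

    # eval so that the chdir back (the "popd") happens even if the block dies
    my @results = eval { $code->($dir) };
    my $error   = $@;

    chdir $original or die "Cannot chdir back to $original: $!";
    die $error if $error;

    return @results;
}

my $start = cwd();
my @seen;

within_tempdir {
    my $outer = cwd();

    within_tempdir {
        push @seen, ( -d 'sites/Bravo/css' ) ? 'inner has css' : 'inner lacks css';
    } mkdirs => ['sites/Bravo/css'];

    # the inner call has already restored the outer directory
    push @seen, cwd() eq $outer ? 'back in outer' : 'lost outer';
};

push @seen, cwd() eq $start ? 'back at start' : 'lost start';
print "$_\n" for @seen;
```

Because each call restores the directory it saved before returning, nested calls unwind in the reverse of the order in which they were entered.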