In Perl 5, $0 is the magic superglobal which contains the name of the program being executed. This is the name you see in the output of ps or in the top utility.
Some clever programs provide several symlinks to the main program and examine $0 to enable or disable certain behaviors. This is an easy way to hide the details of execution from users while making those behaviors mnemonic.
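A minimal sketch of that symlink trick, assuming a program installed under two hypothetical names, compress and decompress (the names and the behaviors are invented for illustration):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical mapping from invocation name to behavior.
my %behavior_for = (
    compress   => sub { 'compressing' },
    decompress => sub { 'decompressing' },
);

# Choose a behavior from the name the program was invoked under.
sub behavior_for_name {
    my ($program_name) = @_;
    my $name = basename($program_name);
    return $behavior_for{$name} || sub { "unknown mode '$name'" };
}

# $0 holds whichever name (or symlink) the user typed.
print behavior_for_name($0)->(), "\n";
```

Symlink the script under each name in the table, and the same file behaves differently depending on which link was run.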
I usually don't write those kinds of programs, but this past year I've written several batch processing programs which have several interdependent states. For example, one program runs from cron regularly to run through a pipeline of behaviors. Data moves through that pipeline; it's basically one big state machine.
The core of the program is a pipeline manager which runs the appropriate processing stages in order, such that on every invocation, the program moves data through at least one stage and potentially every stage. It doesn't have to move everything through the pipeline all in one invocation, but it does have to make progress on every invocation.
For various uninteresting optimization and locking reasons, I made this program a single execution unit. (I do use asynchronous IO for things like network access, but that's because the program is largely IO bound.) The program also has copious logging of the stage traversal, split between one log which tracks stage transitions and timings and stage-specific log files which have more details on the progress of those stages.
Until a few minutes ago, the easiest way to see the program's current stage was to tail the top-level log file. While running some live tests on a new feature, I found myself with free time and the desire not to switch back and forth to a tail -f screen again, so I checked the documentation for $0 again.
I knew that on certain platforms (GNU/Linux, which makes my life easier) you can actually write to it. If you do this, you can control what appears in the output of ps and top.
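Here is a small sketch for checking whether that works on a given machine. It assumes a POSIX-style ps on the PATH; the helper name title_seen_by_ps and the title string are made up for the example, and what ps actually reports depends on the platform.

```perl
use strict;
use warnings;

# Set the process title, then ask ps what it reports for this process.
# Returns undef when ps is unavailable or prints nothing.
sub title_seen_by_ps {
    my ($title) = @_;
    $0 = $title;
    my $args = `ps -o args= -p $$ 2>/dev/null`;
    return unless defined $args && length $args;
    chomp $args;
    return $args;
}

my $reported = title_seen_by_ps('pipeline: demo stage');
print defined $reported ? "ps shows: $reported\n" : "could not run ps\n";
```

If the line printed by ps still shows the original command, assignment to $0 probably has no visible effect on that platform.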
Every stage runs from a closure (shades of Plack):
    my $sub = sub
    {
        my ($self, $config) = @_;

        # $app here is the stage name captured by the closure
        my $log = $self->get_fh_for_step( $config, lc $app );

        # show app stage in ps output
        local $0 = $app;

        # from here on, $app is the stage object and shadows the stage name
        my $app = $module->new(
            logger => $log,
            map { $_ => $config->{General}{$_} } @keys,
        );

        $app->run;
        $log->log( sprintf $message, $app->count ) if $app->count;
    };
A loop in the pipeline manager creates a new closure over the name of the module which implements the stage. Each closure creates a new object for the stage, sets up the logger, provides the appropriate configuration, and runs the stage. The change I made is the local $0 = $app; line and the comment above it.
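To make the shape of that loop concrete, here is a self-contained sketch. It is not the real pipeline manager: the stage names are hypothetical, and recording $0 stands in for each stage's actual work.

```perl
use strict;
use warnings;

# Hypothetical stage names; the real program maps these to modules.
my @stages = qw( FetchImages ProcessImages WriteReports );

my @titles_seen;
my @pipeline;

for my $stage (@stages) {
    # Each closure captures its own copy of $stage.
    push @pipeline, sub {
        local $0 = "pipeline: $stage";   # restored when the closure returns
        push @titles_seen, $0;           # stand-in for the stage's real work
        return;
    };
}

my $name_before = $0;
$_->() for @pipeline;

print "$_\n" for @titles_seen;
print "restored\n" if $0 eq $name_before;
```

Using local rather than a plain assignment means the previous name comes back automatically when a stage finishes, even if the stage dies.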
Right now, my top window shows that the image processing stage has just given way to the report writing stage, and now the program has exited. In a couple of minutes, everything will start again.
Writing this entry took longer than implementing this feature. Five minutes of experimenting has improved the visibility and monitoring of this program immensely. Maybe it'll help you.
One thing to be aware of is that with 5.14, assignment to $0 on Linux changed in a way that could break process-name-based monitoring.
Previously, perl changed the name through the time-honored strategy of mucking around with the process environment space, but as of 5.14, it also uses prctl to make the change, which, at least under Linux, means that anything that's looking for the original execution name (say, ps -C, or snmpd, or likely any other monitoring tool) will no longer be able to find the process.
To retain the old way of doing things (without having to maintain a custom perl install), you can hack Sys::Proctitle to not use prctl and then use its facilities.
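For anyone who wants to see both names on their own system, here is a rough, Linux-only sketch. It assumes /proc is mounted: /proc/PID/comm holds the short name that prctl sets (the kernel caps it at 15 bytes) and /proc/PID/cmdline holds the argument area. The helper name read_proc_file and the title string are invented for the example.

```perl
use strict;
use warnings;

# Read one file from this process's /proc entry (Linux-specific).
# Returns undef if the file cannot be opened.
sub read_proc_file {
    my ($name) = @_;
    my $pid = $$;
    open my $fh, '<', "/proc/$pid/$name" or return;
    local $/;    # slurp mode
    my $content = <$fh>;
    return $content;
}

$0 = 'pipeline: image processing stage';

my $comm    = read_proc_file('comm');       # short name, as matched by ps -C
my $cmdline = read_proc_file('cmdline');    # argument area, NUL-separated

if ( defined $comm && defined $cmdline ) {
    chomp $comm;
    $cmdline =~ tr/\0/ /;
    print "comm:    $comm\n";
    print "cmdline: $cmdline\n";
}
else {
    print "no /proc entries to read on this platform\n";
}
```

Comparing the two lines before and after the assignment shows which of them a given perl changes, and so which monitoring tools are likely to notice.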
Thanks for the note. I remembered some change in 5.14, but it hasn't bothered me in practice yet.
I learned, about two years ago, that you cannot get the `ps` output to change on Solaris by assigning to $0.
I must've spent two days afterward feeling sorry for myself.
I've used another Unix where that doesn't work either. Maybe it was HP-UX, though maybe it was one of the BSDs. When I miss this behavior, I really miss it!
Assigning to $0 has always been a bit hit-and-miss and decidedly un-portable. But when it works, it's great, and this is a nifty trick.
I've been fortunate in that it's worked for me everywhere I've deployed it, which admittedly is mostly Debian and Ubuntu. We have a FreeBSD server but it's only a static web host and doesn't run any long-running processes. So far, so good.