The heart of every successful agile or iterative process is thoughtful refinement, and refinement requires measurement. In software development terms, you might ask "Can we deploy features more quickly?" or "Can we provide more accurate estimates?" or "Can we improve quality and reduce defects?"
In business terms—especially in startups and other small businesses searching for niches and customers and revenue—you might ask "How can we improve customer engagement?" and "How can we improve our rates of visitor to paying customer conversion?"
I've been experimenting with something called cohort analysis lately. The results are heartening. In short, you instrument your code to record notable user events. Then you analyze them.
I started by adding a single table to my database:
CREATE TABLE cohort_log
(
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    usertoken VARCHAR(255) NOT NULL,
    day       INTEGER NOT NULL,
    month     INTEGER NOT NULL,
    year      INTEGER NOT NULL,
    event     TEXT(25) NOT NULL,
    notes     VARCHAR(255) DEFAULT ''
);
A user may generate multiple events. Every event has a canonical name. I haven't made these into a formal enumeration yet, but that's on my list. Every event has a token I'll explain soon. Every event also has a notes field for additional information, such as the user-agent string for the "new visitor has appeared" event or the name of the offending template for the "wow, there's a bug in the template and the system had to bail out this request!" event.
(Separating the timestamp into discrete components is a deliberate denormalization I don't necessarily recommend for your uses. There's a reason for it, but I won't tell you which side of the argument I argued.)
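One practical consequence of the discrete columns is that building a cohort key needs no date parsing at all. Here's a small sketch of that idea, using hypothetical rows shaped like the table's columns:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows, shaped like rows from the cohort_log table
my @rows = (
    { usertoken => 'a', year => 2012, month => 6, event => 'VIEWEDHOMEPAGE' },
    { usertoken => 'b', year => 2012, month => 6, event => 'SENTFEEDBACK'   },
    { usertoken => 'c', year => 2012, month => 7, event => 'VIEWEDHOMEPAGE' },
);

# With discrete year/month columns, a month-long cohort key is a simple sprintf
my %by_cohort;
$by_cohort{ sprintf( '%04d-%02d', $_->{year}, $_->{month} ) }++ for @rows;

print "$_: $by_cohort{$_}\n" for sort keys %by_cohort;
```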
I use DBIx::Class to help manage our data layer, so I have a CohortLog class. The resultset includes several methods to help generate reports, but it also has a special method to insert a new event into the table:
=head2 log_event
Given a hash reference containing key/value pairs of C<usertoken>, C<event>,
and optionally C<notes>, logs a new cohort event. Throws an exception unless
both required keys are present.
=cut
sub log_event
{
    my ($self, $args) = @_;

    do { die "Missing cohort event parameter '$_'\n" unless $args->{$_} }
        for qw( usertoken event );

    my $dt = DateTime->now;
    $args->{$_} = $dt->$_ for qw( year month day );

    $self->create( $args );
}
This automatically inserts the current (timezone-adjusted) time values into the appropriate columns. (Again, a good default value in the database would make this work correctly, but we're sticking with this tradeoff for now.)
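For reference, SQLite can supply those defaults itself, so callers needn't pass the date columns at all. A sketch of what that alternative schema might look like (syntax for other databases differs):

```sql
-- Sketch: let SQLite fill the date columns on insert;
-- strftime returns text, so CAST keeps the columns numeric.
CREATE TABLE cohort_log
(
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    usertoken VARCHAR(255) NOT NULL,
    day       INTEGER NOT NULL DEFAULT (CAST(strftime('%d','now') AS INTEGER)),
    month     INTEGER NOT NULL DEFAULT (CAST(strftime('%m','now') AS INTEGER)),
    year      INTEGER NOT NULL DEFAULT (CAST(strftime('%Y','now') AS INTEGER)),
    event     TEXT(25) NOT NULL,
    notes     VARCHAR(255) DEFAULT ''
);
```

The tradeoff is that the database's clock and timezone, not the application's, then determine the cohort boundaries.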
I added a couple of methods to the Catalyst context object to log these events:
=head2 log_cohort_event
Logs a cohort event. At the end of the request, these get cleared.
=cut
sub log_cohort_event
{
    my ($self, %event) = @_;
    $event{usertoken} ||= $self->sessionid || 'unknownuser';

    push @{ $self->cohort_events }, \%event;
}
=head2 log_cohort_template_error
Turns the previous cohort event into a template error.
=cut
sub log_cohort_template_error
{
    my $self     = shift;
    my $template = $self->stash->{template};
    my $page     = $self->stash->{page} || '';

    my $event        = $self->cohort_events->[-1];
    $event->{event}  = 'TEMPLATEERROR';
    $event->{notes} .= $template . ' ' . $page;
}
=head2 record_cohort_events

Inserts all pending cohort events into the database and clears the pending
list. Called at the end of the request.

=cut
sub record_cohort_events
{
    my $self          = shift;
    my $events        = $self->cohort_events;
    my $cohort_log_rs = $self->model( 'DB::CohortLog' );

    for my $event (@$events)
    {
        $cohort_log_rs->log_event( $event );
    }

    @$events = ();
}
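The three context methods above amount to a small per-request buffer: events accumulate, the last one can be rewritten on failure, and everything flushes at once. A minimal, framework-free sketch of that lifecycle (names hypothetical; no Catalyst or DBIx::Class required):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @events;    # stands in for $self->cohort_events

sub log_cohort_event { push @events, { @_ } }

sub fail_last_event    # stands in for log_cohort_template_error
{
    my $last = $events[-1] or return;
    $last->{event}  = 'TEMPLATEERROR';
    $last->{notes} .= 'index.tt';
}

sub record_cohort_events    # flush; a real version would insert rows here
{
    my @flushed = @events;
    @events     = ();
    return @flushed;
}

log_cohort_event( usertoken => 'abc123', event => 'VIEWEDHOMEPAGE', notes => '' );
fail_last_event();
my @recorded = record_cohort_events();
print scalar @events, ' buffered; last event is ', $recorded[0]{event}, "\n";
```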
The most important method is log_cohort_event(), which takes named parameters corresponding to the cohort's data. The token associated with each event comes from the user's session id. (You can see a couple of flaws to work around, namely that some requests have no session information, such as those from bots and spiders, and that session ids may change over time. There are ways to work around these.)
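One such workaround, sketched here with hypothetical helper names, is to prefer a long-lived first-party cookie token over the session id, so the cohort identity survives session rotation:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA 'sha256_hex';    # core module

# Hypothetical helper: prefer a stable cookie token, fall back to session id
sub cohort_token
{
    my (%request) = @_;
    return $request{cookie_token}    # long-lived, set on first visit
        || $request{sessionid}       # fallback: the current session
        || 'unknownuser';
}

# Minting a token on first visit; the entropy source is illustrative only
sub mint_token { substr sha256_hex( rand() . $$ . time ), 0, 32 }

print cohort_token( sessionid => 'sess-1' ), "\n";
print cohort_token( cookie_token => 'tok-aaa', sessionid => 'sess-2' ), "\n";
```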
The log_cohort_template_error() method is more diagnostic in nature. It modifies the previous event to record an error in the template, as there's no sense in recording that a user performed an event when that event never occurred successfully. (Another part of the system detects these catastrophic events and calls this method. Hopefully it never gets called.)
Finally, record_cohort_events() inserts these events into the database. This method gets called at the end of the request, after everything has rendered properly and has been sent to the user. This prevents any error in the event system from causing the request to fail and it reduces the apparent user latency.
How does it look to use this logging? It's almost trivial:
=head2 index
The root page (/)
=cut
sub index :Path :Args(0)
{
    my ($self, $c) = @_;

    $c->log_cohort_event( event => 'VIEWEDHOMEPAGE' );

    $c->stash( template => 'index.tt' );
}
=head2 send_feedback
Allows the user to send feedback about what just happened.
=cut
sub send_feedback :Path('/send_feedback') :Args(0)
{
    my ($self, $c) = @_;

    my $method = lc $c->req->method;
    return $c->res->redirect( '/users' ) unless $method eq 'post';

    my $params = $self->get_params_for( $c, 'feedback' );

    $c->model( 'UserMail' )->send_feedback( $c, $params );
    $c->add_message( 'Feedback received! ' .
                     'Thanks for helping us make things better!' );
    $c->log_cohort_event( event => 'SENTFEEDBACK' );

    return $c->res->redirect( $params->{path} || '/users' );
}
These two controller actions each call $c->log_cohort_event with a specific event string. (Again, these could easily be constants generated from an enumeration in the database, but we haven't needed to formalize them yet.) While I considered making a Catalyst method attribute (like :Local or :Args) to enforce this logging with an annotation, we decided that the flexibility of logging an event selectively outweighed the syntactic concerns of adding a line of code. Only after a user has actually sent feedback, for example, does the SENTFEEDBACK event get logged.
Testing this logging is almost trivial. Reporting is slightly more interesting, but how you do it depends on how you divide your user base into distinct cohorts.
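As a sketch of one possible report (the sample data and cohort rule here are hypothetical, not from the real system): divide users into cohorts by the month of their first recorded event, then measure what fraction of each cohort ever sent feedback:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical event rows, assumed to be in chronological order
my @rows = (
    { usertoken => 'a', year => 2012, month => 6, event => 'VIEWEDHOMEPAGE' },
    { usertoken => 'a', year => 2012, month => 7, event => 'SENTFEEDBACK'   },
    { usertoken => 'b', year => 2012, month => 6, event => 'VIEWEDHOMEPAGE' },
    { usertoken => 'c', year => 2012, month => 7, event => 'VIEWEDHOMEPAGE' },
);

# Cohort = month of the user's first recorded event
my (%cohort_of, %size, %converted);
for my $row (@rows)
{
    my $user = $row->{usertoken};
    my $key  = sprintf '%04d-%02d', $row->{year}, $row->{month};

    unless (exists $cohort_of{$user})
    {
        $cohort_of{$user} = $key;
        $size{$key}++;
    }

    $converted{ $cohort_of{$user} }{$user} = 1
        if $row->{event} eq 'SENTFEEDBACK';
}

for my $key (sort keys %size)
{
    my $conv = keys %{ $converted{$key} || {} };
    printf "%s: %d users, %d sent feedback\n", $key, $size{$key}, $conv;
}
```

Note that a conversion counts toward the cohort the user joined in, not the month the conversion happened; that's the point of cohort analysis.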
The last exciting problem is how to detect spiders, bots, and other non-human user agents to exclude them from this analysis. Optimizing the sales and conversion and retention and engagement funnels for automated processes makes little sense. I have some ideas—some of them amazing failures—but that's a story for another time.
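The crudest starting point, sketched here with an illustrative and far-from-exhaustive pattern, is to check the user-agent string recorded in the notes field against known bot signatures:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative signatures only; real bot detection needs much more than this
my $bot_pattern = qr/(?:bot\b|crawler|spider|slurp)/i;

sub looks_like_bot
{
    my $user_agent = shift // '';
    return $user_agent =~ $bot_pattern ? 1 : 0;
}

print looks_like_bot('Mozilla/5.0 (compatible; Googlebot/2.1)'), "\n";
print looks_like_bot('Mozilla/5.0 (X11; Linux x86_64) Firefox/10.0'), "\n";
```

The amazing failures mostly involve the agents that lie about who they are.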