I dislike administering systems. If all I ever had to do were to type apt-get update
and have all of my system administration done for me, that would be fine. Unfortunately, I have to administer systems now and then.
Fortunately, the free software world has a lot of people in the same
situation, and a lot of smart people have written useful software to manage
their systems. As a case in point, consider fail2ban, which I'd have had to invent if it
didn't already exist. fail2ban watches log files for suspicious
patterns and sends traffic from the offending IP addresses to a blackhole. For
example, if some malicious remote machine in a botnet comes knocking at your
SSH server with a dictionary full of usernames, fail2ban will let
the kernel silently drop all network traffic from that machine for an hour
after the third failed login.
That's all configurable. In fact, you can configure all of the existing rules and add new rules yourself.
I did that the other day on a client's server. Somehow, the Internet at large had decided that a web-based database administration tool called phpMyAdmin was running on the server. That meant thousands of attempts to find dozens of versions of phpMyAdmin. (I assure you, there is no PHP running on that machine. phpMyAdmin has security holes? Who would have guessed?) That meant a lot of wasted resources and a lot of useless entries in the log files. (We hadn't yet gotten around to monitoring log files for reporting, so it was worse than it should have been.)
"Self," I told myself. "You should add a fail2ban rule to
detect phpMyAdmin scans and drop that traffic."
I did. It was more difficult than it should have been.
fail2ban uses regular expressions to find individual entries in
log files which represent suspicious access patterns. One line in a log file
represents one event. This is the Unix way. This has been the Unix way for 40
years. It's been the Unix way for 40 years for one reason: it works pretty
well, for the most part. (I like Unix, but I see its flaws
sometimes.)
The web application I intended to secure has an administrative interface
available from /admin. This makes sense. One of the places you can
install phpMyAdmin is also /admin. This also makes a certain
amount of sense.
The routing system in the client's web application redirects all requests
under the Admin controller (the code counterpart to /admin) to a
catchall action so as not to expose internal details of what is and isn't
available with or without specific authentication credentials. This makes sense
when I think about it one way and doesn't necessarily make sense another way.
(It's not entirely what someone might call RESTful and it's almost certainly a
violation of the HATEOAS concordat. Then again, it's an administrative
interface hidden from the Internet at large behind authentication
credentials.)
The first version of my regular expression looked for all attempts to access
/admin, /phpmyadmin, /PhpMyAdmin, et al. which resulted in a redirection.
Of course, /admin also redirects real users with real web
browsers to /admin/login to give them a chance to use a login
mechanism that's not nearly as hateful as the basic authentication dialog
that's been largely unchanged in web browsers since 1994. (You remember 1994.
That's before PHP existed and before Windows machines were on the Internet in
such droves that it made sense to gather a huge botnet of poorly secured
Windows machines to search for phpMyAdmin vulnerabilities. Also you could have
bought AAPL at a deep discount compared to now.)
Unfortunately, my first regular expression matched users going to
/admin and getting redirected to /admin/login just as
well as it matched bots going to /phpMyAdmin and getting
redirected to an error page.
I changed the regular expression. We could also have made
/admin display a login form to an unauthorized user. We could have
done a lot of things. I changed the regular expression.
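Here's a minimal sketch of that false positive in Python. It is not the actual filter from the client's server: the log lines, the addresses, and both patterns are made up for illustration, and the HOST group only stands in for the <HOST> placeholder that fail2ban expands in its own failregex lines.

```python
import re

# Hypothetical access-log lines in the usual Apache httpd style. The
# addresses come from the documentation ranges; nothing here is real traffic.
LOG_LINES = [
    '203.0.113.7 - - [12/Mar/2013:10:01:02 +0000] "GET /phpMyAdmin HTTP/1.1" 302 0',
    '203.0.113.7 - - [12/Mar/2013:10:01:03 +0000] "GET /phpmyadmin/index.php HTTP/1.1" 302 0',
    '198.51.100.4 - - [12/Mar/2013:10:05:00 +0000] "GET /admin HTTP/1.1" 302 0',
]

# Stand-in for fail2ban's <HOST> placeholder (IPv4 only, for brevity).
HOST = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'

# First attempt: any redirected request under /admin or a phpMyAdmin-like path.
NAIVE = re.compile(
    HOST + r' .* "GET /(?:admin|phpmyadmin)\S* HTTP/[\d.]+" 302 ',
    re.IGNORECASE,
)

# Second attempt: only the phpMyAdmin-like paths count as an offense.
NARROW = re.compile(
    HOST + r' .* "GET /phpmyadmin\S* HTTP/[\d.]+" 302 ',
    re.IGNORECASE,
)


def offenders(pattern, lines):
    """Return the host of every line the pattern matches, in order."""
    return [m.group('host') for line in lines if (m := pattern.match(line))]


if __name__ == "__main__":
    print(offenders(NAIVE, LOG_LINES))   # includes the real user at 198.51.100.4
    print(offenders(NARROW, LOG_LINES))  # only the scanner at 203.0.113.7
```

The narrow pattern still has to guess: a phpMyAdmin install that really lived under /admin would look exactly like a legitimate login redirect.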
The next day, I realized the problem was that the standard Unix mechanism of
logging plain text in a well-understood format and parsing it with regular
expressions (or even a grammar) threw away information and tried to reconstruct
it badly. At the point in the web application where the router received a
remote request and redirected it, the router knows exactly why it is
redirecting the request. It knows that /phpMyAdmin is an
invalid route. It knows that an authenticated user requesting
/admin should get redirected to the administrative dashboard. It
knows that an unauthenticated user requesting /admin should get
redirected to /admin/login.
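To make that concrete, here's a rough sketch of a router decision that carries its own reason. Everything in it is an assumption for illustration: the Decision type, the reason strings, and the /error and /admin/dashboard targets are hypothetical, and it models only the three cases named above.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical: the only route prefix this sketch treats as mapped.
KNOWN_PREFIXES = {"admin"}


@dataclass
class Decision:
    path: str      # the request path as received
    status: int    # HTTP status sent back
    location: str  # redirect target (hypothetical paths)
    reason: str    # why the router made this choice


def decide(path, authenticated):
    """Model the three redirect cases; anything else is out of scope."""
    prefix = path.strip("/").split("/", 1)[0].lower()
    if prefix not in KNOWN_PREFIXES:
        return Decision(path, 302, "/error", "invalid_route")
    if path.rstrip("/") != "/admin":
        return Decision(path, 200, "", "served")
    if authenticated:
        return Decision(path, 302, "/admin/dashboard", "authenticated_admin")
    return Decision(path, 302, "/admin/login", "login_required")


def access_log_view(decision):
    """What a path-and-status log line keeps: the reason is gone."""
    return (decision.path, decision.status)


def structured_view(decision):
    """The same decision with its reason intact, as one JSON line."""
    return json.dumps(asdict(decision), sort_keys=True)


if __name__ == "__main__":
    for path, authed in [("/phpMyAdmin", False), ("/admin", True), ("/admin", False)]:
        d = decide(path, authed)
        print(access_log_view(d), structured_view(d))
```

Both /admin requests collapse to the same (path, status) pair once the reason is dropped, which is the information a regular expression later has to guess back.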
Unfortunately, none of that reasoning gets into the Apache httpd-style log
file. It gets a datestamp, an IP address, the URL request path, and an HTTP
status response code. From there, fail2ban and the regular
expression guess at why that log entry is there.
Guessing what semi-structured data means is unreliable.
Fortunately, fail2ban is a good Unix program and is flexible
about which log file it scans. I could add another log file to the web
application to write entries only when something makes a request for a path
that's completely unknown; if there's no controller mapped to the request path
prefix /phpmyadmin, write to the log. That's only slightly more
difficult to create and to configure than it is to explain. You probably
know how to do it already.
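In case it helps, here's one way that could look using Python's standard logging module. It's a sketch under assumptions, not the client's code: the prefix set, the logger name, the line format, and the 404 response are all hypothetical, and a real deployment would hand the logger a logging.FileHandler pointed at whatever file fail2ban is told to watch.

```python
import io
import logging

# Hypothetical route prefixes that have a controller; "" is the site root.
KNOWN_PREFIXES = {"", "admin", "account", "static"}


def make_unknown_route_logger(stream):
    """Build a dedicated logger that writes one line per unknown route."""
    logger = logging.getLogger("unknown_routes")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these lines out of the main application log
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s unknown-route %(message)s"))
    logger.handlers = [handler]
    return logger


def route(path, remote_addr, logger):
    """Return a status code; log only when no controller maps to the prefix."""
    prefix = path.strip("/").split("/", 1)[0].lower()
    if prefix not in KNOWN_PREFIXES:
        logger.info("host=%s path=%s", remote_addr, path)
        return 404
    return 200


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the separate log file
    log = make_unknown_route_logger(buffer)
    route("/phpmyadmin/index.php", "203.0.113.7", log)
    route("/admin/login", "198.51.100.4", log)
    print(buffer.getvalue(), end="")  # only the phpmyadmin request appears
```

Because the application decides what counts as unknown, every line in this file is an offense by construction, and the pattern that reads it only has to pull out the host.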
Unfortunately, writing a separate log file only works around the problem. I
still have to write a regular expression to parse lines in that log file so
that fail2ban will handle them appropriately. That's the Unix
philosophy at work. It works pretty well and it's worked pretty well for
decades. Sure, there are ambiguities, but you can work around them pretty well
too.
Sometimes, though, I tell myself what I think I want is the ability to send structured data as events to a centralized event listener system to which other processes can connect as listeners. I know there are things like systemd and D-Bus in the freedesktop.org specifications, but I rewrote the regex because "pretty well" gets the job done now and I don't expect this system to last 40 years.
(In fact, that sums up Unix pretty well too.)