Don't TSA That Data!

A Vanity Fair article asks "Does Airport Security Really Make Us Safer?" Fortunately, the writer of the article used Bruce Schneier as a source. (If you've been to an airport in the US, you know that the answer is "No; why would you even ask?")

The article's penultimate paragraph makes what should be an obvious point. (At least, it's obvious if you want to prevent terrorism as much as possible. If your goal is to spend lots of taxpayer money in a very flashy, showy way without worrying about efficacy, please continue.) In particular:

What the government should be doing is focusing on the terrorists when they are planning their plots. "That's how the British caught the liquid bombers," Schneier says. "They never got anywhere near the plane. That's what you want--not catching them at the last minute as they try to board the flight."

I read this article moments after sending an email commiserating about the silly (lack of) Unicode handling in a programming language which isn't Perl. Then something clicked.

One of my persistent desires for Parrot was to simplify the internals by reducing the amount of complexity and genericity in the core. In terms of Unicode, this means knowing the encoding of incoming data and the desired encoding of outgoing data, then transcoding to and from a single internal encoding. This way the core could operate on a single encoding and push the complexity of transcoding to the edges.
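To make the idea concrete, here is a minimal sketch in Python (not Parrot's actual string API). The function names `read_text`, `write_text`, and `shout` are hypothetical, invented only for illustration. Bytes are decoded once on the way in, the core works on a single internal representation, and text is encoded once on the way out.

```python
# Sketch only: transcode at the edges, keep one representation in the core.
# All function names here are hypothetical, chosen for illustration.

def read_text(raw: bytes, encoding: str) -> str:
    """Input edge: decode incoming bytes using their known encoding."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str) -> bytes:
    """Output edge: encode to whatever the consumer expects."""
    return text.encode(encoding)


def shout(text: str) -> str:
    """Core logic: never asks what encoding the data arrived in."""
    return text.upper()


# Latin-1 bytes come in, UTF-8 bytes go out; the core never sees either.
incoming = "café".encode("latin-1")
outgoing = write_text(shout(read_text(incoming, "latin-1")), "utf-8")
```

Only the two edge functions know that encodings exist. Everything between them can assume one kind of string.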

If Parrot hasn't changed this since I looked at it most recently, its string system requires each string to carry information about its encoding (which makes each string structure that much larger, increasing memory pressure) and each string operation to check for the need to transcode strings to mutually compatible encodings (which takes time for the comparison in every case, as well as time and memory for the transcoding in other cases).

Worse yet, string literals in the source code of Parrot itself tend to have a specific encoding (ASCII, or at least Latin-1, for literals in the C code), and they ought to be constant. Transcoding them in place isn't an option, so if you're working primarily with another encoding, you pay to transcode from that incompatible encoding every time.

It's not free to transcode at the edges, and you sometimes notice the cost when working with large chunks of data (though if you're processing multi-terabyte satellite images, treat them as binary and skip transcoding altogether), but it's the right thing to do.

The same principle applies for trusting incoming data. Secure it at the borders of the application. Don't spread those checks throughout the system. Harden the edges and don't let nonsense through. Fail early for suspicious things.
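The same shape works for untrusted input. The following Python sketch assumes a made-up rule (usernames are 1 to 32 alphanumeric characters) and hypothetical names (`InvalidInput`, `Username`, `parse_username`, `greeting`). The point is only that the check happens once, at the border, and everything inside accepts the already-validated type.

```python
# Sketch only: validate once at the border, trust the result inside.
# The names and the username rule are assumptions for illustration.
from dataclasses import dataclass


class InvalidInput(ValueError):
    """Raised at the border when incoming data looks wrong."""


@dataclass(frozen=True)
class Username:
    """A value which has already passed the border checks."""
    value: str


def parse_username(raw: bytes) -> Username:
    """Border: decode and validate, failing early on anything suspicious."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidInput("username is not valid UTF-8") from exc
    if not (1 <= len(text) <= 32) or not text.isalnum():
        raise InvalidInput("username must be 1-32 alphanumeric characters")
    return Username(text)


def greeting(user: Username) -> str:
    """Interior: takes a Username, so it has no checks of its own."""
    return f"Hello, {user.value}!"
```

Interior functions such as `greeting` take a `Username` rather than raw bytes, so there's nowhere for a forgotten check to hide.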

Otherwise you'll go mad trying to track down all of the possible interactions and all of the malice that people could perpetrate if you lack a sane sanity policy. In other words, stop doing a lot of busy work to make it look like you know what you're doing. Do it right.

About this Entry

This page contains a single entry by chromatic published on December 22, 2011 3:04 PM.
