My esteemed colleague Ovid has led a small private discussion about teaching Perl to people who already know how to program, and the subject inevitably turned to the topics in my Modern Perl book. Blame me.
Like half of the people reading this, I have years of experience unsnarling
and helping novices unsnarl code written without a firm grasp of programming.
Again, there's
nothing wrong with writing Baby Perl, but the difference in
maintainability, quality, concision, and reliability between Baby Perl and
grown-up Perl is striking.
These experiences tempt me to generalize that novice programmers have
trouble learning three important principles: abstraction and composition, user
input, and robustness.
Abstraction and Composition
Inefficient programmers tend to experiment randomly until they find a combination that seems to work.
— Steve McConnell, Code Complete
Programming—symbolic computation—is primarily the act of
representing a problem in terms of the proper abstractions so as to manipulate
the individual entities of the problem efficiently and correctly. I suspect
that you'll struggle with even simple programming unless you have the basics of
algebra down, and something more than a + b = 10.
The first peak novices must scale is understanding the difference between a
value and a name which stands for that value, and that the
computer doesn't care about the name.
The second-order effect of this is when a novice realizes why it's stupid to use
the contents of one variable as the name of another. This is where composition comes in. You know how to
manipulate scalar values. You know how to manipulate aggregates, such as arrays
and hashes. What happens if you combine those principles?
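Here's a minimal sketch in Perl of where that question leads (the data is invented for illustration): a hash whose values are array references. Everything you already know about arrays still applies to each value; composition is just those rules nested inside one another.

    use strict;
    use warnings;

    # Composition of things you already know: a hash whose values are array
    # references. (The data here is invented for illustration.)
    my %books_by_author = (
        'Damian Conway' => [ 'Perl Best Practices', 'Object Oriented Perl' ],
        'chromatic'     => [ 'Modern Perl' ],
    );

    # The same array operations you already know apply to each value.
    push @{ $books_by_author{chromatic} }, 'Modern Perl, 4th edition';

    for my $author ( sort keys %books_by_author ) {
        my $count = @{ $books_by_author{$author} };
        print "$author: $count title(s)\n";
    }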
Some languages do better at this than others. PHP is terrible; witness the
comment section on PHP.net sometime. If
genericity, abstraction, and composition are the disease, PHP.net is the rusty
horse needle containing 100L of vaccine. (I know the difference between cc and
L, thank you.) PHP encourages people to reach the "I can search the Internet for
code to copy and paste and experiment randomly with gluesticks and glitter"
stage of development, then chains them to tables making rhinestone-encrusted
wallets to sell to tourists during the next dry season.
Compare that with Scheme, at least as taught in The Little Schemer, where it's
certainly impractical to write addition
and subtraction primarily in terms of recursion, but by the end you're going to
know how recursion works and how symbolic computation works and that you can
define what you thought of as primitives in terms of other primitives and, by
gum, you'll be a better programmer for it.
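In that spirit, here's a loose Perl rendering of the exercise (mine, not the book's Scheme): addition defined in terms of nothing but increment, decrement, and recursion.

    use strict;
    use warnings;

    # Addition defined only in terms of increment, decrement, and recursion;
    # a loose Perl rendering of the spirit of the exercise, not the book's Scheme.
    sub add {
        my ($n, $m) = @_;
        return $n if $m == 0;
        return add( $n + 1, $m - 1 );
    }

    print add( 3, 4 ), "\n";    # 7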
I think this is what some people mean when they say "Never trust
any developer who doesn't understand C pointers", because it'd be crazy to
claim that knowing how to work around the limitations of the PDP-8 memory model
in 2011 is ceteris paribus all that useful for most programmers. Understanding
enough of the von Neumann model to see that, in practice, all of the convenient
parts of modern programming languages are just names hung on buckets somewhere
in silicon should suffice.
From there, the next hurdle to overcome is understanding genericity, whether
in terms of polymorphism or type substitution. If properly explained, Perl can
make sense here. When you write my $birthday = $YYYY . $MM . $DD; it doesn't
matter so much that $YYYY and $MM and $DD are all numbers as it matters that
they know how to stringify. Yes, that's polymorphism.
You're welcome to explain that in terms of duck typing, if your language
isn't cool enough to support better
abstractions, or you could pull out Real World Haskell and begin to think
about type classes.
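To make the stringification point concrete, here's a small, hypothetical Perl sketch: an object that overloads stringification concatenates just as happily as a plain number, because concatenation only asks each operand to turn itself into a string.

    use strict;
    use warnings;

    # A hypothetical class whose objects know how to stringify themselves
    # (zero-padded to two digits), purely for illustration.
    package ZeroPadded;

    use overload '""' => sub { sprintf '%02d', $_[0]->{value} };

    sub new {
        my ($class, $value) = @_;
        return bless { value => $value }, $class;
    }

    package main;

    # A plain number and two objects concatenate the same way; all that
    # matters is that every operand knows how to stringify.
    my $YYYY = 2011;
    my $MM   = ZeroPadded->new(3);
    my $DD   = ZeroPadded->new(7);

    my $birthday = $YYYY . $MM . $DD;
    print "$birthday\n";    # 20110307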
("What about Java?" you ask? You could probably learn most of these concepts
if you were properly motivated, and if you kept away from your IDE's
autocomplete and from copying and pasting example code you found on the Internet. If
you find yourself managing large lists of JARs and trying to resolve conflicts
between them, or if your best impression of genericity and polymorphism is
writing big XML files to take advantage of dependency injection, or at least
slapping annotations on everything and hoping, the odds are against you. That's
not to say that it's impossible to write great code in Java, but you're going
to need a lot of self-discipline to do it.)
Haskell and other languages with real higher-order functions (Java and
Python don't count, JavaScript and Perl and Ruby do, I haven't used C#, and I
don't have the heart to test PHP) can take you to the next level, where you
manipulate functions as if they were data, because they are. You don't
necessarily have to be able to build your own object system out of closures,
but you should be able to understand the equivalence of encapsulation and
abstraction and polymorphism available when you limit yourself to the primitive
abstraction of apply. (Hello, CLOS! How've you been all of these
years?) Certainly patterns such as Haskell monads count, because they're
language-supported abstractions over the structure and execution of program
components.
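If you want a tiny, concrete taste of functions as data, a Perl sketch might look like this (the names are mine, chosen for illustration): a closure factory plus a higher-order function that applies whatever code reference you hand it.

    use strict;
    use warnings;

    # Functions as data: a closure factory plus a higher-order function.
    sub make_adder {
        my $n = shift;
        return sub { return shift() + $n };    # closes over $n
    }

    sub apply_to_all {
        my ($f, @values) = @_;
        return map { $f->($_) } @values;
    }

    my $add_ten = make_adder(10);
    print join( ', ', apply_to_all( $add_ten, 1, 2, 3 ) ), "\n";    # 11, 12, 13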
You don't have to master all of these techniques to move past novice status,
but you should be comfortable knowing that these techniques exist and beginning
to recognize and understand them when you see them.
User Input
In my conversation with Ovid, I mentioned "Handling Unicode", but I can
expand this point into something more specific.
User input lies.
Sometimes it lies deliberately, as when a malicious or curious user gets the idea to provide bad data just to see what happens. Sometimes it lies to your expectations, as when a user like me has a single-word, lowercase pseudonym, and if you force-capitalize it, you're doing it wrong. (You'd think the soi-disant "world's largest encyclopedia" would bother to get spelling correct, but Wikipedia editors are apparently too busy deleting all things non-Pokémon and Star Wars Expanded Universe to fix their software.)
Sometimes it lies because the world isn't solely the realm of ASCII or even
Latin-1, and if you don't document your expectations and bother to find out the
truth about the data you're getting or what other people expect of the data
you're producing, you'll see what real line noise looks like.
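As a minimal illustration, assuming the incoming bytes really are UTF-8 (an assumption you should document and verify): decode bytes into characters at the boundary, or length, regular expressions, and output will all lie to you.

    use strict;
    use warnings;
    use Encode qw(decode);

    # Bytes from the outside world are not characters until you decode them.
    # UTF-8 is only an assumption here; document and verify it for real input.
    my $raw_bytes = "Fran\xc3\xa7ois";               # 9 bytes off the wire
    my $name      = decode( 'UTF-8', $raw_bytes );   # 8 characters

    binmode STDOUT, ':encoding(UTF-8)';
    printf "%s is %d characters, not %d bytes\n",
        $name, length $name, length $raw_bytes;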
I had an interesting discussion with a motivated novice a while back, when I reviewed some code and found a SQL injection attack vector in cookie-handling code. "Never trust the user," I said.
"Why would anyone do that?" he asked.
"Never trust the user," I repeated. You're not paranoid enough. Even if you
think you're paranoid enough, you're probably not. In a subsequent discussion,
I said "Never parse XML with simplistic regular expressions—you're
expecting that the start element will always occur on a newline with a
four-space indent."
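The usual cure for that particular sin, sketched here in Perl with DBI (the connection details and schema are invented for the example): bound placeholders keep user-supplied data as data instead of splicing it into the SQL.

    use strict;
    use warnings;
    use DBI;

    # Bound placeholders keep untrusted data as data. The connection details
    # and schema here are invented, and the example assumes DBD::SQLite.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=sessions.db', '', '',
        { RaiseError => 1 } );
    $dbh->do('CREATE TABLE IF NOT EXISTS sessions (id TEXT, user_id INTEGER)');

    # Pretend this arrived in a cookie; treat it as hostile either way.
    my $session_id = q{x'; DROP TABLE sessions; --};

    # Wrong: "SELECT user_id FROM sessions WHERE id = '$session_id'"
    # Right: let the driver handle quoting.
    my $sth = $dbh->prepare('SELECT user_id FROM sessions WHERE id = ?');
    $sth->execute($session_id);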
Unfortunately, only bitter experience can burn this lesson into the minds of
some developers. Certainly several years of doing things quickly the wrong way
and having to fix them later taught me better. Mostly. (I wrote my own web
server without reading the HTTP RFCs, and it mostly worked. Mostly. See also
"Why your CGI parameter processing code has at least five bugs and one latent
DoS attack vector, and then learn how to use a library and get on to more
interesting things.")
Robustness
If the latter sounds like robustness, that's because it is. Perhaps handling input and
output properly is a special case of robustness, but I separate them because
you really can't trust user input.
A robust program knows what's likely to go wrong, knows how it can recover and
when it can't, and does the right thing in those circumstances.
McConnell's flailing novices tend to be so happy they managed to find the
right magic incantations to get something working that they're
reluctant to rejigger their Jenga
towers (American cultural imperialism warning here) for fear that
everything will collapse.
Robustness doesn't mean that every program needs configuration-driven
logging and a failover heartbeat monitoring system. Sometimes it's enough to
throw an exception, then let the next cron invocation restart the process and
pick up where it left off. (I really like this pattern.)
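A rough sketch of that pattern, with stub helpers standing in for real work: process one item at a time, record progress after each, and fail loudly so the next cron run retries only what remains.

    use strict;
    use warnings;

    # The "fail loudly and let cron retry" pattern. All helpers here are stubs
    # standing in for real work and real persistence.
    sub load_pending_items { return ( 'job-1', 'job-2', 'job-3' ) }
    sub process            { my $item = shift; die "no network\n" if $item eq 'job-2' }
    sub mark_done          { my $item = shift; print "done: $item\n" }

    for my $item ( load_pending_items() ) {
        eval { process($item); mark_done($item); 1 } or do {
            warn "failed on $item: $@";    # complain where cron's mail can see it
            exit 1;                        # the next run starts from what remains
        };
    }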
Yet if you haven't thought about what's likely to happen, what's unlikely to
happen, and the consequences of each, your program is probably much more
fragile than it needs to be. Yes, understanding risk and consequences and error
patterns takes experience, but error handling should rarely be optional. (See
The
Tower of Defaults.)
Software isn't magic. It's part science, in that we can make empirical
statements about what we understand to be reality, test them, and reassemble
them into larger frameworks of axioms. It's also part art, as we have a
fantastic creative palette with which to do so. Both require knowledge and a
thirst for knowledge and a desire for a depth of understanding.
When novices expand their minds and experiences to encompass all of these
tools, they become all the better for it. That is what I want to
encourage in any book aimed at novice developers or novices in any particular
language.