Chunks and Syntax Highlighting

If I'm right, and reading source code requires identifying parts of speech, then familiarity with syntax and grammar is essential to programming well.

Consider Damian Conway's SelfGOL. As an experienced Perl programmer, I can pick out various pieces of the code at a glance. There's an assignment. There's quoting. That's a variable. That's a list slice.

If you've never encountered Perl before (or programming in general), you might recognize some English words, such as print and die, and that's all.

One of Perl's design ideas borrowed from linguistics is that "different things should look different". To novices, everything looks different. $name isn't obviously a single chunk. It's an English identifier and one of several punctuation symbols apparently sprinkled at random throughout the program.

Good use of whitespace helps. So does the good use of parentheses as grouping constructs (though as in prose, they often get overused by novices).

One of the most subtle mechanisms for identifying individual chunks floating in a sea of code is syntax highlighting. I can't prove this. I haven't studied it in repeatable situations. Even so, I hypothesize that (modulo color choice concerns) merely highlighting different types of terms in the grammar in different ways will help novices understand how to pick out individual chunks in code.
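To make "highlighting different types of terms in different ways" concrete, here is a rough sketch. It is written in Python rather than Perl, and it is a toy, not a Perl parser: the category names, the regular expressions, and the sample line are all invented for illustration, and real Perl needs far more context than this to tokenize correctly.

```python
import re

# Hypothetical chunk categories for a toy highlighter. Order matters:
# earlier patterns win when more than one could match at a position.
CHUNK_PATTERNS = [
    ("string",   r'"(?:\\.|[^"\\])*"'),    # double-quoted string
    ("variable", r'[$@%][A-Za-z_]\w*'),    # sigil plus identifier
    ("number",   r'\d+'),
    ("word",     r'[A-Za-z_]\w*'),         # builtins, keywords, barewords
    ("operator", r'[-+*/=<>!.,;]+'),
    ("bracket",  r'[(){}\[\]]'),
]

CHUNK_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in CHUNK_PATTERNS)
)


def classify_chunks(source):
    """Return (category, text) pairs for each recognized chunk in source."""
    return [(m.lastgroup, m.group()) for m in CHUNK_RE.finditer(source)]


def mark_up(source):
    """Wrap each chunk in a visible tag, standing in for a color."""
    return " ".join(f"[{kind}:{text}]" for kind, text in classify_chunks(source))


if __name__ == "__main__":
    print(mark_up('print "Hello, $name" if $count > 1;'))
```

The point is only that $count comes out as one chunk, sigil and identifier together, instead of a word with stray punctuation attached. The toy also shows its limits: it treats the $name inside the string as part of the string, where a real highlighter would mark the interpolated variable separately.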

This requires training. This demands practice. Unless you spend time reading code, you won't understand how expressions fit together, and you have little hope of understanding code. I believe it's impossible to skip this step, and thus I don't care if someone who's used C or ML has trouble reading Perl 5 code. Of course people have trouble reading when they don't know the grammar.

(Don't worry, Lisp fans. Homoiconicity—apart from additional complexity of quoting forms and reader macros—means that novices have to spend their time learning to recognize idioms and abstractions at a level higher than tokens and chunks without the benefit of patterns of chunk types as mnemonics to idioms. Then again, I think in patterns, rarely words.)

1 Comment

morungos | February 20, 2010 4:44 PM

"I can't prove this. I haven't studied it in repeatable situations."

I did test part of this experimentally for email. I timed people's ability to classify with and without layout information, and with and without the text itself. People were much faster recognising texts with the format, but they could do it almost as accurately without; they were just a good bit slower. Ron Baecker and Aaron Marcus made much more detailed proposals for typography, although I can't remember how much experimental work they did, if any.

However, if you want different things to look different, how about %foo and $foo? Yes, they are different, but then so are $foo{a} and $foo->{a}. Now these look very similar, and a lot of my novice colleagues struggle with seeing these as two different variables. Lisp only has one "value slot" per symbol (the function slot was separate) and in some ways this is more intuitive. Maybe this kind of not-quite-overloading should be detected and flagged by Perl::Critic.

Personally, I love Perl (and I loved Lisp too) but I do find Perl slightly more unpredictable syntactically. I am more dependent on tool support than I was with Lisp, even after 10+ years of Perl coding. It's probably just age, but TMTOWTDI in syntax may mean there is less consistency in patterns for novices to experience, making it harder for them to learn the chunks. (I could argue that might be precisely why barewords are still in use as file handles.) Just my C$0.02.
