From Alchemy to Science in Programming


Want to derail any serious discussion of programming language tools or techniques? Ask "Yeah, but does it scale?"

Sure, it's not science. It's alchemy and astrology, but you can demonstrate your world-weary superiority. Better yet, you can distract people from getting real things done.

Sometime between when I learned to program (when we counted processor speeds in megahertz and fractions thereof) and today, the question flipped. Back in the day, BASIC didn't have a SLEEP keyword and cooperative multitasking still made sense (invoke a callback to your own joke here, but please don't block other people from moving to the next sentence). You could insert a do-nothing counting loop to delay things, because even then computers were faster than human beings. We counted cycles.

We cheated.

Maybe we could have solved bigger problems if we were more clever, but we spent our time trying to cram as much program as possible into as few clock cycles as possible. If that meant rewriting a loop in assembly both to get the memory count down and to take advantage of internal details of the processor we'd read about in one of the copious manuals, we'd do it.

Features were important, but the rule of the day seemed to be to use limited resources to their fullest. If that meant skipping a builtin feature because you wanted to unmap the memory it took and use it for something else, that's what you did. If you could save a few bytes by taking advantage of a builtin timer instead of writing your own, you let the screen refresh rate dictate what happened.

I don't lament that loss. (I liked the challenges, but there are always challenges.) I do find the switch fascinating though. Perhaps because I'm not writing silly little games or demos anymore, because I'm writing programs that are supposed to help real users manage their information and be more productive, maybe the switch flipped in me rather than in the world.

(Then again, I did learn to program by the osmosis of typing a lot of code, changing it, and eventually learning what worked and didn't. As above, so below.)

The programs I write now care more about dealing with lots of data than they do about fitting in limited computing resources. (Sometimes resource limits are still important: I've had to change algorithms more than once to make the working set of at least one project fit in available memory.) In fact, the resources I have at my disposal are so embarrassingly large compared to thirty years ago that I can waste a lot of processor time and memory to avoid waiting for things like speed-of-light latency when accessing remote resources.

I didn't see that coming.

This all comes to mind when I see discussions of programming languages, techniques, and tools. The pervasive criticism, flung with the intent to sting, is often "But does it scale to large projects?"

... as if the skills needed to manage a project intended to deploy to an 8-bit microcontroller with 32 KB of RAM were so similar to those needed for a CRUD application running in a web browser and used by at most 35 people within a 500-person company? (As if other skills are so different!)

Put another way, I don't care if you can't figure out how to make (for the sake of argument) agile development with pervasive refactoring, coding standards, and a relentless focus on simplicity work with a team of 80 programmers distributed across four time zones and six teams.

I don't care if you think Java or PHP is the only language in which you can hire enough warm bodies to fill your open programming reqs because you think the problem is so large you have to throw more people at it.

I don't care if you think PostgreSQL is inappropriate because it's a relational database and they're slower than NoSQL if you have to scale to 50 million hits during the Olympics when I'm profitable with a few orders of magnitude fewer users.

Your large isn't my large isn't everyone's large, and the way you scale isn't the way I scale isn't the way everyone scales.

You're not doing science. You're not measuring what works and doesn't work. You're not accounting for control variables (could you even list all of the control variables necessary to produce a valid, reproducible experiment related to software development tools and techniques?).

Conventional wisdom says "Don't optimize until after you profile and find a valid target to optimize and a coherent way to measure the effects of your optimizations." Is it too much to ask to come up with ways to measure the ineffable second-order artifacts of software development like bug likelihood, user satisfaction, safety, reliability, and maintainability so we can measure the effects of things like static typing, automated refactoring tools, the presence or absence of higher-order programming techniques, and incremental design?

Otherwise we're stuck in a world of alchemy, before the natural philosophers clawed their way to the point where a unified theory of energy and matter and motion and interaction made any sense. Maybe someday soon the smartest person in the room will answer the question "How does this work?" with "Let's try and find out!" rather than donning wizard robes and hat and waving some sort of mystical wand about wildly.


You know, this has gotten to me too. Recently I proposed on an SO question that a poster use DBM::Deep to have some persistence.

The response was immediate: how could I possibly suggest DBM::Deep?! It's so slow! Why not use DB_File or an RDBMS?

Because it's easy, one module and one line and you have persistence! It's portable and I have never seen a speed problem. I don't know how big a system the OP was using, and maybe he does need something faster, but they didn't even ask! DBM::Deep works, every time, no configuration, nothing else to install; that's been good enough for me. Now leave me alone, and find a bigger barn for that high horse!

"we spent our time trying to cram as much program as possible into as few clock cycles as possible. If that meant rewriting a loop in assembly both to get the memory count down and to take advantage of internal details of the processor we'd read about in one of the copious manuals, we'd do it."

Just for the record: Even though the United States are largely unaware of it, this is still very much a thing!

"Does it scale" should more appropriately be restated as "does it scale horizontally".

Once it's written, slamming more servers into a cluster (or AWS instances or whatever) is far more time-effective, and most likely more cost-effective, than rewriting everything, moving data to faster disks, or changing to a new table format or database software.

This doesn't mean that you don't profile and optimize, but if you are releasing a product with perhaps a 5% chance of commercial success, the emphasis must be on getting it out to market and allowing fast scaling by lower-skilled staff. After all, the only code that makes money is code that's actually written.

Furthermore, with 'blah blah cloud', being horizontally scalable allows your platform to grow with your customer base, rather than forcing you to stop selling while your team comes up with another 5% capacity so you can start selling again.

Even if you can get 20-40% more performance, would the time of developing that be more cost effective than buying an extra 2-4 servers?
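The trade-off in that question can be sketched as back-of-the-envelope arithmetic. The following is a minimal Python sketch, not a real costing model: the function names, the cluster size of 10, and every cost figure are hypothetical, and it assumes capacity grows linearly with server count (the horizontal-scaling case). Under that assumption, a 20-40% gain on a 10-server cluster corresponds to the "extra 2-4 servers" mentioned above.

```python
def extra_servers_for_gain(cluster_size: int, gain_percent: int) -> int:
    """Whole servers to add for the same capacity gain as a code speedup.

    Assumes capacity scales linearly with server count (an assumption,
    not something every system delivers).
    """
    # Integer ceiling division: round partial servers up without floats.
    return -(-cluster_size * gain_percent // 100)


def hardware_is_cheaper(cluster_size: int, gain_percent: int,
                        cost_per_server: int, optimization_cost: int) -> bool:
    """True when buying servers costs less than paying for the optimization.

    Both cost figures are hypothetical inputs in the same currency and
    over the same time period.
    """
    hardware_cost = extra_servers_for_gain(cluster_size, gain_percent) * cost_per_server
    return hardware_cost < optimization_cost


if __name__ == "__main__":
    # Hypothetical numbers: 10 servers, 3000 per server, 20000 of developer time.
    for gain in (20, 40):
        servers = extra_servers_for_gain(10, gain)
        cheaper = hardware_is_cheaper(10, gain, 3000, 20000)
        print(f"{gain}% gain: {servers} servers, hardware cheaper: {cheaper}")
```

The answer flips as soon as the inputs change, which is the point: the comparison is only meaningful with your own measured numbers.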

... it would still be great to see more effort in profiling perl and libperl. There is performance hidden in them there hills!



About this Entry

This page contains a single entry by chromatic published on June 4, 2012 11:47 AM.
