March 2013 Archives

When is a Hash the Same?

By chromatic on March 25, 2013 6:00 AM

Suppose you have two hashes which (when you inspect them) contain:

%first = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

%second = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

Are they the same?

Your answer depends on what identity means to you. At this point in time, they have what appear to be identical contents. They're probably different variables, though. If you look at things from the point of view of the perl program itself, the two names %first and %second refer to two different HV structs.

That's not very interesting to most users, I suspect. The question of how these two hashes behave when you treat them the same way is more interesting. For example, given the state of the two hashes as demonstrated earlier, do you expect this test to pass?

use Test::More;

my @first_keys  = keys %first;
my @second_keys = keys %second;

is_deeply \@first_keys, \@second_keys, 'keys returned in the same order';
done_testing;

The official answer is "No, you should not." As of Perl 5.18, the official answer is "If this test ever passes, it's an unlikely coincidence."

A hash map data structure (what Perl calls "hashes") associates keys with values by calculating a simple hash of the key and using that as a cheap index into another data structure. That calculated hash is not necessarily unique, and the indexed data structure needs to be able to handle collisions of the calculated hash (even though the keys are different) as well as insertions of new key/value pairs and removals of key/value pairs.
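To make that description concrete, here's a toy sketch of a hash map built from an array of buckets. It is emphatically not how perl implements hashes internally; the names toy_hash, store, and fetch are made up for illustration, and the hash function (sum of character codes modulo the bucket count) is deliberately naive so that collisions are easy to see.

```perl
use strict;
use warnings;

# Toy illustration only: this is NOT perl's internal implementation.
# toy_hash(), store(), and fetch() are hypothetical helper names.

my $BUCKETS = 4;

# A deliberately simple hash function: sum the character codes of the
# key, then reduce that sum to a bucket index.
sub toy_hash {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord($_) for split //, $key;
    return $sum % $BUCKETS;
}

# Each bucket holds a list of [key, value] pairs, so different keys
# which hash to the same index (a collision) can coexist.
sub store {
    my ($table, $key, $value) = @_;
    my $bucket = $table->[ toy_hash($key) ] ||= [];
    for my $pair (@$bucket) {
        if ($pair->[0] eq $key) {
            $pair->[1] = $value;        # existing key: replace the value
            return;
        }
    }
    push @$bucket, [ $key, $value ];    # new key: append to the bucket
    return;
}

sub fetch {
    my ($table, $key) = @_;
    my $bucket = $table->[ toy_hash($key) ] or return;
    for my $pair (@$bucket) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;
}

my @table;
store(\@table, dog => 'Rodney');
store(\@table, god => 'an anagram');    # same letters, same sum: collides with 'dog'
store(\@table, cat => 'Daisy');

print fetch(\@table, 'dog'), "\n";
print fetch(\@table, 'god'), "\n";
```

Walking @table bucket by bucket gives you a traversal order which depends on the bucket count and on insertion history, not only on which keys are present. That's the property the rest of this post is about.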

When you traverse the Perl hash with keys or values or each, perl traverses that data structure in a particular order that makes sense given the state of that data structure. If and when that data structure changes, the way perl traverses it will change. The details of that data structure are hidden inside perl; they're not obvious from the point of view of the Perl programmer.

The implication of this is not obvious at first, but it's easy to explain: even if two hashes have the same keys and the same values, the order in which you traverse each hash with keys or values or each may be different between the hashes.

After all, you don't know whether %first was defined all at once, or whether it started empty and had the two key/value pairs added one at a time or both at a time. You don't know whether %second originally had three key/value pairs or four or ten. You can't look at what a hash currently contains and guess at the details of the internal data structure perl uses to represent those keys and values.
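Here's a small sketch of that situation, assuming one hash built all at once and another which grew and shrank before ending up with the same contents. The temporary "pet1" through "pet100" keys are invented for the example.

```perl
use strict;
use warnings;

# Built all at once.
my %first = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

# Built up and pruned over time; the extra keys are hypothetical.
my %second;
$second{"pet$_"} = $_ for 1 .. 100;
$second{cat} = [qw( Daisy Jack )];
$second{dog} = 'Rodney';
delete $second{"pet$_"} for 1 .. 100;

# Both hashes now hold the same two keys, but their internal state can
# differ, so the unsorted key order is not something to compare.
print "first:  @{[ keys %first ]}\n";
print "second: @{[ keys %second ]}\n";

# Sorting gives an order which depends only on the keys themselves.
my @first_sorted  = sort keys %first;
my @second_sorted = sort keys %second;
print "sorted keys match\n" if "@first_sorted" eq "@second_sorted";
```

The two unsorted lines may or may not agree on any given run. The sorted comparison always does.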

As of Perl 5.17.10—what will become Perl 5.18 very soon—this behavior gets a little more predictable. With a recent patch to bleadperl, the calculated hash of two hashes defined with the same keys and values in the same way will be slightly different. In other words, even if you write:

my %first = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

my %second = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

... the test from earlier may still not pass. (It's very unlikely to pass.) Any code that relies on an accident of implementation which causes that test to pass occasionally is, sadly, buggy code.

The good news is that you almost never need to rely on that behavior which happens by accident. When I come across code which relies on a specific ordering of keys or values of a hash, it's often in tests which care more that the contents of the hashes are equivalent than that the order of traversal is the same between the hashes. (In those cases, you can use a function like Test::More's is_deeply to compare the contents of the hashes directly, or sort the keys before comparing them.)
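A minimal sketch of both order-independent approaches, using the two hashes from earlier (written here with the pairs in a different order to make the point):

```perl
use strict;
use warnings;
use Test::More;

my %first  = ( dog => 'Rodney', cat => [qw( Daisy Jack )] );
my %second = ( cat => [qw( Daisy Jack )], dog => 'Rodney' );

# Compare the contents directly; is_deeply walks both structures and
# does not care about hash traversal order.
is_deeply \%first, \%second, 'hashes have equivalent contents';

# If only the keys matter, sort them so the comparison no longer
# depends on traversal order.
is_deeply [ sort keys %first ], [ sort keys %second ], 'same set of keys';

done_testing;
```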

In other cases, you must rely on the documented guarantee that using keys and values on the same hash (not a hash with the same contents but a hash which uses the same underlying HV) will always traverse that data structure in the same order until you modify the hash.
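As a sketch of what that guarantee allows (the %pets hash and its entries are invented for the example), you can pair up the results of keys and values by position, or invert a hash with a slice, as long as nothing modifies the hash in between:

```perl
use strict;
use warnings;

# Hypothetical data with plain string values so they are easy to compare.
my %pets = (
    dog  => 'Rodney',
    cat  => 'Daisy',
    fish => 'Wanda',
);

# %pets is not modified between these two calls, so position $i in
# @names corresponds to position $i in @values.
my @names  = keys %pets;
my @values = values %pets;

for my $i (0 .. $#names) {
    die "order mismatch at index $i"
        unless $pets{ $names[$i] } eq $values[$i];
    print "$names[$i] => $values[$i]\n";
}

# The same guarantee makes this hash-slice inversion safe.
my %kind_of;
@kind_of{ values %pets } = keys %pets;
print "Rodney is a $kind_of{Rodney}\n";
```

The order in which the pairs print is still unspecified; only the agreement between keys and values on the same unmodified hash is something you can rely on.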

Anything else has the potential to do the wrong thing depending on the implementation. Perl 5.18 will be a little more strict about this, which is actually a good thing, as it will identify buggy code much more easily.

Mrs. Feynman's Advice on Programming Language Popularity Contests

By chromatic on March 19, 2013 7:49 PM

It's that time again. Someone notices a line going down on a graph, goes into MUSTFIXNOW mode, and declares that anyone who doesn't agree is a slumbering fool.

Does that solve a real problem?

I assume that my intelligent, clear-headed, and hardworking readers have already read and thought about VM Brasseur's comments on Perl marketing as well as Ovid's programmers ought to study practical economics if they're going to talk about markets.

Here's the question I try to ask whenever I talk to a stakeholder about a business requirement. (I know it's a question most of you ask too; it's the right question.) What problem are you trying to solve?

Is your problem that some measurement on some web page ranks something you don't like higher than something you do like?

Is your problem that something you don't like has been on more magazine covers than something you do like?

Is your problem that the hearty self congratulation mutual admiration society on Hacker News (now there's a middlebrow dismissal!) thinks that something you like isn't as new and shiny as something you don't care about?

Is your problem that you're having trouble finding a job using something you like?

Is your problem that you're having trouble finding other people who use something you like?

Is your problem that you're having trouble finding accurate and useful resources for something you like?

Is your problem that you're worried about the future of something you like?

I don't know what you want. I know what I want and what I think, and if we don't agree, we're going to have trouble communicating.

For example, I don't care if Google prefers Go as a language to anything else. That doesn't affect me much. (Perl's better at Unicode than anything else I've tried, so the Unicode-heavy work I'm doing lately gets done in Perl.)

I don't care if 37 Signals or Heroku thinks the right way to deploy a website is with Ruby and NoSQL (did I mention Unicode? Also I find Mojolicious much easier to manage than Rails, much less magical, and frankly a lot more stable, especially with regard to security lately.)

I don't care if every new startup with a domain name ending in .io (if that's still a fun thing; I don't know) says that Node.js is the best way to write applications, and all of this stuff the dynamic language people have talked about for 20+ years is really done right, and oh boy callbacks (again, Unicode, and a mature library culture that supports things like actually deploying and managing modules and tests and, let's be frank, not writing our own web servers because we like to be close to the metal, although I will admit that asynchronous IO as an expected feature of all libraries is a benefit).

I don't care if a company that wants to sell you recruiting materials publishes a web page, with questionable metrics based on very specific web searches, which says that the global programming language market share lines on a graph go up, down, or sideways (see also Wikipedia drama about how the name of a page affects a language's TIOBE ranking, and weep for the future of humanity).

As Mrs. Richard Feynman asked, "Why do you care what other people think?"

I know—we're lazy techies, and we're eager to jump on silly little technical solutions and argue about them from behind the comfort of our keyboards rather than doing the scary wetwork of actually talking to people.

If we actually talked to people about what they wanted, we'd find out that they care about things like:

How easy is it to hire and/or train people in a language?
What are the deployment concerns for the language?
What are the security and support channels like?
How many programmers do you need to accomplish a task?
Are the language and its ecosystem suitable for one task or another?
Will a cancellation of the language by its primary vendor or abandonment by its single author or forking of its community have any detrimental effect on recruiting and retaining?
Do the language and its ecosystem support the desired platforms?
Will choosing this language solve more problems than it creates?

Note the conspicuous lack of "Do lines go up, down, or sideways on an Internet popularity contest?" (Note the very conspicuous lack of "How long has that version number remained unchanged? Is there a chance you could shove a gourd in there instead? Really? Great!") In fact, you can throw out the opinion of anyone who suggests that's an interesting concern because that person is a buffoon who should manage nothing more technically complicated than a stone wheel. Perhaps the thigh bone of an ape.

I know you all know this; my readers are intelligent, creative, and clever people. All of you. But not everyone knows this, and so when you run across people who have no endgame in mind more interesting than "I don't like the looks of that graph", you have permission to suggest that they find other goals in life.

Please note that I'm not suggesting that everything is all right and that the Perl community should abandon all attempts at advocacy. That would be a silly opinion. Rather, I'm suggesting that advocacy deserves intent and measurement at least as serious as approaching a technical problem. You wouldn't flail around and hope that random changes here and there in a program would fix bugs without making others worse. Why would you do that in an advocacy context?

Upgrade in Place with Perlbrew

By chromatic on March 12, 2013 7:35 PM

Per Perl's compatibility and support policy, minor releases of Perl 5 such as Perl 5.16.0 and 5.16.3 share the same level of binary compatibility, while major releases of Perl such as Perl 5.14, 5.16, and Perl 5.18 do not share binary compatibility. That is to say, modules built and installed for one major version of Perl are not necessarily compatible with another major version.
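If you're not sure which major version family a given perl belongs to, the core Config module will tell you. This is only a sketch: it reads the version and archname entries from %Config and splits the version string by hand, and the wording it prints is my own summary of the policy above rather than anything perl itself reports.

```perl
use strict;
use warnings;
use Config;

# The running perl's own version string, such as "5.16.3".
my $version = $Config{version};

# Split into revision, major version, and minor version, using the
# terminology of this post (5.16 is "major", the trailing .3 is "minor").
my ($revision, $major, $minor) = split /\./, $version;

print "Running perl $version on $Config{archname}\n";
print "Compiled modules are expected to keep working within $revision.$major.x\n";
```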

Because of the new Perl 5 yearly release schedule, you may find yourself wanting to upgrade your Perl far more frequently than in years past. The perlbrew utility helps you manage your own installations, which allows you both to leave the system Perl as it is and to install your own version or versions of Perl for your own purposes.

Because your OS vendor is not in control of these Perls, you can upgrade them or neglect them at your leisure.

When new minor releases come out (as they've done recently), you get to decide when and how to update your managed Perl installations. If you're like me, you have hundreds or thousands of CPAN distributions installed, and the thought of spending a couple of hours reinstalling those seems less than fun. Fortunately, Perlbrew includes a command called upgrade-perl which will download, configure, compile, and install the latest minor version of the current major version family you're using as a replacement. Your modules stay installed and working.

Perlbrew will even figure out if there's a newer minor release available for you.

(As with all updates, there's a small risk of code changes affecting behaviors you care about, but that's why your code has a comprehensive test suite, right?)

From a managed Perl, run the command:

$ perlbrew upgrade-perl

In a few minutes, you'll have a new binary running in your existing environment. Isn't that easy? Thanks, perlbrew!

Clean Your Room

By chromatic on March 4, 2013 10:49 PM

Like many of you reading this, much of my programming work requires maintaining software, and much of that software has significant amounts of code written by other people. Like most of the software projects in the world, much of that maintenance comes when stakeholders realize that the business could make more money or spend less money if the software did something new, or at least different. Other maintenance comes from a developer realizing that the code would be easier to write or faster or simpler or more secure or more correct if we could do things a little bit differently.

One of these projects is experiencing a great deal of growth right now. We're fortunate to have a small team of very good developers, a single customer/stakeholder/business unit, an ever-improving test suite, the technical freedom to make our own decisions, and a little bit of slack time every milestone with which we can make improvements as we go along.

In fact, one of the core abstractions of the system came about because we noticed a repeated pattern in our code and figured out a way to encapsulate that pattern in a single named entity. That's happened a few other times in the system, and it tends to happen to me a lot.

Because we have good developers, a useful test suite, and slack time, we can take advantage of the single deployment target to move things around as necessary. That means changing APIs. That means splitting apart classes or moving functions or renaming things or even deleting big pieces of code as we replace them with rewritten components.

You can make a lot of progress toward the right design with that freedom.

If you make a habit of improving your design, and if you're diligent about finding the right design, you'll discover that the places where you have the wrong design become more obvious. You'll also discover that the things you want to do tend to become easier over time.

If you discover that you've somehow finished your software or found a design that you'll never have to change again, stop. Then accept your Fields Medal.

If, for whatever reason, you can't make improvements, or you can make improvements only very slowly, you're in trouble. There's a strong correlation between having the right design and being able to deliver working software. If your design is wrong, you'll spend more resources trying to achieve the same goal than if your design is right, or at least more right.

If you're running a volunteer project, you'll churn through more volunteers, unless there's an exceedingly attractive reason for people to stick around through that frustration.

All of this came to mind when I read David Golden's "Is Perl 6 pointless, hopeless or just not done?" (Spoiler alert: David's article is really about Perl 5.)

The semi-annual sound-and-fury parade about renaming Perl 5 or Perl 6 sometimes sparks the somewhat useful discussion about whether Perl 5 can ever "break backwards compatibility". I take some responsibility for fueling the silliness of that discussion; if you have more free time than I care to devote to the task, you can look back in the archives here for some posts which rather overstated the problem.

Regardless of if or when Perl 6 is ever an option for your projects, Perl 5's future seems to me to walk the knife's edge of one sad fact: the Perl 5 implementation's source code compatibility with itself is the most difficult part of maintaining Perl 5. In other words, it's never had a really good room cleaning. Maybe it's never had the volunteer effort needed to make that happen, or maybe it's never attracted the kind of people with the skill and time to make it happen, or maybe it's never had the kind of project slack necessary to make it happen, or maybe it's all of those and more, but the real problem isn't that there's no will or ability to make changes incompatible with Perl 4 or Perl 5 before 5.10 or 5.12. It's that it's really difficult to do so within the Perl core itself, and that's even before you realize that some of the most innocent looking changes could render significant percentages of the CPAN unusable.

For a long time I believed that Perl might be the second example to contradict Joel Spolsky on whole-cloth rewrites, but I lost the will to work on Perl 6 and someone else can take my place there. If Perl 5 is difficult enough to maintain as it is, what chance is there that a complete rewrite will reach the point of utility before it burns out volunteers?

Even Python 3, with the comparatively modest changes involved there, has taken several years to become a serious target for migration of existing Python projects.

I suspect that the best hope of having a vibrant and vital Perl 5 which supports its full ecosystem is somehow figuring out how to redesign its implementation in place to support both new features and better infrastructure (cheaper Unicode, better concurrency, a core MOP, even macros and JIT).

That will be neither cheap nor easy, and it probably requires recruiting and training at least another couple of full-time developers. Hands wave at this point. The CPAN testers shudder. A dozen projects have come and gone and made little visible progress. A little voice is far too infrequently heard asking "Who's going to do the work? Who's going to maintain it?"

Yet the alternative is that every year Nick and Dave do a great job of keeping things running and making improvements here and there as they have spare time, and pumpkings like Jesse and Ric make wise decisions to cut through technical disagreements, and the thankfully too many to count now release managers take some of the administrative burden off of Nick and Dave and Jesse and Ric so that there are fewer single points of failure, but there just aren't enough resources to make the kind of dramatic improvements that a focused effort on a redesign in place might produce. To be excessively clear: you should blame no one for the Perl 5 internals being what they are. There's been a shortage of resources to make improvements for far too long.

I know it'll be expensive and difficult and painful, and I probably undercut my argument by admitting that I'm still burned out on volunteer projects, but if there were somehow to appear people willing to take on this task, I would donate some time to help and my business would donate some money to TPF to help make it happen.
