One of the suggested objectives in Gabor Szabo's Measurable objectives for the Perl ecosystem is "Increase the number of Perl book sales". I like most of the objectives in Gabor's post, but I must caution against taking the numbers presented too seriously. At best, they're incomplete. At worst, they're completely misleading.
Gabor refers to State of the Computer Book Market - Mid-Year 2009, written by Mike Hendrickson. Mike's company publishes similar analyses a few times a year, based on sales data from Nielsen Bookscan. (For more on the culture of Bookscan rankings in the publishing world, see Why writers never reveal how many books their buddies have sold.)
This data sounds wonderful, and the pretty graphs and charts give you the impression that you're getting useful information. Yet this is only a partial picture of the market. As Mike writes later in the piece:
Many publishers report that more than 50% of their revenue is achieved as direct sales, and those numbers do not get reported into Bookscan. Sales at traditional college bookstores are typically not reported into Bookscan as well. Again this is US Retail Sales data recorded at the point of sale to a consumer.
These numbers reflect less than half of revenue. Throw out half of sales by dollar and hope that the results are stochastic.
Yet there's a deeper flaw behind these numbers.
How Book Sales Work
If you plot the sales curve of multiple books, you'll notice that they tend to follow the ubiquitous power law. A book sells as many copies in its first three months as it will the rest of the first year. A book sells half as many copies in the second year as it did in the first. This model is so accurate that the publishing industry calls titles "frontlist" titles if they're in their first year of publishing and "backlist" titles if they're not.
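As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and the sample figure of 1000 units are hypothetical, and it assumes that sales keep halving every year after the second, which the rule above only states for the second year.

```python
def yearly_sales(first_quarter_units, years):
    """Project unit sales per year under the rule of thumb described above."""
    # Year one: the first three months sell as much as the remaining nine.
    first_year = 2 * first_quarter_units
    # Assumption: every later year sells half of the year before it.
    return [first_year / (2 ** n) for n in range(years)]


# Hypothetical title that sells 1000 copies in its first three months.
projection = yearly_sales(first_quarter_units=1000, years=4)
frontlist_units = projection[0]      # first year of publication
backlist_units = sum(projection[1:])  # everything after the first year

print(projection)
print(frontlist_units, backlist_units)
```

Under these assumptions, the first year alone outsells the following three years combined, which is why a snapshot taken during a title's first year flatters it.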
While a few titles have strong backlist sales, they're rare. They're the Bibles and Harry Potters and How to Win Friends and Influence Peoples. The publishing industry's san greal is to find a new strong backlist bestseller.
They tend to exhibit strong frontlist behavior as well.
The retailer's point of view is different. Limited shelf space means that new books still in their three-six-twelve month short snout sales levels often get priority over older books in the long tail sales levels. If you're going to sell 3000 copies of a book in the first three months and 1000 copies in the next three years, stock up early.
This is especially true in technical book publishing, where I have trouble giving away Python 2.3 and Oracle 7 and Red Hat Linux 6 books. Publishing dates are expiration dates: best by a year after the copyright date.
Why does this matter? It's a simple matter of economics: people won't buy books you don't publish.
The Freshness Factor
2005 was a good year for Perl book sales. Why? Four strong Perl books came out in 2005. The Perl book sales numbers for that year reflected the short snout of Perl book sales.
Four years later, is PBP selling as many copies? Is the Perl Testing book? Is HOP? Is APP 2e?
Those are rhetorical questions. You already know the answer. You can even answer that question for the Camel 3e. A book published in 2000 may still be useful nine years later, but Camel 3e predates almost every part of the Perl Renaissance. Besides that, the 250k or 300k units already sold have reached a fair amount of the Perl 5 programming market.
Compare that with the Ruby book market in 2006, where you couldn't leave an editorial board meeting without an assignment to publish a new Ruby or Rails book. Initial sales numbers looked great; the growth in that market segment was huge!
Did any Ruby book sell 250k copies, though? That number's missing from the year-by-year analysis.
Look at this year's numbers. Objective-C is huge! It's 1999 all over again! Except that, yet again, the comparison is to an emerging market segment without analysis of historical trends.
The Missing Data
The biggest piece of data obviously missing from these State of the Computer Book Market entries is historical context. Six months or a year of appositional data comparing different market segment maturities is misleading, at best. Should you go learn Objective-C just because Bookscan reported more Objective-C titles sold than SQL?
No -- but to be fair, Mike doesn't suggest this directly.
Other missing data is more subtle, and perhaps more meaningful. Where's the breakdown of frontlist/backlist for these sales figures? More than nine out of ten books follow the power law I described earlier. If the Objective-C books have all come out in the past year, they're in their short snout period. Of course they're selling more units now than books in the long tail period.
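A frontlist/backlist breakdown is simple to compute if you have publication dates alongside unit sales. The sketch below is only an illustration: the records are made up, the function name is hypothetical, and it treats "first year of publishing" as fewer than 365 days since publication.

```python
from datetime import date


def split_front_back(titles, as_of):
    """Sum units for first-year titles (frontlist) versus older ones (backlist)."""
    totals = {"frontlist": 0, "backlist": 0}
    for published, units in titles:
        age_days = (as_of - published).days
        # Assumption: a title is frontlist for its first 365 days on sale.
        key = "frontlist" if age_days < 365 else "backlist"
        totals[key] += units
    return totals


# Made-up (publication date, units sold in the reporting period) records.
titles = [
    (date(2009, 3, 1), 3000),
    (date(2005, 7, 1), 400),
    (date(2008, 11, 15), 2500),
]

totals = split_front_back(titles, as_of=date(2009, 7, 1))
print(totals)
```

A category whose units are almost all frontlist is in its short snout period, and comparing it directly against a category of mostly backlist titles says more about publishing schedules than about demand.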
How many total units does the market represent? If the number of books sold in 2009 is half the number sold in 2008, it's difficult to compare the performance of books against each other year-over-year. There are too many other factors to consider. (You can still get interesting information, but you can't compare technologies against each other in meaningful appositive ways.)
How many books are in each category? Title efficiency (average number of unit sales per title and standard deviation) can tell other interesting stories. Is one language category hit driven (iPhone Programming, Ruby on Rails)? Are there niche subjects intended as modest sales targets and not bestsellers? Is every book a moderate success, with no breakout quintessential must-have tome? Is there a gold rush of publishing with 40 new titles produced in a year and each of them selling a dismal 1000 copies apiece?
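To make title efficiency concrete, here is a small Python sketch using the standard library's statistics module. The two categories and their unit counts are invented for illustration, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def title_efficiency(units_per_title):
    """Return (average units per title, population standard deviation)."""
    return mean(units_per_title), pstdev(units_per_title)


# Invented figures: both categories sell 44000 units across five titles.
hit_driven = [40000, 1000, 1000, 1000, 1000]  # one breakout, four duds
steady = [8800, 8800, 8800, 8800, 8800]       # every title a moderate success

hit_mean, hit_dev = title_efficiency(hit_driven)
steady_mean, steady_dev = title_efficiency(steady)

print(hit_mean, hit_dev)
print(steady_mean, steady_dev)
```

Both categories have the same total and the same average per title, so a chart of category totals cannot tell them apart. Only the spread shows that one of them depends on a single hit.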
How many new books are in a market segment this year compared to last year? This is the biggest question that matters to Perl books, especially with regard to Gabor's suggestion. Again, this should be obvious: no one can buy Camel 4e right now.
A Completely Hypothetical Fictional Example I Made Up Completely From Whole Cloth
If that didn't convince you, consider a short fable about oyster farming.
Suppose you own a publishing company. Suppose you discover a new topic area: oyster farming. No one's published on this topic before, but hundreds of thousands of people are doing it. There's a lot of institutional knowledge, but there's a ripe opportunity for documenting best practices and nuanced techniques -- especially given that you have found the person who invented modern oyster farming and convinced him to write a book about it.
You publish the book. It takes off. Its short snout is wide. (My metaphor is awkward.) You've discovered a new market segment; you've invented a new market segment. Life is grand.
You branch out. You publish More Oyster Farming and Learn to Farm Oysters and Pteriidae, Reefs, Bivalves, and Mollusks. You even write a cookbook for Oysters.
Then a catastrophic triploid spawning accident removes the long-beloved MSX resistance in most commercial oyster farms, ruining the market for a year -- maybe longer -- and in a panic you cancel all of your upcoming frontlist titles.
A few other publishers publish one- or two-off titles in the market segment. They sell a few copies. You had a corner on the market though. You were the publishing world's China of oyster farming. Over the next four years, you look at your sales numbers and congratulate yourself for getting out of the oyster farming publishing market segment when you did, because no one's buying oyster farming books anymore.
I am glad you took the time to write such a long reply to one of the entries in my post. I was hoping that people who have more insight in the specific fields will do that and help us establish some metrics but I am not sure I understand your point.
Do you mean none of the book sales data is relevant or only those that are published by O'Reilly in those blog posts? Is there a way to get better data? How would you compare the relevancy of this metric to other possible metrics?
Regarding the power law: how do the 2nd and 3rd editions behave? Do they have similar 3-6-12 effects as the first edition? Is there some data on how much smaller those peaks are than the first one?
If I understand you, the reason you think Perl book sales are so low is that no (or few) new titles were published in recent years, especially by O'Reilly. They in turn don't publish new books because 1) they don't find new subjects in the Perl world, 2) they think they won't be able to sell enough copies in the first year to make it worthwhile.
So how would you construct a good metric from book sales? Where could we find more relevant data?
Honestly, I think there's another factor here that makes such a metric completely unnecessary:
Perl has an amazingly strong online community that documents most things in more than adequate ways. Over the past 5 years there are only two books I ever actually needed: PBP and Testing Notebook. All other topics were more than covered by CPAN documentation, perlmonks discussion and blog entries.
I suggest that getting more perl books *written and published* would be a good goal.
If I may be so presumptuous, I think chromatic is saying that:
- Book sales in general are under-reported, roughly by half
- The good data is most likely kept secret
- Roughly 90% of books follow the power law
- O'Reilly owned the perl book market, but Perl6 is taking too long for them to build their business around
Personally, I think the book market is a poor metric to judge the success of a programming language. I know people will use that metric anyway, but it is a specious argument often used to convince someone language a is better than b.
I think things like Padre and Strawberry Perl, two projects you have done so much for, are going to do much more for perl than book metrics and whatever we might glean from them.
So I used to work on software which told many Hollywood studios how much money their movies made. This was real time data. So when you read in the paper that "Riot Grrl Zombies Lick Your Fingers Before Eating Them" only made $3,000 last month, there's a good chance I worked on that software. The number, though, is likely to be distorted.
First (this is all from memory and I possibly have details wrong), we only pulled data from 80% of the theaters in the US. We could have pulled more data, but Hollywood likes wiggle room in reporting data.
Second, if you buy a ticket for a double feature like, say "The Three Faces of Eve" and "The Good, the Bad and the Ugly", who gets credit for the money? Why both, of course! They each get credited for all of the money. So double-features double the income instead of halving it. This is ridiculous, but also standard practice.
Third, sometimes studios lie. There's a particular Hollywood blockbuster for a major star which, nonetheless, came out in second place to another studio's movie on their opening weekend. The first studio contacted the second and made an arrangement to report the blockbuster's sales as higher and the second studio agreed. (I know the movies, I don't know the terms of the deal, and I can't say anything more).
Of course, Hollywood accounting is particularly arcane, but I expect that many large industries have all sorts of creative ways of dealing with numbers. Thus, if we decide to use Perl movie sales as a metric, we'll have to take the resulting figures with a grain of salt :)
I would love to buy:
1. A book about Moose and Meta Object Programming
2. A modern perl book
3. A perl6 book (I know there is now a parrot book, but I don't want to hack parrot, at least not yet. :) )
Who will sell me these?
@jeremiahfoster, watch this space. #2 is in progress, #3 is under discussion, and #1 is definitely in planning.
I take your point, but still applaud Gabor for at least starting a conversation about measurable objectives. It's a necessary conversation, if the Perl community is ever going to be able to (somewhat) objectively say "We're growing, we're standing still, we're shrinking."
In my work, I tend to advise organizations to look at the trends in the numbers, not the raw numbers themselves. For example: have more Perl books been published this year than last, than the year before, and so on? What's the trend?
+1. I just received my copy of The Definitive Guide to Catalyst and, in reading just the first handful of chapters, it's made me realize the critical necessity for a book about "Modern Perl." Here's to hoping that it can help to change the trend on Perl publishing. :-)
Phillip.