Counting Modules

A new website, modulecounts.com, compares the number of extension modules for Perl, PHP, Python, and Ruby. Unfortunately, it's wrong.

That's not necessarily the fault of the website, but the graph and the extrapolation on the graph make Ruby's momentum with gems look all but unbeatable. That's wrong too.

CPAN's number (as of today) is 18936 modules, which corresponds closely with the number on www.cpan.org. You can forgive someone not closely associated with the Perl 5 community for thinking that number represents all of CPAN.

In truth, it represents the CPAN modules list—specifically the registered modules list. The important notice on that page says:

This Perl 5 Registered Module List document is currently not being maintained and is several years out of date.

search.cpan.org provides a much better set of numbers: 21585 distributions and 88698 modules. Almost 80% of the modules available on CPAN are not on the registered list.
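The arithmetic behind that "almost 80%" is easy to check. The following is a minimal Python sketch that uses only the three counts quoted in this post; the variable and function names are illustrative, not part of any CPAN tool.

```python
# Counts quoted in this post (December 2010).
REGISTERED_MODULES = 18936   # registered module list, as shown on www.cpan.org
DISTRIBUTIONS = 21585        # search.cpan.org
MODULES = 88698              # search.cpan.org


def unregistered_share(registered, total):
    """Fraction of all modules that do not appear on the registered list."""
    return 1 - registered / total


share = unregistered_share(REGISTERED_MODULES, MODULES)
modules_per_distribution = MODULES / DISTRIBUTIONS

print(f"Unregistered share of modules: {share:.1%}")
print(f"Average modules per distribution: {modules_per_distribution:.2f}")
```

The second figure is a reminder that modules and distributions are different units, so a count of one cannot stand in for a count of the other.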

What's the registered list? Back in the day, when you uploaded a new distribution to the CPAN, community standards suggested that you should choose a location in the module naming hierarchy and that you should describe your upload with a short code of metadata to indicate the support level, the type of interface, and the like. This is the DSLIP code.

Some experienced CPAN authors might even remember DSLIP codes; they're used even less often than the initial hierarchy of the CPAN index. Registering modules is much less useful than merely uploading them and letting people search for them by name and description with search.cpan.org.

To summarize, the module count on www.cpan.org represents a fraction of the modules available on the CPAN because it only counts modules which the uploaders have bothered to register with the registered module list. You can write a lot of great modern Perl code while using unregistered modules. There's no correlation between appearance on the module list and quality or utility, except that most of the registered modules are likely older projects first uploaded when the registered module list and DSLIP codes were used more widely.

Any count of CPAN modules should use the numbers from search.cpan.org instead of cpan.org. I've submitted a pull request to update the sources of the CPAN module counts for modulecounts.com.

7 Comments

chris.prather.org | December 20, 2010 12:18 PM

Barbie has been keeping fairly accurate[1] collections of CPAN metadata like this for a while.

http://stats.cpantesters.org/statscpan.html

From the stats there we have 24K unique distributions on the CPAN, and extrapolating from the raw data from http://stats.cpantesters.org/trends.html we averaged 9 *new* distributions per day in the month of November (the last full month we have data for), and 8.9 new distributions this month.

[1] I've used these values at various times to eyeball check CPAN crawling projects I've done. The numbers have always been highly accurate.

fxn.myopenid.com | December 20, 2010 12:31 PM

Ruby gems should be compared to CPAN distributions in my view.

somethingdoug.com | December 20, 2010 1:06 PM

I agree with Xavier, that site needs to compare things as closely as possible across the different languages. In Perl we have distributions and modules, where a distribution contains one or more modules but usually addresses a particular task (and has tests, README, Changes, etc.). A Ruby Gem is pretty much identical to a Perl distribution. The site is also counting Python packages, which are like a Perl distribution as well.

https://me.yahoo.com/mithaldu#29f3a | December 20, 2010 10:53 PM

I've been in contact with the maintainer and the main issue is that CPAN's use of the word module confused him.

He is in fact trying to compare distributions, but also still trying to properly name and define what he is searching for. I've been trying to explain the Perl situation to him and why module is a bad choice for a name.

Also, for what it's worth: The distribution count on search.cpan is wrong as well. It shows the number of unique distribution releases appearing in the 02_packages list. But due to the nature of that file, this means that it counts multiple releases for some distributions. The actual count of unique distributions is around 20570.
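To illustrate the difference between counting release files and counting unique distributions, here is a small Python sketch. It assumes an index in which each line holds a package name, a version, and the path of the release file providing that package; the sample lines, author names, and helper functions are hypothetical, and the version-stripping regex is a rough heuristic rather than the rule any CPAN tool uses.

```python
import re

# Hypothetical sample in the shape of a package index:
# package name, version, path of the release file that provides it.
SAMPLE_INDEX = """\
Foo::Bar        1.02  A/AL/ALICE/Foo-Bar-1.02.tar.gz
Foo::Bar::Util  0.90  A/AL/ALICE/Foo-Bar-1.01.tar.gz
Baz             0.5   B/BO/BOB/Baz-0.5.tar.gz
"""

ARCHIVE_SUFFIX = re.compile(r"\.(tar\.gz|tar\.bz2|tgz|zip)$")
TRAILING_VERSION = re.compile(r"^(.*?)-v?\d[\w.]*$")


def distribution_name(path):
    """Heuristically reduce a release path to its distribution name."""
    filename = path.rsplit("/", 1)[-1]
    stem = ARCHIVE_SUFFIX.sub("", filename)
    match = TRAILING_VERSION.match(stem)
    return match.group(1) if match else stem


def count_releases_and_distributions(index_text):
    """Return (distinct release files, distinct distribution names)."""
    releases = set()
    for line in index_text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skip anything that is not a data line
            releases.add(fields[2])
    distributions = {distribution_name(path) for path in releases}
    return len(releases), len(distributions)


print(count_releases_and_distributions(SAMPLE_INDEX))
```

In the sample, two releases of Foo-Bar both appear because an older release still provides a package, so counting distinct release files overstates the number of distributions.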

As for why Ruby is speeding ahead, very simple: It has an extremely low barrier to entry. You register on the gems site, supplying four pieces of information, and you're ready to go without human intervention.

On PAUSE meanwhile registering can take multiple weeks while you watch other people who applied after you be processed before you. This is not hypothetical but happened to a friend of mine who applied back in September:
http://www.nntp.perl.org/group/perl.modules/2010/09/msg72653.html
After that he got a job, which severely diminished his free time and then he suddenly got accepted in November, at which time he was way too busy to contribute:
http://www.nntp.perl.org/group/perl.modules/2010/11/msg73489.html

brian.d.foy.myopenid.com replied to comment from https://me.yahoo.com/mithaldu#29f3a | December 21, 2010 3:43 AM

Yes, the PAUSE admins recently had some trouble keeping up with new user registrations, and sometimes registrations fall through the cracks. Our volunteer who had normally handled that quite religiously disappeared. I'm not trying to excuse it, and it's unfortunate that it happened to your friend.

Anyone who thinks they've been skipped over can send us a note at modules AT perl DOT org.

https://me.yahoo.com/mithaldu#29f3a replied to comment from brian.d.foy.myopenid.com | December 21, 2010 12:43 PM

No worries, I wasn't very bothered and neither was he, merely confused when it happened. I only mentioned it because it is a strong example to highlight the differences between CPAN and Ruby gems when it comes to getting in and getting things out.

Also, this really only affects the numerical factor. I'm fairly sure that due to the low barrier to entry, the quality of Ruby gems on average is decidedly lower.

PS: It might be time though to automate some part of it. Or at least mention some information on what to do when the process seems to have broken down on the PAUSE FAQ and/or registration page. ;)

https://me.yahoo.com/a/evZh.8gAt5qa1xDbY_dE.iSYdbI-#2dbce | December 23, 2010 6:19 AM

Would be cool if CPAN authors could do Gmail-style 'invites' to people to create a PAUSE account. The idea is that it would be a sort of express account creation that could be automated.

That and making it easier for people to get started. I know that a lot of people want to contribute to CPAN but the learning curve is steep. And that's too bad because being a CPAN contributor is fairly easy; it's just that the existing docs are scattered and not always so clear to people. I know people are still using h2xs because that's in the docs.
