Who Benefits from the CPAN?

First: This is neither a complaint nor a criticism. I understand the intent of the CPAN and its goals. I believe it meets those goals effectively.

If you talk to Jarkko about the CPAN, he'll likely tell you that it's primarily a distribution service. It's a series of regularly updated mirrors containing some metadata and an archive of redistributable code. Many proposals for enhancements and replacements and reinventions in other languages have come and gone. Most of them have tried to add complexity to this simple base. That's one reason they haven't succeeded.

This half of the CPAN makes code available to users.

Another half of the CPAN is PAUSE, the service which allows CPAN developers to upload their code to the metadata analysis and code distribution service.

The third half of the CPAN consists of the tools used to find and install CPAN distributions with partial or full automation. It's an optional part of the CPAN experience, but it demonstrates that the CPAN ecosystem also includes tools which rely on the metadata and mirroring services which the CPAN makes available. Without this metadata, the CPAN would be much less useful.

It's also the metadata which makes possible services such as search.cpan.org (which many people consider the face of the CPAN), RT for CPAN, CPAN Testers, CPANTS, CPAN Ratings, CPAN Forums, and plenty of other services now and in the future.

That's what the CPAN is: a loose federation of sites and services built around a code and metadata mirroring system, with an upload service for registered developers.

Who's It For?

I believe the primary beneficiaries of the CPAN are active CPAN developers.

By uploading your code to the CPAN, you get worldwide mirroring and distribution. You get test results from a wide variety of platforms and versions. You get bug tracking, documentation hosting, reviews, and feedback on the quality and efficacy of your distribution.

You get to push your installation and dependency management to CPAN installers. Because CPAN tools are effective at gathering dependency information and publishing it in a form that other CPAN tools can understand, the easiest way to install distributions from the CPAN is with a CPAN shell such as CPAN.pm or CPANPLUS. Utilities exist for free software distributions such as Debian and Gentoo to wrap CPAN distributions into OS packages where the packaging system can manage them, but they're necessarily specific to individual platforms, whereas the CPAN shells can run on any operating system where Perl 5 runs.

One strong benefit of the existing CPAN shells is that they run distribution test suites before installation by default, refusing to install when test failures occur. This provides strong pressure to review, report, and fix test failures; the focus is on quality by default.

Active CPAN developers know when and how to report bugs, how to read CPAN Testers reports, and how to force installations. They may know how to use the BackPAN or how to install an earlier version of a dependency.

This brings up a subtler feature of the CPAN which optimizes the experience for active CPAN developers: you always get the newest version of a distribution. While a PAUSE/CPAN shell hack allows developers to upload a development version which people cannot install accidentally, there's little ability to specify in dependencies that you want users to install a specific version of a dependency. One accidental upload in any of a dozen distributions could render half of the distributions on the CPAN uninstallable.

In some ways, this feature creates and exacerbates a problem. It can be difficult to bundle a distribution and all of its dependencies as the dependency graph can change during the bundling process.

A CPAN for Normal Users

What would a CPAN look like for normal users? ActiveState's PPM isn't a bad model in some ways, though it hews too closely to the CPAN itself in others. Binary repositories for Linux distributions have other advantages. I can think of several attributes of a CPAN enhancement for non-developers:

Binary distributions, or at least not requiring the presence of a C or C++ compiler and make utilities. This could be optional.
Run the tests on installation for verification and reporting purposes. This could also be optional, but I like the quality-by-default approach.
Bundling a distribution and all of its dependencies into a single, installable package.
Automatic relocation (perhaps through the use of local::lib or something similar) to allow multiple versions of a single distribution installed and usable.
Regular, tested updates to bundles and the contained dependency graphs.
Working with upstream.
Integration with OS packages.

I have no good ideas for how to accomplish the last two. Working with upstream can be difficult in the normal case; not everyone looks at CPAN Testers reports or the CPAN's RT or other CPAN extensions. Building OS packages seems like a lot of trouble and a lot of duplicate work.

Even so, the Perl 5 ecosystem already has most of the tools necessary to build such a thing. We can build a dependency graph for most CPAN distributions, and we can identify those without accurate graphs. We can calculate the likelihood of tests passing on various Perl 5 versions and platforms given that graph. It only takes a little bit of code to bundle most graphs into a dependency-first installable bundle, and a small loader module could set @INC paths appropriately.
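As a rough illustration of the "dependency-first" ordering mentioned above, here is a minimal sketch in Python (chosen only for brevity; a real tool would more likely be written in Perl and read actual CPAN metadata). The distribution names and the shape of the graph are hypothetical. The only assumption is that each distribution maps to the set of distributions it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def install_order(graph):
    """Return names ordered so that every dependency precedes its dependents.

    graph maps a distribution name to the set of names it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(graph).static_order())


# Hypothetical graph: names and edges are illustrative, not real CPAN metadata.
deps = {
    "My-App": {"Web-Framework", "Template-Engine"},
    "Web-Framework": {"HTTP-Utils"},
    "Template-Engine": {"HTTP-Utils"},
    "HTTP-Utils": set(),
}

order = install_order(deps)
```

Installing the distributions in the resulting order means no distribution is built before the things it needs. A graph containing a cycle has no such order, which is one way to flag distributions whose dependency information needs attention.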

Given a list of dependencies, it's possible to analyze the potential graphs for solutions and identify potential points of conflict or failure. If solutions exist, the software could create an installable bundle. Source code is the easiest, but a binary is possible.
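One way to picture the conflict analysis is as a check that the version ranges requested for each dependency still overlap. The sketch below is in Python and rests on stated assumptions: versions are tuples of integers, every requirement has a minimum, and some requirements also carry an inclusive maximum. That maximum is hypothetical, since CPAN dependencies mostly express minimum versions. All names are illustrative.

```python
def find_conflicts(requirements):
    """Find dependencies whose combined version range is empty.

    requirements is a list of (requirer, dependency, min_version, max_version)
    tuples; max_version may be None for "no upper bound". Versions are tuples
    of ints, e.g. (1, 2). Returns {dependency: [requirers]} for each conflict.
    """
    ranges = {}
    for requirer, dep, lo, hi in requirements:
        entry = ranges.setdefault(dep, {"lo": lo, "hi": None, "who": []})
        # Tighten the range: highest minimum, lowest maximum.
        entry["lo"] = max(entry["lo"], lo)
        if hi is not None:
            entry["hi"] = hi if entry["hi"] is None else min(entry["hi"], hi)
        entry["who"].append(requirer)
    return {
        dep: e["who"]
        for dep, e in ranges.items()
        if e["hi"] is not None and e["lo"] > e["hi"]
    }


# Hypothetical requirements, not taken from real CPAN metadata.
reqs = [
    ("App-A", "Lib-X", (1, 2), None),
    ("App-B", "Lib-X", (2, 0), None),
    ("App-C", "Lib-X", (1, 0), (1, 9)),  # needs Lib-X no newer than 1.9
    ("App-A", "Lib-Y", (0, 5), None),
]

conflicts = find_conflicts(reqs)
```

Here App-B needs at least version 2.0 of Lib-X while App-C cannot accept anything past 1.9, so no single version satisfies both and Lib-X is reported as a point of conflict. A fuller solver would also try alternative versions of the requirers themselves; this sketch only detects the failure.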

It's also possible to keep these graphs and bundles up to date, with a lag of a few hours to a couple of days. Though calculating the possible solutions from a graph may be expensive, most of the information is cacheable.

Would people use such a system? I don't know. Should it replace elements of the current CPAN system? Never; it addresses a different purpose. Is it worth building? The idea continues to tickle my mind.

2 Comments

jawnsy | August 30, 2009 2:49 PM

Hi:

This is a very interesting article indeed.

I should mention I'm a CPAN developer and also a member of the Debian Perl Group, so I see different sides of the issue. I would have to agree with your assessment of what CPAN is and how it works.

I should mention that the indexing is beneficial for us since we have a program that tracks versions of packages available in CPAN, so we can make sure the packages we have in Debian are the newest version. Without this, it would be much more bandwidth-intensive and time-consuming to probe web sites (possibly many of them, at different addresses, with totally different infrastructures). With CPAN, we have a simple permalink we can expect for all the packages: http://search.cpan.org/dist/Package-Name. That also allows us to pick up the newest available version of things, irrespective of the person that uploaded it, which is really nice (especially for the big projects like Catalyst that have many co-maintainers).

However, unlike the CPAN Shell, we don't run the tests or build from source on end users' machines. Once we upload a Perl module to Debian (or actually, any package), our build daemons build them and make them available to users of those systems. What that means is everything is compiled for the target architecture and made available to users as binaries. In general, users don't go through the hassle of building from source.

In at least the pkg-perl group and hopefully with most of the Perl modules packaged for Debian, we run all of the tests during build on the build daemons, which automatically notify us if tests fail. Then we have the option of patching the issue or forwarding a bug up to the CPAN author; since the failures are usually simple cases, though, we tend to fix it first and provide a patch upstream.

I'm not sure what integration with OS tools you're looking to do. It might be difficult to do, since, while I understand how the Debian system works (and by extension the Debian variants), I really would be lost on Red Hat or Mandriva or any other system. CPAN::Dist is an interesting idea there.

I think ideally I'd just like to see more coordination between CPAN authors (especially the major ones) and the various operating systems. Myself, Jonathan Yu (jawnsy; FREQUENCY@PAUSE), Ryan Niebur (Ryan52; RSN@PAUSE), and Jeremiah Foster (jeremiah; JEREMIAH@PAUSE) are all members of the Debian Perl Group and also CPAN, so we try to act as liaisons between the major CPAN authors (especially the toolchain authors).

notbenh.myopenid.com | August 30, 2009 7:36 PM

To answer the larger questions: should this system exist? YES. Would I use it directly? Probably not. A 'trusted' CPAN is for sysadmins maintaining many boxes, just as you pointed out that the existing CPAN is for me, though if done right I wouldn't know the difference.

As for the subquestions:

> how to solve upstream: ...leverage something like git. Fold the BackPAN into current, tag every release. You can now pick specific versions with ease from a single point of access, thus your upstream issues are mostly resolved (you still have to pull). Though for this to 'work' you could only write to this via PAUSE, so it's just altering the CPAN's storage layer. It would just automate what I've done a few times now:
download from CPAN -> git init -> fix -> patch to author

> OS package integration:

I don't know if that's a problem that you can/should solve, as you start to get into other political arguments. But given that you would be providing all the data needed, it should be easy for any package maintainer to decide what to do when a new version comes along, as you would likely have already run the tests for them and 'proven' that it will work. That leaves it up to them whether they're willing to take on the licence restrictions and package it up.
