The YAPC::NA 2012 call for presentations has opened! As with every YAPC I've attended, this is a great opportunity to meet other programmers, get better at things you already know, learn things you don't know yet, and practice your presentation skills.
A few months ago I exchanged emails with JT Smith about my idea for a talk this year. I've mentioned in passing a few times a small side project my business is investing in. It's a deliberately minimal side project and, from the development side, definitely the kind of skunkworks, just-get-it-working, maintain-it-as-little-as-possible, let-it-run-uninterrupted software that you're likely to find.
That doesn't mean it's quick or dirty. That doesn't mean it's not tested well, or that it has a slapdash design. All it means is that the most important criterion for any design or implementation decision is "is this the simplest thing that could possibly work" instead of "is this elegant" or "what's the standard modern Perl orthodoxy for this problem".
So far the results have been enlightening.
I don't want to give away too many of the details of my talk (if it's accepted), but here are two small hints which may or may not help you.
First, just because a good ORM such as DBIx::Class makes searching and manipulating existing data easy doesn't make it the best way to insert big batches of new data.
Second, while LWP and especially WWW::Mechanize are great tools for automating the behavior of a web client, sometimes wget or curl in a shell script is quicker, easier to parallelize, and more robust.
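To make the shell hint a little more concrete, here is a minimal sketch of what "easier to parallelize" can look like. It is not the code from the project or the talk. The function name fetch_all, the one-URL-per-line input file, and the default of four parallel jobs are all assumptions made up for illustration, and the -P flag to xargs is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not from the original project): fetch every URL
# listed in a file, running several curl processes at once.
# Usage: fetch_all URL_FILE OUT_DIR [JOBS]
fetch_all() {
    url_file=$1        # text file with one URL per line (assumed layout)
    out_dir=$2         # directory that receives the downloaded files
    jobs=${3:-4}       # number of parallel downloads; 4 is an arbitrary default

    mkdir -p "$out_dir" || return 1

    # xargs -n 1 hands each curl a single URL; -P keeps up to $jobs of
    # them running at once (GNU/BSD xargs, not POSIX).
    # --remote-name saves each page under the file name taken from its URL,
    # --fail makes HTTP errors produce a non-zero exit status,
    # --retry 3 retries transient failures up to three times.
    ( cd "$out_dir" &&
      xargs -n 1 -P "$jobs" \
          curl --silent --show-error --fail --retry 3 --remote-name
    ) < "$url_file"
}
```

Calling fetch_all urls.txt pages 8 would download everything listed in urls.txt into pages/, eight at a time. The exit status is non-zero if any single download fails, so a cron job or wrapper script can notice and rerun it.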
(As a bonus, consider also that if you're parsing semi-structured data out of HTML, removing all of the HTML is sometimes even easier than using a real HTML parser or even CSS selectors. Sure, semantic markup helps when you can rely on it, and sure, using a regex to remove HTML tags is a bad idea, but there are ways to turn HTML into plain text quickly and easily without doing anything on your own.)