So I wrote these for a guy who wanted help cataloging a small collection. I'd be farther along on the real work if I had just done it, but it was a nice distraction.
The idea is that you have a proxy server running on your local machine, and when you find the MARC data that you want, it parses it out of the HTML for you and puts it into your database. There are two versions here:
Some caveats: This has been working well for the guy I wrote it for, but I haven't gotten feedback from anyone else. I'm on an old Mac, so this is a non-forking proxy server; you probably want to turn off graphics in your browser so that it's not any more bogged down than necessary. Finally, I've tuned the regexps for the Library of Congress catalog: if the catalog pages change, or you want to use another catalog to get your records, you'll need to change the expressions.
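To give a feel for why the regexps are tied to one catalog's page layout, here is a rough sketch of the parsing step. It is in Python rather than the Perl the scripts are written in, and it is not the code from either script. The sample page, the `<pre>` wrapper, the underscore-for-blank indicator convention, and the names `SAMPLE_PAGE`, `PRE_BLOCK`, `FIELD_LINE`, and `extract_marc_fields` are all made up for illustration; the real catalog pages may look quite different.

```python
import html
import re

# Invented sample page: a tagged MARC display inside a <pre> block.
# The layout and the record data are assumptions, not a real catalog response.
SAMPLE_PAGE = """<html><body><h1>MARC Tags</h1>
<pre>
010 __ |a 00000000
100 1_ |a Author, A. N.
245 10 |a An example title / |c A. N. Author &amp; B. Writer.
</pre>
</body></html>"""

# Grab the first <pre>...</pre> block; DOTALL lets "." span line breaks.
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL | re.IGNORECASE)

# One field per line: 3-digit tag, 2 indicator characters, then the field data.
FIELD_LINE = re.compile(r"^(\d{3}) ([0-9_]{2}) (.+)$", re.MULTILINE)


def extract_marc_fields(page):
    """Return a list of (tag, indicators, data) tuples found in the page."""
    match = PRE_BLOCK.search(page)
    if match is None:
        return []  # layout changed or no record on this page
    text = html.unescape(match.group(1))  # turn &amp; back into &
    return [(tag, ind, data.strip()) for tag, ind, data in FIELD_LINE.findall(text)]


fields = extract_marc_fields(SAMPLE_PAGE)
for tag, indicators, data in fields:
    print(tag, indicators, data)
```

If the catalog wraps its records in a different element or separates the columns differently, both patterns stop matching and the function quietly returns an empty list, which is the kind of breakage the caveat above is about.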
The user interaction of both scripts is the same. Once they're started you should see this message:
In your browser please set proxy server to: http://[your IP address]:2512/ and then go to http://0.0.0.0/
If $debug is set, the results of all the HTTP transactions will be sent to STDERR. If there are errors they should show up in the window for 0.0.0.0. At the very least there will be an error the first time you run it, unless you've already set the correct path for your output file by changing the value that $outputpath is initialized to at the start of the program (or $dbpath if you're playing with FMproxy.pl).
Let me know if this is useful, or if there are problems or changes that should be made, or if there is a better place to put these than here. Thanks.
Chuck McCallum, mccallucATyahooDOTcom.