Jan 11, 2008
After a long day of flying, I got to Philly about an hour late (thanks to some snow at the St. Paul airport) and found taxis in short supply. So I hired a driver, and on the drive over we had a great conversation. My favorite part came when we chatted about our days. Apparently, he’d been driving librarians from the airport to their hotels all day, and they were starting to get on his nerves a little bit. Driving into downtown, he announced that there were librarians, hordes of them, here for some librarian meeting. I mentioned that this probably explained why hotels were so expensive. I’ve been in Philly three times in the last few months, and I’m paying close to 40 dollars more a night than on any of those trips (and I’ve stayed in the same hotel each time). He said that was probably the case. Anyway, he went on to tell me how boring and cheap these swarming librarians are, with a few stories about people he’d taken into town that day. He said he and the other drivers had been playing spot the librarian as people got off planes and started looking for rides to their destinations.
Anyway, after a day of flying, I found his stories hilarious. But it is interesting how folks see us. I don’t work in an area of the library where I deal with the general public. My public face is the face people see online, so I always get a kick out of how other people see librarians — especially when we start traveling in packs.
–TR
3 comments | posted in Travel
Jan 7, 2008

I’ll only have one more post about this (when I actually have books in hand), but Kyle Banerjee and I spent the last year writing a book about building digital libraries. While I have yet to have the proof in hand, I did get a call from our publisher letting us know that the book is finished, back from the printers and available for shipping. A quick check of Amazon shows that to be true.
For those interested, here’s a link to Amazon: http://www.amazon.com/Building-Digital-Libraries-How-do/dp/1555706177/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1199662294&sr=8-1.
–TR
3 comments | posted in Book, Family
Jan 2, 2008
I’ve been thinking a little bit about some of the things that I use MarcEdit for, and I have been pushing some of this work off my desk to the staff in our technical services department. We actually use MarcEdit quite a bit when it comes to sharing metadata from our Dspace instance with other systems, like OCLC’s WorldCat and our online catalog. For example, we use MarcEdit to automatically generate MARC21 records for the theses submitted through Dspace. The process seems to work fairly well and has been very easy for our staff to learn. I should write an article at some point documenting this process and how it’s working at OSU.
To that end, I’m writing a plug-in for MarcEdit that may let me mainstream the archiving of web pages in Dspace. At this point, the process is a bit too manual for my tastes. Along with spidering a site (to whatever depth is chosen), there is the pesky manual step of flattening the site and making the URLs relative. Not a big deal (unless there are file name collisions [which there always are] when crawling multiple levels deep), but it takes time. So, I spent some time this afternoon and wrote a threaded web crawler. It seems to work well. At this point, I just need to add the logic to flatten all paths and come up with a naming scheme that rewrites all URLs to unique file names. Once I get that down, building the batch import package for Dspace should be fairly trivial. I’m not sure how much time I’ll have to work on this over the week/weekend, but I think it would be a pretty cool project to finish. It would certainly allow the library to offer site archiving as a Dspace option (at this point, it’s only done under very special circumstances), and it should simplify the work enough that it could become a mainstream process.
Anyway, if I do get a chance to get this finished, I’ll certainly make it available as a plug-in (with source). Of course, if someone has already developed a simplified process that requires no manual processing after harvest, I would love to hear it.
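For the curious, here is a rough Python sketch of the flattening step described above. It is not the plug-in code, just one way the idea could work. It assumes a naming scheme of my own choosing: keep the original file name and append a short hash of the full URL, so that same-named pages from different directories don’t collide. The function names (`flat_name`, `build_link_map`, `rewrite_links`) and the example.org URLs are made up for illustration, and the regex-based link rewriting is a shortcut; a real crawler would probably want a proper HTML parser.

```python
import hashlib
import posixpath
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Matches href="..." or src='...' attributes (a simplification, not a full HTML parser).
ATTR = re.compile(r'''(href|src)=(["'])(.*?)\2''', re.IGNORECASE)


def flat_name(url):
    """Map a crawled URL to a single flat file name (hypothetical naming scheme)."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs get a default page name
    stem, ext = posixpath.splitext(posixpath.basename(path))
    # A short hash of host + path + query keeps same-named files apart.
    key = parts.netloc + path + "?" + parts.query
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    return f"{stem}-{digest}{ext}"


def build_link_map(urls):
    """Build a lookup of absolute URL -> flat local file name."""
    return {url: flat_name(url) for url in urls}


def rewrite_links(html, page_url, link_map):
    """Point links at the flat local names; leave links that were not crawled alone."""
    def swap(match):
        attr, quote, target = match.groups()
        absolute, fragment = urldefrag(urljoin(page_url, target))
        local = link_map.get(absolute)
        if local is None:
            return match.group(0)  # not part of the harvested site
        if fragment:
            local += "#" + fragment
        return f"{attr}={quote}{local}{quote}"
    return ATTR.sub(swap, html)
```

With every file written out under its flat name, all of the rewritten links are relative to a single directory, which is the shape a batch import package would need.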
–TR
1 comment | posted in C#, Dspace, MarcEdit
Jan 1, 2008
Just a small update. I’d run into a case where I needed to download some data being provided via OAI in a non-library-specific metadata schema. While the MarcEdit OAI Harvester lets you define your own crosswalks, it limited the metadataPrefix values that could be sent via the harvester to 4 predefined common types. I have some command-line tools that I generally use for something like this, but this time I decided to make it work with MarcEdit. So, the OAI harvester now has the following new functionality:
- The drop-down box that provides the defined metadata types can be augmented by simply typing into the text box. This way you can harvest any metadata type provided via an OAI server.
The example here is downloading Picture Australia metadata from one of their participating institutions. The metadata itself is just Dublin Core with two additionally defined values, but without this change you would not have been able to harvest this particular metadata prefix. Once the metadata prefix is set, the rest of the harvest works as before. The Crosswalk Path defines the path to an XSLT that translates the requested metadata to MARC21XML.
- Ability to download the raw OAI metadata files themselves (without translating the data to MARC). Clicking on the Advanced Settings link expands the dialog, showing a new checkbox called “Harvest Raw Data (save OAI data to local file system)”.
When this checkbox is checked, the Crosswalk Path text box expects a directory (where the files will be saved) rather than a crosswalk path. The program names the saved files numerically (e.g., 0.xml, 1.xml, 2.xml).
- The last change to the program is cosmetic. Those who have used this function will now see an Advanced Settings link. I wanted to clean up the interface a bit so that the program shows only the common options by default, but still makes it easy to use advanced OAI harvesting functionality like getting individual records by identifier, starting the process at a specific resumptionToken, and setting start and end (from and until) dates. Hopefully, this will make the layout a little easier on the eyes.
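For anyone wondering what these options look like on the wire, here is a small Python sketch of the OAI-PMH requests involved. This is not MarcEdit’s code, just an illustration of the protocol. The helper names, the example.org base URL and the `custom_prefix` value are all made up; the verbs and argument names (`ListRecords`, `GetRecord`, `metadataPrefix`, `from`, `until`, `resumptionToken`, `identifier`) come from the OAI-PMH specification. The last helper simply mirrors the numeric naming of raw files mentioned above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None,
                     until_date=None, resumption_token=None):
    """Build a ListRecords request for any metadataPrefix the server offers."""
    if resumption_token:
        # Per OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


def get_record_url(base_url, identifier, metadata_prefix):
    """Build a GetRecord request for a single record by identifier."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    return base_url + "?" + urlencode(params)


def raw_file_name(index):
    """Name a saved raw response numerically: 0.xml, 1.xml, 2.xml, and so on."""
    return f"{index}.xml"


if __name__ == "__main__":
    base = "http://example.org/oai"  # hypothetical OAI endpoint
    print(list_records_url(base, "custom_prefix", from_date="2008-01-01"))
    print(get_record_url(base, "oai:example.org:1", "oai_dc"))
```

A raw harvest would then amount to requesting the first URL, saving the response under the next numeric name, and repeating with each resumptionToken the server hands back until none is returned.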
The update can be found at: MarcEdit51_Setup.exe
–TR
no comments