Mar
3
2009
I wrote this up some time ago, but I still occasionally get questions about it (in fact, I got one today, hence this note). Project Gutenberg (PG) provides its metadata for download in RDF format on its website at: http://www.gutenberg.org/feeds/catalog.rdf.zip. I wrote a fairly basic XSLT transformation for this data when I was visiting the Internet Archive last year, and posted it here: http://oregonstate.edu/~reeset/marcedit/html/downloads.html (or directly, at: Project Gutenberg RDF = MARC).
Running the RDF records through MarcEdit using this stylesheet produces the following MARC21 record set: http://oregonstate.edu/~reeset/marcedit/anonymous/catalog.zip. The process is really a straightforward one. You download the above XSLT stylesheet, register it with MarcEdit, and then you can be off on your merry way translating data to your heart’s content. Of course, folks occasionally ask about translating into other metadata formats, and that’s cool too. If you can work with the API, you can do this in one step. However, if you plan on using the MarcEdit UI, you need to do it in two:
- Set up a PG => MARCXML translation (the XSLT stylesheet above will do that)
- Create or use one of MarcEdit’s provided MARCXML => [format] stylesheets to complete the translation.
So, it’s really a two-step process. While many of the YouTube videos that I’ve uploaded in the past few days cover parts of this process, I decided to upload one final video on the topic that demonstrates it end to end (processing the PG data into both MARC and MODS3) as a reference case for future users looking to do something similar. Hopefully this will help. You can find the video here:
http://www.youtube.com/watch?v=1zHKIJ6D_dA
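For anyone who would rather see the two-step idea in code than in the MarcEdit UI, here is a rough sketch in Python. To be clear about the assumptions: this does not use XSLT or MarcEdit at all (it swaps in Python’s standard-library ElementTree), the sample record is a made-up, heavily simplified stand-in rather than the real PG catalog structure, and the field mappings (creator to 100$a, title to 245$a) are just illustrative. The only point is the shape of the pipeline: source => MARCXML, then MARCXML => another format.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
MARC = "http://www.loc.gov/MARC21/slim"
MODS = "http://www.loc.gov/mods/v3"

# Hypothetical, simplified record; NOT the actual PG catalog layout.
SAMPLE = f"""<record xmlns:dc="{DC}">
  <dc:title>Moby Dick</dc:title>
  <dc:creator>Melville, Herman</dc:creator>
</record>"""


def to_marcxml(source_xml):
    """Step 1: map the source record to a minimal MARCXML record."""
    src = ET.fromstring(source_xml)
    record = ET.Element(f"{{{MARC}}}record")
    # Illustrative mapping only: creator -> 100$a, title -> 245$a.
    for tag, dc_name in (("100", "creator"), ("245", "title")):
        value = src.findtext(f"{{{DC}}}{dc_name}")
        if value is None:
            continue
        field = ET.SubElement(
            record, f"{{{MARC}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
        )
        ET.SubElement(field, f"{{{MARC}}}subfield", {"code": "a"}).text = value
    return record


def marcxml_to_mods(record):
    """Step 2: map the MARCXML record to a minimal MODS record."""

    def first_subfield(tag):
        # Return the text of the first subfield in the first matching datafield.
        for field in record.findall(f"{{{MARC}}}datafield"):
            if field.get("tag") == tag:
                return field.findtext(f"{{{MARC}}}subfield")
        return None

    mods = ET.Element(f"{{{MODS}}}mods")
    title = first_subfield("245")
    if title is not None:
        title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS}}}title").text = title
    author = first_subfield("100")
    if author is not None:
        name = ET.SubElement(mods, f"{{{MODS}}}name")
        ET.SubElement(name, f"{{{MODS}}}namePart").text = author
    return mods


marc_record = to_marcxml(SAMPLE)          # step 1
mods_record = marcxml_to_mods(marc_record)  # step 2
print(ET.tostring(mods_record, encoding="unicode"))
```

Keeping MARCXML as the intermediate format is what makes the second step reusable: any MARCXML => [format] translation can be dropped in without touching the first step.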
Cheers,
–TR
Dec
26
2008
My wife got me for this Christmas. I love it.
–TR
Dec
18
2008
In LibraryFind 0.9, the UI has been enhanced to include a number of small niceties (things that have been flowing into the 0.8.x branch, but are enhanced in 0.9). Among the things you will find in LibraryFind 0.9 are cover art (which can be pulled from Open Library, Google or LibraryThing [Open Library is the default]) as well as links to full-text documents currently available in Google and Open Library. It’s all part of making it easier for users to get access to full-text, online resources.
(Screenshot with coverart):
–TR
Dec
16
2008
One of the main UI complaints with the 0.8.x branch of LibraryFind relates to what happens after the user searches. Since LibraryFind must collocate all the search results together before display, what happens as the user waits (or doesn’t happen) has been an admitted weakness of the program. That will change in 0.9. In 0.9, users will see what’s being queried and the number of results, and will have the option to kill the search at any time and view the results that have already been found. One additional benefit of this method has been in terms of thread lock. In 0.8.x, a single mongrel instance is locked until a query is completed. This is because the interface never refreshes, but waits for the query to complete. In 0.9, the UI uses micro-queries, hitting the server for short periods of time, allowing mongrels to answer a quick request and then move on.
A quick comparison: In 0.8.x, a single query typically takes approximately 6-10 seconds to complete (depending on the number of targets queried), and a single mongrel thread is locked for the entire duration of this query. In 0.9, the average query on the server is 0.005 seconds. While the 0.9 interface makes more queries to the server, these are separate queries, allowing the server to better manage how its pack of mongrels is used and eliminating the thread locking that would take place during queries.
While the UI may change slightly as I finalize the 0.9 interface, here’s a snapshot of what this looks like now:
If you have questions about this, or other LibraryFind development, just give me a holler.
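For those curious about the micro-query idea, here is a minimal sketch of the pattern in Python. It is not LibraryFind code (LibraryFind runs on Rails behind mongrel), and the `SearchJob` class and its method names are hypothetical. The assumption is simply that the slow target queries run in the background while each client request is a cheap status check that returns right away.

```python
import threading


class SearchJob:
    """One federated search whose targets are queried in the background."""

    def __init__(self, targets):
        # targets: mapping of target name -> callable returning a list of hits
        self._targets = targets
        self._lock = threading.Lock()
        self._results = {}
        self._cancelled = threading.Event()
        self._threads = []

    def start(self):
        """Kick off one background worker per target and return immediately."""
        for name, fetch in self._targets.items():
            worker = threading.Thread(
                target=self._run, args=(name, fetch), daemon=True
            )
            self._threads.append(worker)
            worker.start()

    def _run(self, name, fetch):
        hits = fetch()
        if self._cancelled.is_set():
            return  # drop late results once the user has stopped the search
        with self._lock:
            self._results[name] = hits

    def poll(self):
        """The micro-query: report whatever has arrived so far, without waiting."""
        with self._lock:
            done = dict(self._results)
        return {
            "pending": sorted(set(self._targets) - set(done)),
            "counts": {name: len(hits) for name, hits in done.items()},
            "finished": len(done) == len(self._targets),
        }

    def cancel(self):
        """Stop collecting; results already gathered stay available via poll()."""
        self._cancelled.set()

    def wait(self):
        """Block until every worker is done (used here only for the demo)."""
        for worker in self._threads:
            worker.join()


# Stand-in targets that return instantly; real ones would be slow remote searches.
job = SearchJob({"catalog": lambda: ["a", "b"], "articles": lambda: ["c"]})
job.start()
job.wait()  # a real UI would call poll() every second or so instead of blocking
status = job.poll()
print(status)
```

The design choice is the same trade described above: more requests overall, but each one holds a server worker only for the time it takes to copy a small status dictionary.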
–TR
Dec
15
2008
The boys got me out of the house this weekend and we picked up our tree. They’d been itching to get the presents laid out. Of course, the other person happy to see the tree is the cat. He seems to believe that we brought this tree in the house just for him, because he’s been climbing all over the tree (though, he hasn’t taken any of the ornaments off yet).
–TR
(he’s hard to see, but the cat is in here)
Dec
15
2008
I know that it’s a little early, but the boys are excited. Today, we got a little bit of snow (enough that the boys get off of school tomorrow) and they say that it won’t get above freezing all week, so it may be sticking around.
I’ve never, ever had a white Christmas, so we are all crossing our fingers.
–TR
Dec
2
2008
So, over this past year, I’ve watched the Beavers football team go from awful (Stanford and Penn State) to great (USC) to inspired as they ran through the Pac-10. In fact, like many season ticket holders, I was making my Rose Bowl plans because, well, all we had to do was beat a very flawed Oregon team in the Civil War, which…didn’t happen. Instead, it was 3 1/2 hours of pain as the Beavers teased me throughout the game before ultimately losing. So sad.
Well, I have a friend who thought I needed some cheering up. So, he sent me this joke. Here it is…
How do you make Beaver Cookies?
Beat well for 3 1/2 hours and then take out of big bowl.
Funny — well, no.
–TR
Nov
4
2008
Changes, changes. One thing that is always true at an academic institution is that things just don’t stand still very long. Jeremy Frumkin, my good friend and partner in crime here at OSU for the past 5+ years, has moved on to the University of Arizona, leaving some very big shoes for the library to fill. I’m going to miss him (and his family) quite a bit (though don’t tell him; I don’t want this going to his head).
And myself, well, my role at the OSU Libraries is changing too. As of today, I’m no longer the Digital Production Unit head at Oregon State University…rather, I’ve accepted the position as the Gray Chair for Innovative Library Services. This move is an exciting one for me because of the real opportunity that this position has to help mold the future directions at OSU Libraries. In this new role, my job will be to help the library identify new services and technologies that can help it continue to meet the growing needs of our patron communities. Part of that will be taking a closer look at how the library is currently doing preservation and data curation, as well as the overall digital library infrastructure here at OSU. Exciting times, indeed.
–TR
Oct
19
2008
A friend of mine gave this to me the other day. It’s an old picture of Kenny, probably from when he was 14 or 15 months old, so we would have still been in Eugene at the time. To see him now, you’d have to wonder where all that blonde hair has gone.
–TR
Oct
10
2008
As all good things must come to an end, so too has the ReadEx Digital Institute closed for another year. As always, this symposium offered an opportunity to connect with colleagues from various institutions and professional backgrounds to discuss some of the issues that we wrestle with as we work on developing and managing our digital collections. There were a number of sessions throughout the two days, and below are some brief (and sometimes not so brief) thoughts on each (in addition to some of the fun stuff I did in between).
Day 1:
- David Seaman, Dartmouth College
From Ponderous Perfection to the Perpetual Beta: Library Services in an Age of Superabundant information
Aside from the fact that I don’t like this image or idea of perpetual beta (and I’ve mentioned why here: http://oregonstate.edu/~reeset/blog/archives/566), David talked about a process that Dartmouth College went through this year to determine how the library could become much more nimble within our current information landscape. Like many organizations, the administration at Dartmouth felt like they were falling farther and farther behind as they struggled to build systems and digital collections to meet the needs of their current users. As David put it, the results of the process really confirmed for them something that they already knew: the staff at Dartmouth would like to take more chances and move out a little more towards the bleeding edge, but they felt like they needed a license from the administration, and from their patrons, to take those chances. David hopes that this license is what comes out of the year-long assessment, so that staff at his institution can move past the paralysis of perfection and start to build patron services using a very iterative approach, one that allows the library to move new services and improvements to patrons very quickly.
David’s talk prompted a question that I asked then and would later try to answer in my own talk: what is the library community’s big win, the thing that allows the community to cover our sins as we start to put out new services that might not be perfect? A number of technical commentators discuss this concept when considering the dilution of “beta” in Google’s and Microsoft’s development (they really are quite similar). Google has its search/advertising functionality, Microsoft has Windows and Office, but what’s the library community’s? I’ll let you consider this for yourself, and I’ll tell you what I believe it is further below.
- Meg Bellinger, Yale University
The Collections Collaborative: Putting content into the flow
Meg’s talk primarily centered on Yale and some new directions that they are taking to continue to grow their digital presence and brand. A good deal of this work will be overseen by Meg, who discussed a new department that has been developed to support digital content.
- Paul Duguid, University of California, Berkeley
The World according to Grep: Seeing text through the search box
I’ll admit, this was a great talk in which Paul Duguid, a faculty member at the iSchool in Berkeley, discussed the ways in which the search box shapes our world view. His talk touched on a number of topics, including Google, Google Books, iTunes, etc.
- Steven Daniel, ReadEx
The Serial Set goes to the movies: Movie Screenplays and the 1912 Senate Titanic Hearings
I’ve had the opportunity to see Steven talk a number of times and I always enjoy his musings. Essentially, Steven is an expert when it comes to finding information within the Serial Set, and each year he mines the Serial Set and comes up with topics to discuss at this symposium. This year, he discussed the Senate hearings on the Titanic. These hearing records exist within the Serial Set and apparently, as Hollywood has created movies on the Titanic over time, filmmakers have used the descriptions, accounts, etc. as the basis for their movies. It was interesting: many of these movies take a different look at how these events played out, but the account descriptions, dialog, etc. vary little during the key moments of the tragedy.
Day 2:
- Henry Snyder, University of California, Riverside
Libraries and digitization: forced marriage, marriage of convenience or love match?
Henry is someone who describes himself as semi-retired, but I can only hope that at his age, I’ll still be as deeply informed and involved as he is. Henry has spoken at the Digital Institute before, and this year, he provided a thought-provoking talk that spanned the history of digitization in libraries, much of which he has seen and participated in during his lifetime.
- Ray Siemen, University of Victoria
A digital humanities approach to understanding the electric book
This year was the year of the instructor. Ray was the third teaching faculty member that was asked to speak at the symposium, and he spoke specifically about the rise of the electric book and how that impacts the humanities researcher. In some sense, he was the perfect follow-up to Henry Snyder’s talk, as he picked up where Henry left off to discuss research and study that is currently ongoing in Canada related to digital humanities.
- Grant Barrett, The Double-Tongued Dictionary
Research Techniques in Digital Text: Beyond “nifty” and on to “useful”
Grant’s talk was unique in that it came from a user of our digital resources. Grant is an independent researcher and uses the digital resources that universities and others make available to inform his work. What really made his talk unique was his perspective.
- Terry Reese, Oregon State University
[Title not important]
The title of my talk really wasn’t that important because, well, it was the title of my talk prior to re-writing it the night before. After the first day of talks, I was left with a real need to continue the conversation that David started and answer my earlier question regarding the library community’s big win and what that means for the digital library community. Essentially, my talk was a little scattershot, I thought, with the first half focusing on some analysis that I’ve been doing over the past 6 months regarding who actually uses our digital collections. I think that in general, we understand that the materials that we digitize are now available globally, so some global users are accessing them, but I think that most people believe that the largest users of their digital collections are their traditional user community. However, that’s not necessarily the case (or I think, won’t be the case as time moves forward). Watching the digital access for our own institutional repository revealed a growing number of invisible users: users with no relationship to the University, our local community, or even the state of Oregon who were making use of the content stored in the IR. For months, Oregon users would make up about 1/4 or so of the total users, with the next-largest user groups coming from countries outside the United States, such as Canada and India. When taken in total, usage from outside the United States was oftentimes greater than usage from inside the United States (granted, we are now comparing a very large area to a very small one, but this still surprised me), and nearly 90% of users found the materials in our IR through either a search engine or a direct link from another resource. The implication of this is that very quickly, our primary users for our digital collections could cease to be those that we traditionally develop for: our campus or local community.
And how does this affect overall development of services, collections that are selected for digitization, etc.? Interesting questions to ponder.
And so I asked what this means for our development. Well, originally, I’d planned on talking about search. To a large degree, I think search is pretty easy. Search across resources, domains, etc. Discovery is hard. Actually helping users find the content that they are looking for and sorting through the noise for the patron is the hard part. But this is some of the work that we are doing with LibraryFind, so I figured I had some things I could say. But after two days of talks, much of the conversation had settled on search and how it must be made better, and I think we ignored the less sexy, but actually more important, problem with digital collections that really has no answer at this point: preservation. Above, when I talked about the library’s big win, our “product” that forgives the multitude of sins, well, I believe that it’s content, or more precisely, the preservation of content. Because of libraries, content that would have been lost to time continues to exist throughout the world. Libraries preserve content, and we actively seek out content to acquire and preserve for future generations because we think in library time, or perpetual preservation of materials. This, I think, is the big win. However, as I look at the current state of preservation services (both enterprise and academic), I think that we live in a time where more information could simply disappear than ever before. The digital collections that we build have profound impacts on the world around us, but the irony is that they are created on media that has a shelf life of only a couple of years. From the moment a document is born digital, it begins to rot. And while we have been able to design systems that allow for byte-level preservation (in many cases), we (and by we, I mean everyone, library or no) will continue to struggle with the ability to migrate binary content forward as technologies and standards change.
Preservation is hard for a number of reasons, cost and resources being just two of the variables that we can only kind of control and plan for. As time goes forward, organizations will spend millions of dollars just to keep the disks spinning, paying for nothing more than energy/cooling and replacement media. A big problem. But one that I would argue isn’t just an institutional one, but a cultural one as well. I think that digital preservation will have to take on many of the same characteristics as preservation of our analog materials, i.e., be done cooperatively within the library community.
So folks don’t believe that this entire trip was work, work, work — I can say that prior to the conference, I traveled up to Middlebury College to visit a good friend there, do some hiking and drive around NY. I posted a few comments on that part of my trip at: http://oregonstate.edu/~reeset/blog/archives/564
–TR