Mar 24 2007

Roy Tennant vs RDA (and AutoCat) :)

Ah, what fun.  Working in Technical Services, I tend to lurk on the AutoCat list to keep up and get an idea of what folks are chatting about there.  Normally the conversation is on traditional cataloging issues, but Roy’s latest musings in Library Journal (“Will RDA Be DOA”, url: http://libraryjournal.com/article/CA6422278.html) seem to have raised people’s hackles.  Predictably, catalogers were offended by the article, in part, I think, because much of the blame for how our ILS systems currently function seems to fall unfairly at their feet.  This is unfortunate, because I think that Roy’s point has gotten lost in the current discussion on the list – that being that our current bibliographic frameworks are not sufficient for meeting future needs.  But I think more explanation is needed here, since many people will read this statement and read into it that I’ve just said that MARC, AACR2 and the people that use them suck – which isn’t the case.  Rather, it represents a need to look at our current bibliographic frameworks (MARC, AACR2 and RDA) and evaluate not how they are working for us today (or yesterday) – but whether they will meet our needs in a future where the library community and its data have become less isolated from the rest of the world.  We live in a changing information ecosystem – and libraries need to change with it.  While the retirement of MARC and AACR2 may be the eventual end-game, I doubt those that create such records would really see a big difference in what they do.  In fact, I should point out, to some degree this is already occurring.  Folks that catalog using OCLC’s Connexion client are already cataloging in XML.  The client saves data in XML templates and transmits data in XML, but generates MARC records for export.  So I certainly could envision a future where MARC has been replaced by something else, but where current catalogers simply describe things as they always have.  Anyway…

So what do I mean when I say that our current bibliographic frameworks are not sufficient for meeting future needs?  Well, let’s talk about this in terms of AACR2, RDA and MARC.  There are two glaring issues with our current bibliographic frameworks – and I’m not certain how we solve them until we, as a community, move from MARC to something else.  I’ll also note that I don’t hear many people talking about them, which I think is too bad, because they are issues that catalogers may relate to better.  Generally, this conversation regarding bibliographic frameworks is framed in relation to what systems folks or coders don’t believe MARC can do.  Sometimes they’re right, sometimes they’re wrong; most of the time, they are running into the real-world implementation of a framework that is constantly in a state of flux and is interpreted differently by different individuals.  However, in many ways, I think that this line of conversation is fruitless.  I’d like to focus my discussion on two issues that I run into while helping MARC users around the globe.

  1. MARC doesn’t interoperate in its current form.  What do I mean?  Well, during the current thread, Roy had discussed a need to isolate full-text materials within his catalog.  AutoCat’rs quickly noted that this information can, if encoded correctly, be inferred from the 856 field – which encodes the URL.  Well, no.  In MARC21 when utilizing AACR2, the 856 field encodes the URL information.  However, this is different in CHINMARC, FINMARC, UNIMARC, etc.  The point is, MARC has lots of flavors spanning many different character sets.  Having created MarcEdit, I’ve gotten the opportunity to work with catalogers around the world, and I can tell you without hesitation that MARC flavors do not play well together.  It’s a struggle because OCLC and the Library of Congress allow our profession to have a very North American focus (which I know RDA is hoping to overcome), but as long as flavors of MARC exist, so too will the cataloging community continue to be splintered.  Believe it or not, OCLC represents only a small part of the current MARC records being created, and not everyone uses the Library of Congress as their gold standard.  MARC21 uses MARC8 and UTF8, but I work with a number of folks in Asia where they use Big5 or other encodings – making these records completely incompatible with MARC21 records.  This is one of the benefits of a metadata schema like MODS – title, etc. are placed in the same place, no matter what descriptive rules are applied to the framework.  Users may use different punctuation rules, etc., but the data will be the same.  This isn’t currently the case when dealing with MARC.
  2. MARC, AACR2 and, I believe, RDA continue to isolate our community.  Who else uses MARC?  Anyone?  While AACR2, MARC, etc. have served our communities for a number of years (~40), it might be time to put this pony out to pasture and develop a framework and metadata schema that will allow the library community to leverage mindshare from outside our small community.  Currently, the tools and professional vision of our profession continue to be shaped by a small number of vendors providing solutions for our MARC data.  If the library community adopted MODS or a variety of metadata schemas (for example, FGDC for cartographic materials, MODS for books and serials, etc.), then while our bibliographic frameworks would still be our own and library centric, the ability to build tools for our data, and to integrate that data with other systems, would expand beyond our little community.  The global IT community speaks XML, not MARC.  It’s time we join the rest of the world in this regard.
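
To make the "same place, no matter what descriptive rules" point a bit more concrete, here is a minimal Python sketch that pulls the title out of two MODS records using nothing but the standard library XML parser.  The two sample records and the mods_title helper are made up for illustration; the only thing taken from MODS itself is the namespace and the titleInfo/title element path.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Two hypothetical records, described with different punctuation habits.
RECORD_A = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Cataloging in a changing world /</title></titleInfo>
</mods>"""

RECORD_B = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>changing catalog</title></titleInfo>
</mods>"""


def mods_title(xml_text):
    """Return the text of the first titleInfo/title element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{MODS_NS}}}titleInfo/{{{MODS_NS}}}title")
    if node is None or not node.text:
        return None
    return node.text.strip()


for record in (RECORD_A, RECORD_B):
    print(mods_title(record))
```

The same lookup works for both records even though the punctuation conventions differ, and any general-purpose XML tool could do the same job without knowing anything about MARC.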

While I realize that agreeing with Roy, even partly, may cause me to forfeit my secret technical service decoder ring :) , I think that at some point, this is a conversation that the technical services community needs to seriously have.  I know, I know – we are having this conversation now with RDA.  Well, no, because RDA allowed our current bibliographic framework to be part of the discussion and to some degree guide decisions.  We need to have this conversation without considering what we are doing now.  For me, the biggest concern that I have with our current bibliographic frameworks is the way in which they isolate our community.  There are a number of very bright people working in libraries – but imagine what our community could do if we could tap into the mindshare outside our little community and leverage open source projects directly – without having to first take our data out of MARC and into something like MARC21XML, MODS, etc.  As someone doing some of this work, I find it telling that the first step in designing any system around data currently in MARC is that I have to take the data out of MARC, correct it for inconsistencies, and massage it to make it more straightforward – just so that the information is useful within non-library systems.

–TR


Feb 16 2007

Mainstreaming R&D

I’ve become more and more convinced over the past year, talking to directors, that for OSS development to be accepted as a part of the library community, it’s going to have to become a mainstream service.  Too much R&D in libraries is done as part of an individual, student or demo project.  To a large degree, front-line workers and developers within the library community have a healthy bent towards OSS.  But organizational attitudes change more slowly, and these are the ones that tend to matter.  So, I’m going to be taking a different course over this next year – at least within my own small part of the world.

Over the past three months, I’ve been leading a group looking at next generation ILS services for our regional consortium, Summit.  Summit is a consortium made up of 33 academic libraries throughout Oregon and Washington – with all systems being Innovative.  This is due to the fact that III’s consortia software really only works with III libraries.  In looking at the various options available, we’ve tried to keep an open mind.  I’ve been running copies of Koha and Evergreen over the past month to look at current functionality within a very untraditional consortial setting, and folks have spoken to vendors like Endeca, Aquabrowser, III and OCLC, as well as others.  In all, the process has shown me a couple of things.

  1. Given that this decision will be just on the consortia database, our options are somewhat limited.  III doesn’t make the process of having an outside vendor interact with the Innreach system easy – though we’ve been told it could be done.  This means that we either migrate off III as a group (can’t see that happening), partner with III (what I think many would consider to be the safer, least disruptive choice), or work closely with OCLC – though the second and third options don’t hold much appeal to me personally.
  2. Which leads me to number 2 – while the consortia has more than enough talent to develop an inhouse solution, the organizational infrastructure simply doesn’t exist to allow such a solution to be considered.

The second realization is what struck me most.  I spend a great deal of my time helping folks within the Pacific NW implement tools around their ILS — but there really isn’t a centralized or formalized R&D process within the consortia — and for a group this large, that seems to be a shame.  There is a lot of talent tied up within the 33 member organizations, the question is how to get at it.

Well, I’ve got an idea.  While my group really cannot make a recommendation related to the current software available (we can talk about what’s available and what I believe to be the future trends) — I can advise that we formalize an R&D group within the consortia.  Fortunately, Summit is hiring a digital library coordinator — and I think that this position would be perfect to lead this group.  I envision a committee that could be used to:

  1. coordinate Summit development efforts and investigate options like SOPAC, metasearching within a consortial environment, OpenURL within a consortial environment, etc.
  2. provide Summit with shared development resources — allowing member libraries to help drive development of services, while distributing the R&D between member libraries
  3. advocate for OSS and an active R&D agenda to the member libraries’ directors and the Summit executive board.

In all honesty, I think #3 is the most important.  The proprietary vendor community is very adept at dealing with the library community at a high level, and this allows them to shape the overall environment within the organization.  My hope is that by creating a formal working group within the consortia and identifying that this is indeed important, we can help to lead an attitude shift within the Pacific Northwest.

Will it work?  Who knows.  I’ve floated the idea by a few folks — some on other committees, some familiar with the current makeup of the executive committee, and the overall mood isn’t optimistic.  The biggest challenge to overcome is this idea that one’s library doesn’t have any special skills to offer (or any bodies to offer).  If R&D is valued at an organization — resources and people can be found. 

Anyway, my hope is that the recommendations that come out of this study will help to move this conversation forward.  As I said – there is a lot of talent in the Pacific Northwest – it’s time we started tapping into it as a group and seeing what can be accomplished within a consortia when everyone contributes.

–TR


Feb 11 2007

Can the open source community help the ILS matter?

So, let’s start out with a preface to my comments here.  First, it’s a little on the long side.  Sorry.  I got a bit wordy and occasionally wander a little bit here and there :).  Second – these reflect my opinions and observations.  So with that out of the way…

This question comes from two experiences recently.  First, at Midwinter in Seattle, a number of OSU folks and myself met with Innovative Interfaces regarding Encore (III’s “next generation” public interface in development) and the difficulty that we have accessing our data in real-time without buying additional software or access to the system (via access to API or in III’s case, access via a special XML Server).  The second meeting has been the current eXtensible Catalog meeting here in Rochester where I’ve been talking to a lot of folks that are currently looking at next generation library tools. 

Sitting here, listening to the XC project and other projects currently ongoing, I’m more convinced than ever that our public ILS, which was once the library community’s most visible public success (i.e., getting our library catalogs online), has become one of the library community’s biggest liabilities – an albatross holding back our community’s ability to innovate.  The ILS and how our patrons interact with the ILS shape their view of the library.  The ILS, or at least the part of the system that we show to the public (or would like to show to the public – like web services, etc.), simply has failed to keep up with library patrons’ or the library community’s needs.  The internet and the ways in which our patrons interact with the internet have moved forward – while libraries have not.  Our patrons have become a savvy bunch.  They work with social systems to create communities of interest – oftentimes without even realizing it.  Users are driving the development and evolution of many services.  A perfect example of this has been Google Maps, a service that in and of itself isn’t too interesting, in my opinion.  But what is interesting is the way in which the service has embraced user participation.  Google Maps mashups litter the virtual world – to the point that the service (Google Maps) has become a transparent part of the world that the user is creating.

So what does this have to do with libraries?  Libraries up to this point simply are not participating in the space that our users currently occupy.  Vendors, librarians – we are all trying to play catch-up in this space by brandishing phrases like “next generation”, though I doubt anyone really knows what that means.  During one of my many conversations over the weekend, something that Andrew Pace said really stuck with me.  Libraries don’t need a next generation ILS; they need a current generation system.  Once we catch up – then maybe we can start looking at ways to anticipate the needs of our community.  But until the library community creates a viable current generation system and catches up, we will continue to fall further and further behind.

So how do we catch up?  Is it with our vendors?  Certainly, I think that there is a path in which this could happen.  It would take a tremendous shift in the current business models utilized by today’s ILS systems, but it is a shift that needs to occur.  Too many ILS systems make it very difficult for libraries to access their data outside of a few very specific points of access.  As an Innovative Interfaces library, our access points are limited based on the types of services we are willing to purchase from our vendor.  However, I don’t want to turn this specifically into a rant against the current state of ILS systems.  I’m not going to throw stones, because I live in a glass house that the library community created and has carefully cultivated to the present.  I think to a very large degree, the library community…no, I’ll qualify this, the decision makers within the library community – remember the time when moving to a vendor ILS meant better times for a library.  This was before my time – but I still hear decision makers within the library community who are apprehensive of library-initiated development efforts because the community had “gone down that road” before, when many organizations spun their own ILS systems and were then forced to maintain them over the long term.  For these folks, moving away from a vendor-controlled system would be analogous to going back to the dark ages.  The vendor ILS has become a security blanket for libraries – it’s the teddy bear that lets everyone sleep at night because we know that when we wake up, our ILS system will be running, and if it’s not, there’s always someone else to call.

With that said, our ILS vendors certainly aren’t doing libraries any favors.  NCIP, SRU/W, OpenSearch, web services – these are just a few standards that libraries could easily accommodate to standardize the flow of information into and out of the ILS, but they find little support in the current vendor community.  RSS, for example, a simple protocol that most ILS vendors now support in one way or another, took years to finally be developed.
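
For a sense of how lightweight these standards are from the client side, here is a minimal Python sketch that builds an SRU searchRetrieve request.  The base URL is a placeholder and the sru_search_url helper is my own invention; the parameter names (operation, version, query, startRecord, maximumRecords) are the standard SRU 1.1 ones.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, start_record=1):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    # urlencode handles the escaping of spaces, quotes and '=' in the query.
    return base_url + "?" + urlencode(params)


# http://example.org/sru is a stand-in for whatever endpoint an ILS exposed.
print(sru_search_url("http://example.org/sru", 'dc.title = "open source"'))
```

A search is just an HTTP GET with a handful of well-known parameters, so any system that can fetch a URL and parse XML could talk to an ILS that supported it.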

Talking to an ILS vendor, I’d used the analogy that the ILS business closely resembles the PC business of the late 80’s, early 90’s, when Microsoft made life difficult for 3rd-party developers looking to build tools that competed against them.  Three anti-trust cases later (US, EU and Korean), Microsoft is legally bound to produce specific documentation and protocols to allow 3rd-party developers the ability to compete on the same level as Microsoft themselves.  At which point, the vendor deftly noted that they have no such requirements, i.e., don’t hold your breath.  Until the ILS community is literally forced to provide standard access methods to data within their systems, I don’t foresee a scenario in which this will ever happen – at least in the next 10 years.  And why is that?  Why wouldn’t the vendor community want to enable the creation of a vibrant user community?  I’ll tell you – we are competitors now.  The upswing in open source development within libraryland has placed the library community in the position of being competitors with our ILS vendors.  Dspace, Umlaut, LibraryFind, XC – these projects directly compete against products that our ILS vendors are currently developing or have developed.  We are encroaching into their space, and the more we encroach, the more difficult I predict our current systems will become to work with.

A good example could be the open source development of not one, but two mainstream open source ILS products.  At this point in time, commercial vendors don’t have to worry about losing customers to open source projects like Koha and Evergreen, but this won’t always be the case.  And let me just say, this isn’t a knock against Evergreen or Koha.  I love both projects and am particularly infatuated with Evergreen right now – but the simple fact is that libraries have come to rely on our ILS systems (for better or worse) as acquisition systems, serial control systems, ERM systems – and ILS vendors have little incentive to commoditize these functions.  This makes it very difficult for an organization to simply move to or interact with another system.  For one, it’s expensive.  Fortunately, the industrious folks building Evergreen will get to the point where it will be a viable option, and when they do, will the library community respond?  I hope so, but I wonder which large ACRL organization will have the courage to let go of their security blanket and make the move – maybe for the second time – to using an institutionally supported ILS.  But get that first large organization with the courage to switch, and I think you’ll find a critical mass waiting and maybe, just maybe, it will finally breathe some competitive life into what has quickly become a very stale marketplace.  Of course, that assumes that the concept of an OPAC will still be relevant – but that’s another post I guess.

Anyway, back to the meeting at Rochester.  Looking at the projects currently being described, there is an interesting characteristic of nearly all “next generation” opac projects.  All involve exporting the data out of their ILS.  Did you get that – the software that we are currently spending tens or even hundreds of thousands of dollars on to do all kinds of magical things must be cut out of the equation when it comes to developing systems that interact with the public.  I think that this is the message that libraries and those making decisions about the ILS within libraries are missing.  A quick look around at folks recognized for creating current generation opacs (the list isn’t long), like NCState, shows that they have one thing in common – the ILS has become more of an inventory management system, providing information relating to an item’s status, while the data itself is being moved outside of the ILS for indexing and display.
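
The split described above can be sketched in a few lines of Python: the bibliographic data lives in an index outside the ILS, and the ILS is only asked for item status at display time.  Everything here (the ExternalIndex class, the record ids, the status lookup) is a toy invented for illustration, not how NCState or anyone else actually built their system.

```python
from collections import defaultdict


class ExternalIndex:
    """Toy keyword index over exported records; status still comes from the ILS."""

    def __init__(self, status_lookup):
        self._postings = defaultdict(set)    # word -> set of record ids
        self._records = {}                   # record id -> title
        self._status_lookup = status_lookup  # callable standing in for the ILS

    def add(self, record_id, title):
        """Index one record that was exported out of the ILS."""
        self._records[record_id] = title
        for word in title.lower().split():
            self._postings[word].add(record_id)

    def search(self, term):
        """Return (id, title, status) tuples for records matching one word."""
        ids = sorted(self._postings.get(term.lower(), set()))
        return [(rid, self._records[rid], self._status_lookup(rid)) for rid in ids]


# A dict stands in for the live item-status query against the ILS.
ils_status = {"b1": "available", "b2": "checked out"}
index = ExternalIndex(ils_status.get)
index.add("b1", "Open source catalogs")
index.add("b2", "Closed catalogs")
print(index.search("catalogs"))
```

In this arrangement the search and display layer can be swapped out freely, and the only thing still required of the ILS is an answer to "what is the status of item X?"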

What worries me about current solutions being considered (like Endeca) is that they aren’t cheap and will not be available to every library.  NCState’s solution, for example, still requires NCState to have their ILS, as well as an Endeca license.  XC, an ambitious project with grand goals, may suffer from the same problem.  Even if the program is wildly successful and meets all its goals, implementers may still have a hard time selling their institutions on taking on a new project that likely won’t save the organization any money upfront.  XC partners will be required to provide money and time while still supporting their vendor systems.  What concerns me most about the current path that we are on is the potential to deepen the inequities that already exist between libraries with funding and libraries without.

But projects like XC, and the preconference at Code4lib discussing Solr and Lucene – these are developments that should excite and encourage the library community.  As a community, we should continue to cultivate these types of projects and experimentation.  In part, because that’s what research organizations do – seek knowledge through research.  But also, to encourage the community to take a more active role when it comes to how our systems are developed and interact with our patrons.

–TR 


Jan 21 2007

Translating ETDs from Dspace OAI to MARC

Ok — here’s the info. 
File: MidWinter 2007 ALCTS Presentation
So what’s included?  The zip file contains our custom XSLT that’s used in MarcEdit, the macro that I use to clean data, and the ppt slides.  The XSLT file is a custom version of the default OAIDC translation found in MarcEdit.  It’s customized to deal with the specific data that will be encountered within our ETD records.  If you want to use this XSLT for your own library, you will very likely need to make some small modifications – but it should get you started.  Any questions?  Send cookies :)
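
For readers who haven’t seen this kind of crosswalk before, here is a rough Python sketch of the general idea: walk an OAI Dublin Core record and emit MARC-style (tag, subfield, value) tuples.  This is not the XSLT from the zip file.  The sample record is invented, and the tag choices (245, 720, 653, 520) are an assumption loosely based on common Dublin Core to MARC crosswalks, so treat the mapping table as a placeholder to adjust for local practice.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed, simplified mapping; a real crosswalk needs many more rules.
DC_TO_MARC = {
    "title": "245",
    "creator": "720",
    "subject": "653",
    "description": "520",
}

# Hypothetical oai_dc record, similar in shape to what an OAI server returns.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample thesis title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Metadata</dc:subject>
  <dc:description>An abstract.</dc:description>
  <dc:rights>Unmapped in this sketch</dc:rights>
</oai_dc:dc>"""


def dc_to_marc_fields(xml_text):
    """Return (tag, subfield code, value) tuples for the mapped DC elements."""
    fields = []
    prefix = "{" + DC_NS + "}"
    for child in ET.fromstring(xml_text):
        if not child.tag.startswith(prefix) or not child.text:
            continue
        tag = DC_TO_MARC.get(child.tag[len(prefix):])
        if tag:
            fields.append((tag, "a", child.text.strip()))
    return fields


for field in dc_to_marc_fields(SAMPLE):
    print(field)
```

Elements without an entry in the mapping table (dc:rights here) are simply skipped, which is the kind of local decision the "small modifications" would cover.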

–TR


Dec 31 2006

CONTENTdm and OpenSearch

I love OpenSearch.  It’s been one of those things that I’ve been wanting to spend more time looking at – maybe incorporate into Dspace or some of our other services like LibraryFind (which actually is now on the todo list).  Anyway, folks may not know it, but Kyle Banerjee and I are writing a book.  A how-to of sorts, for folks doing digital repositories.  I’ve been lights out for most of December, cranking out 4 finished and 1 1/2 nearly completed chapters.  So far so good.  Well, one of the parts of the book deals with exposing resources to larger audiences, and a discussion of OpenSearch falls into that section.  As I was looking through the specification this afternoon, I thought, wow, this would be easy to implement just about everywhere.  So I took a 1/2 hour and quickly whipped up some code that integrated OpenSearch into CONTENTdm.  I’ll post the code shortly.  However, what I thought was cool was the number of resources that have embraced OpenSearch as a query method.  IE7, for example, utilizes OpenSearch as the method for querying search providers.  This means that by adding an OpenSearch server to my CONTENTdm instance, I am instantly able to add this resource as a search target in IE 7.

As I mentioned, writing the code took about 30 minutes and was much easier than I’d anticipated.  Given the speed at which OpenSearch has caught on outside the library community (I was surprised at how many applications and services support it) and how simple it is to implement, I’m thinking that it’s almost crazy not to spend the time and integrate the protocol into our organization’s services, if only to give developers outside the library community a straighter line for service integration.
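
To give a feel for how little is involved, here is a minimal Python sketch that generates an OpenSearch 1.1 description document, the small XML file a client reads to learn how to query a search service.  This is not the CONTENTdm code mentioned above.  The short name, description and URL template are placeholders, and returning RSS (application/rss+xml) is an assumption about what the search endpoint would send back.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def build_description(short_name, description, template):
    """Return an OpenSearch description document as a string."""
    ET.register_namespace("", OPENSEARCH_NS)  # serialize with a default xmlns
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    # {searchTerms} is the placeholder that clients replace with the query.
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "application/rss+xml", "template": template},
    )
    return ET.tostring(root, encoding="unicode")


print(build_description(
    "CONTENTdm demo",
    "Search our digital collections",
    "http://example.org/opensearch?q={searchTerms}",
))
```

The server side is then just a script that accepts the q parameter, runs the search, and returns the results in the advertised format.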

 

–TR 


Dec 19 2006

O’Reilly Radar > Google Deprecates Their SOAP Search API

This was reported on the O’Reilly blog.  Apparently, Google is deprecating their SOAP API.  If you ask me – this is a terrible decision.  Look up the current sets of books on Google Hacks, etc. and what you find is an entire ecosystem designed around these APIs.  This is why developers have traditionally liked Google – it’s why I’ve traditionally liked working with Google.  But I’m not so sure now.  Here’s a link to the O’Reilly article.

Link to O’Reilly Radar > Google Deprecates Their SOAP Search API

 

–TR


Dec 19 2006

Lost in Airport hell

I spent the weekend in DC as a panelist for an NEH grant.  It was a very interesting experience.  Having never been on an NEH grant panel before, I was somewhat amazed by the tremendous number of high quality research projects currently being undertaken in the humanities.  If this is a representative sample of the work currently being done in the humanities, then I’m very impressed.  Also, I was pretty impressed by the NEH process in general.  The panel itself was very organized and well run.

Unfortunately, after my panel work I had little time to visit DC – something I generally enjoy doing whenever I get a chance to visit.  About the only thing I had time to do was visit the Christmas tree in front of the White House.  I have to admit – I love the White House.  I just can’t help but be a little giddy each time I see it (as well as snapping a picture or two) – and hopefully that’s something that won’t go away.  I’d hate to think that I had become so cynical of my government that DC and its many memorials no longer stirred that excitement.

Anyway, this was my first time visiting DC around Christmas — so I made sure I took in the tree display and made sure I got many pictures so I could show Kenny the “big” Christmas tree and some of the DC buildings that we will be visiting this summer at ALA.

And finally, I got to hook up with a friend that I’d only met through email for a late dinner.  He, his wife and a friend took me to a more traditional Moroccan restaurant out around New York Ave. and 6th.  The name escapes me at the moment – but the food was fantastic.  It was a 7 course meal that one has to experience.  I’d definitely try it again (if I could find it :) ).

Coming home, I missed my flight for the first time, ever.  I had an 8:20 flight out of Reagan, but I wasn’t too bright that morning and managed to miss my flight by, oh, 2 hours.  Fortunately (I thought), the American Airlines ticket agents were able to get me on a flight a 1/2 hour later to Dallas and, from there, had given me a ticket to PDX.  All I had to do was go to the desk and get my seat assignment (at least that was how it was explained to me).  I didn’t find out till I was in Dallas that my ticket was actually a standby ticket and, as I find every time I visit Dallas, every flight going to PDX was oversold.  I spent 9 hours in the airport – watching 2 flights leave before I finally had to call it a night.  I was told by an American Airlines agent that they would simply put me on standby for the first flight to PDX tomorrow.  Ugh – I looked and every direct flight to PDX was oversold (again).  So I found my way to a ticket counter and met a very nice agent named Gloria who got me a reservation in a room at the Holiday Inn (note to self – apparently all airlines have special arrangements with local hotels for just such occasions) and got me booked for a flight from Dallas to San Jose to PDX.  It’s going to be another long day, but at least I should be home in the afternoon rather than not at all.  So, I’m in a room in Dallas – my bag {shrug}, I think it’s in Portland (it was tagged through), so I hope that it’s around when I get to PDX tomorrow.

I still can’t wait to come back to DC for ALA.  This is one of my favorite cities to visit and I think that Kenny is going to love it — especially the Metro (lots of trains).   But mark my words — this will be the last time that I miss a flight (with it being my fault). 

–TR


Dec 17 2006

LibraryFind updates

Jeremy apparently did a podcast at CNI (http://connect.educause.edu/blog/mpasiewicz/an_interview_with_jeremy_frumkin?time=1166414496) where he discussed some of the work being done on LibraryFind.  One thing that he’s particularly happy about is what he’s calling the “look ahead” feature in LibraryFind.  Essentially, LibraryFind is both an OpenURL server and client, allowing it to “look ahead” and resolve all citations before display.  This of course goes beyond simple OpenURL resolution (which is dependent on correct holdings data) – it actually resolves the links using OpenURL, DOI resolution, etc. to determine if the link is indeed reachable by the user.  A nifty feature for sure – though I’ll admit, I’m most pleased with the caching features in LibraryFind at this moment.  I’ll be curious to see if they are as effective as we see in testing – but I’m hopeful.
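
As a rough illustration of the “look ahead” idea (and emphatically not LibraryFind’s actual code), here is a small Python sketch: given several candidate links for a citation, return the first one that a reachability check accepts, and remember the result of each check so repeated citations don’t trigger repeated lookups.  The first_reachable function, the candidate URLs and the fake checker are all hypothetical; a real checker would issue an HTTP request or consult a resolver.

```python
def first_reachable(candidates, is_reachable, cache):
    """Return the first candidate URL that is reachable, or None.

    candidates   -- URLs in order of preference (e.g. OpenURL target, DOI link)
    is_reachable -- callable(url) -> bool, supplied by the caller
    cache        -- dict remembering earlier results, shared across calls
    """
    for url in candidates:
        if url not in cache:
            cache[url] = is_reachable(url)  # only hit the network once per URL
        if cache[url]:
            return url
    return None


# Stand-in checker: pretend only URLs ending in /ok can be reached.
def fake_check(url):
    return url.endswith("/ok")


cache = {}
links = ["http://resolver.example.org/bad", "http://doi.example.org/ok"]
print(first_reachable(links, fake_check, cache))
```

Passing the checker in as a function keeps the resolution logic testable without touching the network, and the shared cache is what makes the second lookup of the same citation cheap.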

For those interested, I’ll be giving a 20 minute talk at this year’s code4lib about LibraryFind and possibly some of the experimentation that I’m doing at this point with the different LibraryFind tools.

 

–TR


Nov 3 2006

Google Sitemaps

I’d run across these a few weeks ago and thought they were pretty nifty. Essentially, I was looking for something that would allow Oregon State University’s CONTENTdm collections to be harvested by Google. Since CONTENTdm has an OAI interface, and Google’s Scholar supports OAI harvesting, I thought there must be an easy way to get this set up. Fortunately, Google’s Sitemap facility provides a method for this to happen. Using the OAI server as the sitemap — I was able to get Google to quickly harvest and index our CONTENTdm collections.

Information on the Google Sitemaps can be found on the Google Sitemaps Help documentation site.

[update]
Some folks have asked (like the comment below) — how this works. Well, I created a small script that replaces the oai.exe process in CONTENTdm, at least for Google’s purposes. The script basically just handles the OAI request. Here’s the simple code:

<?php
// Pass the incoming OAI request through to CONTENTdm's oai.exe and relay
// the XML response unchanged.
header("Content-type: text/xml");
$url = "http://digitalcollections.library.oregonstate.edu/cgi-bin/oai.exe?" . $_SERVER['QUERY_STRING'];
// Simpler alternative: print file_get_contents($url);
$handle = @fopen($url, "r");
if ($handle) {
    // Stream the response a line at a time, up to 4 KB per read.
    while (!feof($handle)) {
        echo fgets($handle, 4096);
    }
    fclose($handle);
}
?>
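
For anyone who wants to see what kind of request the script above ends up relaying, here is a small Python sketch that builds an OAI-PMH request URL.  The oai_request_url helper and the example.org base URL are placeholders of mine; the verb names and the metadataPrefix argument come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL, rejecting unknown verbs."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Placeholder base URL; in the post this role is played by the small PHP script.
print(oai_request_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
# prints http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A ListRecords request with the oai_dc prefix returns every record in Dublin Core, which is what makes an OAI endpoint usable as a harvesting entry point.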

–Terry


Sep 29 2006

Adding user comments and tagging to CONTENTdm

At OSU, we’ve played with this on and off and finally decided to just take this live.  For those that use CONTENTdm, I’ve created a small document that discusses how this works and what it looks like.  As I said, simple implementation at this point, but if use takes off, I’ll look to add things like tag clouds, integrated search results, etc.  This won’t be interesting to anyone but folks using CONTENTdm.  Sorry.

Here’s a link to the document: CONTENTdm_Tagging.doc

 

–TR