Mar 16 2009

An OA mandate for the OSU library faculty

The OSU Library faculty recently adopted an OA mandate, which is pretty cool.  You can read about it here: http://ir.library.oregonstate.edu/dspace/handle/1957/10850 and what a few others are saying about it:

I think that this is important on a number of levels.

  1. Symbolically, it’s important.  It’s very difficult for the library to go to faculty on campus and ask them to contribute content to the IR when the Library faculty itself is not regularly submitting to the IR.  This changes that and, hopefully, will act as a catalyst for other departments on campus to follow the Library faculty’s lead.
  2. As tenured faculty, the research (both papers and presentations) our librarians generate represents an important contribution to the scholarly community.  As researchers and scholars, preserving our content and making it freely accessible to future researchers is indeed one of our primary responsibilities as faculty.
  3. This was really a faculty-initiated endeavor with a great back story, but I won’t include it here right now.  Suffice it to say, a good number of people at OSU deserve a lot of credit for making this happen, chief among them Michael Boock and Janet Webster, who have worked tirelessly from the beginning to advertise, grow and advocate for the IR in the library.  Credit goes to the faculty as well, for stepping up and making this a reality.
  4. Finally, it’s just one more example of that Beaver ingenuity and can do’edness.  :)

 

–TR


Nov 13 2008

A Guide for the perplexed: libraries and the Google Library Project Settlement

In case folks haven’t seen it, this came across my desk today.  It’s an analysis written by Jonathan Band, a copyright expert who does work with ARL.  The link to the document is here: http://www.arl.org/pp/ppcopyright/google/index.shtml

I think it’s definitely worth taking a look at.

–TR


Nov 3 2008

What would it look like if OCLC was broken up?

The other day, I posted what I saw as some very big concerns with OCLC’s revised policy (currently being reconsidered) on the transfer of records (two of which I would consider deal breakers).  In that post, I made the argument that maybe it was time to consider breaking OCLC up to reflect what it has become: an organization with two distinct facets, a membership component and a vendor component.  This comment led to a conversation with someone at OCLC who questioned whether I honestly believed that the library community would be better off if OCLC was broken up, and it was obvious from our conversation that on this point, we would simply need to agree to disagree.  As a side note, I think that these types of disagreements and conversations are actually really important to have.  I’m always nervous of communities or groups in which everyone agrees, since it usually means that people either are not thinking critically or no one really cares.  Secondly, I think that we all (OCLC and myself for that matter) want what’s best for the library community; we just have different visions of what that might be.

Anyway, back to my topic.  Now, I’m going to preface this discussion by saying that this is obviously my own opinion and one that may not be shared by many people within the library community (I really have no idea).  Even within the library open source community, where I’m sure this opinion would be more prevalent (or at least entertained), I’m pretty sure I’m still in the minority.  But as I say, I think that these conversations are important to consider — specifically as we move down a path where OCLC is very quickly positioning themselves to become the library community’s default service provider for all things library (in terms of ILL, ILS interface, cataloging, etc.).

So when I talk about breaking up OCLC, exactly what am I talking about?  Well, in order to follow me down the path that I am going to take you, we have to talk about OCLC as I currently see them.  Watching OCLC during the 10 years (I can’t believe it’s actually been 10 years) that I have been in libraries, I have seen a quickening evolution of OCLC from strictly a member driven organization to more of a hybrid organization.  On the one hand, there is what many would consider the membership side of OCLC, that being WorldCat, ILL and their research and development office.  On the other hand, there is OCLC’s vendor arm; good examples of this would be WorldCat Local and WorldCat Navigator.  So how do I make these distinctions?  Membership services are those that I would consider core services.  These are services that OCLC has developed to add value to what OCLC likes to refer to as the Library Commons (WorldCat).  OCLC’s vendor services are those tools or programs that OCLC sells on top of the Library Commons, of which I think WorldCat Local/Navigator is a good example.  Now, at this point, I know that folks at OCLC (and likely in the membership) would argue that both WorldCat Local and Navigator do provide services that the OCLC membership is currently requesting.  I won’t deny that.  However, I would answer that the fact that OCLC treats the Library Commons (WorldCat) as its own closed personal community has the unintended effect of limiting the library community’s (and I include both commercial and non-commercial entities in my definition of community) ability to develop new service models.  In effect, we become much more dependent on how OCLC envisions the future of libraries.  Let me try and tease this out a little bit more…

Philosophically, the biggest problem that I have with the current situation is the commingling of OCLC’s treatment of the Commons (WorldCat) and their current strategy of being the sole commercial entity with the ability to interact with the Commons.  I’m a firm believer that the more diverse the landscape or ecology, the more likely that innovation will take place.  We’ve seen this time and time again both inside (Evergreen and Koha certainly have shaken up the traditional ILS market) and outside (web browsers are a good example of how competition breeds innovation) the library community.  However, by isolating the Commons, OCLC is threatening this diversity of thought.  Now, I have a whole set of different issues with the current library ILS community, but in this case, I think that OCLC’s treatment of the Commons, and their ability to leverage that service, unfairly skews the ability of both commercial and non-commercial entities to provide innovative services on top of the Commons (and before anyone jumps on me for non-commercial use, let me finish my thoughts here).  Commercially, I’m fairly certain that the current crop of ILS vendors would very much like to provide their own WorldCat Local/Navigator interfaces to their customers and, I’m sure, would be able to tie these interfaces closely to services already provided by the user’s ILS.  I could envision things like ERM (electronic resource management), simplified requesting, etc. all being possible if the likes of ExLibris or Innovative Interfaces were allowed to build tools upon the Library Commons (WorldCat).  Maybe I would like to develop my own version of WorldCat Local/Navigator that interacts with the Commons and sell it as a product (kind of the same way ezproxy was sold prior to being acquired by OCLC), or a group of researchers would like to do the same.  As a commercial entity, I’m fairly certain that this type of development model wouldn’t be kosher with OCLC unless I licensed access to WorldCat (and I’m not certain that they would license it, given that this would compete against one of their services).  Likewise, open source folks like LibLime or Equinox may like to create an open source version of the WorldCat Local interface.  Under the current guidelines, I understand that an open source implementation of WorldCat Local can exist, but as I understand that agreement, I’m not certain that groups like LibLime or Equinox (or another entity) could take that project and then sell support-based services around it (I’m unclear on that one though).  However, it’s very unlikely that the library world will see any of these types of developments (well, maybe the open source WorldCat Local, since I have a group that could use this and a number of people interested in developing it) because OCLC has come to treat what it calls the Commons (WorldCat) as its own personal data store.  There’s that commingling again.

So if it was up to me, how would I resolve this situation?  Well, I see two possible scenarios. 

  1. Open up WorldCat.  OCLC likes to refer to WorldCat as the Library Commons, so let’s treat it as such.  Remove the barriers to access and allow anyone and everyone to essentially have their own copy of the Library Commons and its data.  Now, rather than specifying terms of transfer and telling libraries under what conditions they can and cannot make their metadata available to other groups, the membership could consider what type of Open Data license the Commons could be made available under.  Something like the Creative Commons share-alike license, which allows for both commercial and non-commercial usage but requires all parties to contribute all changes to the data back to the community, may be appropriate (in essence, this is kind of what Open Library is doing with their metadata).  OCLC would be free to develop their own products, but the rest of the library community (both library and vendor community) would have equal opportunity to develop new services and ways of visualizing the data found in the Commons.  Does this devalue the Commons (WorldCat)?  I don’t think so; look at Wikipedia.  It uses this model of distribution, yet I’ve never heard anyone say that this devalues its content.  Would there be challenges?  For sure.  Probably one of the biggest would be the way that it would change what it means to be a member of OCLC.  If each person could download their own personal copy of the Commons, would libraries stay members?  I’m certain that they would, but I’m sure that what it means to be a member would certainly change.
  2. Split OCLC’s membership services from OCLC’s vendor services.  Under this example, WorldCat Local/Navigator development would be spun away from OCLC as a separate business (this happens in academia all the time).  Were this to happen, OCLC would be able to develop license terms that could then be leveraged by all members of the commercial library community, removing the artificial advantage OCLC currently enjoys (both in terms of data and in deciding who is allowed to work with the Commons).  I think that this model likely represents the smallest change for the membership and would continue to allow OCLC to make the Commons more available to non-commercial development without artificially limiting other groups interested in building new services.

One last thought.  In talking to people today, I heard a number of times that OCLC restricting access to the Commons was in fact a good thing, in part because it finally allowed the library community to leverage resources not available to the vendor communities.  In some way, we could finally stick it to them.  That’s fine, I’m all for developing tools and services, but this particular type of thinking I find worrisome.  If we, as a community, feel that we are unable to develop compelling tools and services that can compete with other vendor offerings without an artificial advantage, well, that’s just sad and says a little something about how we see ourselves as a community.  And this too is something that I’d like to see change, because if you look around, you will see that there are a myriad of projects (Koha, Evergreen, VuFind, Fedora, DSpace, LibraryFind, XC Catalog, Zotero, etc.) where developers (some library developers, some not) are re-envisioning how they see many of the services within the library and putting their time and effort into realizing those visions.

 

–TR


Feb 23 2008

Reading between the lines

Like a number of people, I found the following piece (http://chronicle.com/weekly/v54/i24/24a01101.htm) from the Chronicle of Higher Education on the Open Library fairly interesting, in part because of the topics that the author chose to highlight.  I tend to categorize pieces such as this as fluff, in that one rarely gets any content of substance from them.  However, in a short article about the Internet Archive’s Open Library initiative, I found it interesting that so much of the article centered around OCLC, or, should I say, the silence coming from OCLC as members seek to clarify OCLC’s position in regards to the Open Library and its members’ potential participation in this project.  Two things that jump out:

  1. “Librarians are not just uneasy having nonlibrarians edit catalogs; they are also afraid of offending OCLC.”

    An exceptional understatement, though one that doesn’t extend just to the Open Library.  As a general rule, I find that librarians are way too concerned with offending OCLC, with many feeling that should offense be taken, it could have long running repercussions for the institution.  Are these concerns valid?  For OCLC, I think not.  While I firmly believe that OCLC occupies the same vendor space as other entities like EBSCOhost, Elsevier and Serials Solutions, I think that they are much more responsive to their members/customers, due in part to the organization’s roots as a large co-op.  Of course, librarians and libraries have been conditioned to believe that consequences will follow if one rocks the boat or steps on their partner’s toes.  And unfortunately (and much to my chagrin), I’ve had occasion myself to say or post opinions that have caused pushback from content/software providers currently serving Oregon State.  Fortunately, my director doesn’t mind when the pot periodically gets stirred, but not everyone is as lucky.  So, I can certainly understand where the nervousness is coming from.

    At the same time, I think that OCLC is contributing to this sense of uncertainty.  OCLC hasn’t been caught by surprise by the Open Library’s development work and certainly hasn’t been surprised by the Open Library asking OCLC members to contribute data to the project.  For close to a year, OCLC has had the opportunity to provide some form of guidance or position as it relates to the Open Library project.  Instead, they have been silent.  This leaves librarians and libraries to consult their local OCLC representatives, who have been given widely varying information regarding the legality of participating in this project.  While I’ve yet to hear of anyone being told that a library could not participate in the project, it has been quietly discouraged by OCLC’s deafening silence.
  2. “But one OCLC official, speaking on the condition that he not be identified, said Open Library was a waste of time and resources, and predicted it would fail.”

    Again, it’s interesting that in a piece like this, this comment would make its way into the article.  Whether or not this reflects OCLC’s current position on this particular project, I think that a number of good things may come out of the Open Library project, even if indirectly.  First, OCLC’s grid services.  While likely not a direct result of the Open Library’s project, I’d guess that the current desire to accelerate their availability is in response to the growing number of projects currently looking to move into the space that OCLC has traditionally monopolized.  Yes, let’s call it what it is: in this space, OCLC functions as a monopoly, because OCLC has essentially been allowed to rely on its position to squeeze out competing projects (RLG) and leverage their data to create services that would be otherwise impossible to create without the metadata that OCLC currently possesses.  I think to some degree, projects like the Open Library give OCLC pause in the sense that at present, they see their bibliographic and holdings content, WorldCat, as their crown jewel.  It represents a body of work that exists nowhere else in the world and gives them a potential advantage over any cloud-based service being developed within the library community.  At the same time, as OCLC goes forward and libraries become more interested in building some of their own tools (either individually or as part of a consortium), I think that WorldCat, and the data beneath it, will actually become less important for OCLC; rather, it will be the services that they develop on top of it that will hold the most value.  And I think that projects like the Open Library have accelerated this development.  As Martha Stewart would say, it’s a good thing.

    Secondly, I think that this quote is interesting in a larger sense as to how it relates to OCLC as a whole.  They are undergoing big changes (business changes, philosophical changes), and I think that this represents that to some degree.  As the piece notes, OCLC’s public face sees cooperation as a good thing, while maybe privately, that’s not the case.  But honestly, I think that this is healthy.  OCLC is hiring a lot of bright people and has traditionally had a lot of bright people on staff, and what we see is that they are thinking about these issues and how they relate within the larger community (even beyond OCLC).  Now, whether or not OCLC is particularly happy that these disagreements are being aired publicly (something that hasn’t traditionally happened), well, that would be something to keep an eye on as well.

–TR

[update: Spell check fails me again, sorry Martha]



Jan 28 2008

Harvesting UMich OAI records with MarcEdit

I’ve had a few folks ask what the procedure would be for a user wanting to harvest the UMich OAI records using MarcEdit.  Well, there are two workflows that can be followed depending on what you want to do.  You can harvest the OAI data and translate it directly to MARC, or you can harvest the raw data directly to your file system.  Here’s how each would work:

Generating MARC records from the OAI content:

  1. Start MarcEdit
  2. From the Main Screen, click on the Harvest OAI Records Link
    image
  3. Once the link has been selected, you have a number of options available to you to control the harvesting.  Required options are those that are seen when the screen opens.  Advanced Settings, or optional settings define additional options available to the user.  Here’s a screenshot of the Harvester with the Advanced Options expanded:
     image
    The required elements that must be filled in are the Server Address (the address pointing to the OAI URL), metadata type (format to be downloaded) and Crosswalk Path.  If you select any of the predefined metadata types, the program will select the crosswalk path for you.  If you add your own, then you will need to point the program to the crosswalk path.  Set name is optional.  If you leave this value blank, the harvester will attempt to harvest all available sets on the defined server. 

    Advanced settings give the user a number of additional harvesting options, generally set aside to help the user control the flow of the harvest.  For example, users can harvest an individual record by entering the record’s identifier into the GetRecord textbox.  A user could resume a harvest by entering the resumptionToken into the ResumptionToken textbox.  If the user wanted to harvest a subset of a specific data set, they can use a date limit (of course, you must use the date format supported by the server, generally yyyy or yyyy-mm-dd format).  Users can also determine if they want their metadata translated into MARC8 (since the harvester assumes UTF8 for all xml data) and change the timeout settings the harvester uses for returning data (you generally shouldn’t change this).  Finally, for users that don’t want to harvest data into MARC, but just need the raw data, there is the ability to tell the harvester to just harvest data to the local file system.  If this option is checked, then the CrossWalk Path’s label and behavior will change, requiring the user to enter a path to a directory to tell the harvester where it should save the harvested files.

  4. For the UMich Digital Books, a user would want to utilize the following settings to harvest metadata into MARC:
    image
    Users wanting to ensure that the MARC data is in MARC8 and not UTF8 format should check the Translate to MARC-8 option.  Once these settings have been set, a user will just need to click the OK button.  For this set (mbooks), there are approximately 111000+ records, so harvesting will take approximately an hour or so to complete.  Longer if you ask the program to translate data into MARC8.
  5. When finished, users will be prompted with a status box indicating the number of records, resumptiontokens and last resumptiontoken processed (and any error information if an error occurred on process).
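For folks curious about what a harvest like this looks like at the protocol level, here is a minimal Python sketch of an OAI-PMH ListRecords loop that follows resumptionTokens until the server stops handing them out.  To be clear, this is not MarcEdit's code; the function names (build_request_url, parse_page, harvest) and the injectable fetch argument are made up for illustration, and there is no MARC crosswalk or error handling here.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(base_url, metadata_prefix=None, set_spec=None, resumption_token=None):
    """Build a ListRecords URL. A resumptionToken is sent on its own,
    without metadataPrefix or set, as the protocol requires."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record elements, next token or None) for one response page."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


def harvest(base_url, metadata_prefix, set_spec=None, fetch=None):
    """Yield records page by page until no resumptionToken is returned.
    `fetch` is a hypothetical hook (url -> response text) so the loop can
    be exercised without touching a live server."""
    if fetch is None:
        import urllib.request
        fetch = lambda url: urllib.request.urlopen(url).read().decode("utf-8")
    token = None
    while True:
        url = build_request_url(base_url, metadata_prefix, set_spec, token)
        records, token = parse_page(fetch(url))
        yield from records
        if not token:
            break
```

Against the UM server, a call would look something like harvest("http://quod.lib.umich.edu/cgi/o/oai/oai", "marc21", "mbooks"), with each yielded element still needing to be converted to MARC by whatever crosswalk you use.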

 

Harvesting OAI records directly to the filesystem

  1. Start up MarcEdit
  2. Select Harvest OAI records link
  3. Enter the following information (Server folder location will obviously vary):
    image 
  4. Files are harvested into the defined directory, numbered sequentially according to the resumption token processed.  Again, when processing is finished, a summary window will be generated to inform the user of harvest status and error information related to the harvest.
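If you are doing the raw harvest with your own script rather than MarcEdit, the "one file per response, numbered in order" idea can be sketched in a few lines of Python.  The file naming pattern here (harvest_00001.xml and so on) is my own assumption for illustration and is not necessarily how MarcEdit names its output files.

```python
import os


def save_pages(pages, out_dir, prefix="harvest"):
    """Write each raw OAI response to its own file, numbered in the order
    the pages (one per resumption token) were received."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, xml_text in enumerate(pages, start=1):
        # Zero-padded counter keeps the files sorted in harvest order.
        path = os.path.join(out_dir, f"{prefix}_{n:05d}.xml")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(xml_text)
        paths.append(path)
    return paths
```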

Errors related to the UMich Harvest that could be encountered:

My guess is that you would not see the first two of these if you are using the most current version of MarcEdit (uploaded 2008-01-27); however, you may run into them if harvesting using other tools or older versions of MarcEdit.

  1. Server Timeout:  When harvesting all records, I was routinely seeing the server reset its connection after harvesting 10-18 resumption Tokens.  The current version of MarcEdit has some fall over code that will reinitiate the harvest under these conditions, stopping after 3 failed attempts.
  2. Invalid MARC data:  Within the 111000+ records, there are approximately 40-60+ MARC records that have too few characters represented in the MARC leader element.  This is problematic because this error will invalidate the record and depending on how the MARC parser handles records, poison the remainder of the file.  MarcEdit accommodates these errors by auto correcting the leader values — but this could be a problem with other tools.
  3. image
    This error message will be generated if you set the start and end elements using an invalid date format.  You should always check with the OAI server to see what date formats are supported by the server.  In this case, the date format expected by the UM OAI server is as follows:
    <repositoryName>University of Michigan Library Repository</repositoryName>
      <baseURL>http://quod.lib.umich.edu/cgi/o/oai/oai</baseURL>
      <protocolVersion>2.0</protocolVersion>
      <adminEmail>dlps-help@umich.edu</adminEmail>
      <earliestDatestamp>2007-10-24T18:48:49Z</earliestDatestamp>
      <deletedRecord>persistent</deletedRecord>
      <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    

    Notice the granularity element — this tells me that any of the following formats would be valid:
    2008
    2008-01
    2008-01-01
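If you want to check a date before sending it, one option is to read the granularity element out of the server's Identify response and validate against it.  The Python below is a rough sketch with hypothetical helper names.  One caveat: as far as I know, the OAI-PMH 2.0 specification itself only defines two date forms (YYYY-MM-DD, which every repository must support, and YYYY-MM-DDThh:mm:ssZ), so this check is deliberately stricter than the list above and will reject the shorter year and year-month forms even though a particular server may accept them.

```python
import re
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def server_granularity(identify_xml):
    """Pull the <granularity> value out of an Identify response."""
    root = ET.fromstring(identify_xml)
    el = root.find(f".//{OAI_NS}granularity")
    return el.text.strip() if el is not None and el.text else None


def date_ok(value, granularity):
    """True if `value` is in a form the OAI-PMH 2.0 spec defines and the
    server's advertised granularity allows. Only the shape is checked,
    not whether the date itself is a real calendar date."""
    if DAY.match(value):
        return True  # day granularity is always allowed
    return granularity == "YYYY-MM-DDThh:mm:ssZ" and bool(SECONDS.match(value))
```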

Anyway, that’s pretty much it.  If you are just interested in seeing what type of data the UM is exposing with these data elements, you can find that data (harvested 2008-01-25) at: umich_books.zip (~63 mb).

 

–TR

 


Jan 26 2008

MARC21 University of Michigan Google Digital Books Records (records for testing/viewing)

I was playing with MarcEdit’s OAI harvester, making a few changes to fix a problem that had been discovered, as well as add some fall-over code that allows the harvester to continue processing (or at least, attempt to continue processing) when the OAI server breaks the connection (generally through timeout).  To test, I decided to work with the UMichigan Google Books sets of records Michigan recently made available.  It’s a large set and is one of those servers where the server timeout had been identified as an issue (i.e., this came up because a MarcEdit user had inquired about a problem they were having harvesting data). 

Anyway, I’ll likely post the update to the OAI harvesting code on Sunday or so (which will also include an update to the CJK processing component when going from MARC8-UTF8 — particularly when the record sets contain badly encoded data), and with it, I’ll likely include a small tutorial for users wanting to use MarcEdit to do one of the following:

  1. Harvest the UM digital book records from OAI directly into MARC21 (saving characterset in either legacy MARC8 or UTF8 formats)
  2. Harvesting the raw UM digital book metadata records via OAI (without the MARC conversion)

While I think that the Harvester is fairly straightforward to use, I’m going to post some instructions, in part so that I can underline some of the common error messages that one might see and what they mean.  For example, with the UM harvesting, I found that the OAI server tended to time out after approximately 15 queries using a persistent connection.  When it would stop, it would throw a 503 error from the server.  I was able to overcome the issue by simply adding some code into the app to track failures, pause harvesting and restart the connection to the server, but these types of errors are not easy for most users to debug since they are not sure if the issue lies with the harvesting software or the server being harvested.
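For anyone writing their own harvester and hitting the same 503s, the "track failures, pause, and reconnect" approach can be sketched roughly as follows in Python.  This is an illustration and not MarcEdit's actual code: the function name, the 30 second pause and the injectable fetch and sleep arguments are all assumptions of mine, and the limit of three attempts is just a reasonable default.

```python
import time
import urllib.error


def fetch_with_retry(fetch, url, max_attempts=3, pause_seconds=30, sleep=time.sleep):
    """Call fetch(url), pausing and retrying when the server drops the
    connection or answers with an HTTP error such as 503."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except (urllib.error.URLError, ConnectionError, TimeoutError) as err:
            # urllib.error.HTTPError (e.g. a 503) is a subclass of URLError.
            last_error = err
            if attempt < max_attempts:
                sleep(pause_seconds)  # give the server time to recover
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts") from last_error
```

Because the harvest is driven by resumptionTokens, retrying the same URL after a pause picks up where the failed request left off, provided the token has not expired on the server.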

Another problem that I’ve coded in MarcEdit to fix on the fly is that a handful of MARC21 records (I believe I identified approximately 40ish of 111000+) sent via OAI have invalid leader statements (i.e., not enough characters in the string).  For example, in this record: http://quod.lib.umich.edu/cgi/o/oai/oai?verb=GetRecord&metadataPrefix=marc21&identifier=oai:quod.lib.umich.edu:MIU01-001300473, the leader is one character too short.  MarcEdit can fix these on the fly (at least it will try) by validating the length of the LDR and, if short, padding spaces to the end of the string.  Since length and directory are calculated algorithmically, the records will be valid, but some of the leader data may get offset due to the padding.  However, there isn’t a thing you can really do about that, outside of rejecting the records as invalid or accepting the data as is (which then poisons all the other records downloaded in the set).  I’m putting together some info for the folks at UM that includes some of the problems that I’ve run into working with their OAI data just in case they are interested.
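The padding fix itself is simple enough to show in a couple of lines.  A MARC21 leader is always 24 characters, so a rough Python equivalent of the check described above (my own sketch, not MarcEdit's code, with a made-up function name) would be:

```python
LEADER_LENGTH = 24  # a MARC21 leader is always 24 characters long


def pad_leader(leader):
    """Pad a short leader with trailing spaces so it reaches 24 characters.
    Leaders that are already long enough are returned unchanged."""
    return leader.ljust(LEADER_LENGTH)
```

Note that padding at the end cannot tell which position was actually dropped, which is why some leader values may end up shifted; the record length and base address would still need to be recalculated when the record is written out.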

Anyway, one thing I thought I would do is post a set of these records, in MARC UTF8 and MARC8 charactersets (harvested 20080126 around 1:30 am to 3:00 am) for folks interested in taking a look at the exposed metadata.  You will find that the vast majority of these records appear to be brief metadata records containing basically an author, title and url — though full records are scattered through the record sets.  There are over 111000 records found in the six files.  The files in the zip are:

  1. mbooks-utf8 (combined data set)
  2. mbooks-marc8 (combined data set in marc8)
  3. pd-utf8 (international public domain books)
  4. pd-marc8 (international public domain books in marc8)
  5. pdus-utf8 (u.s. public domain books)
  6. pdus-marc8 (u.s. public domain books in marc8)

A quick note.  These are largish files.  MarcEdit has a preview mode specifically for this purpose.  Unless disabled, MarcEdit by default only loads the first 1 MB of data into the MarcEditor.  This will allow you to preview ~1000-1500 records, but using the editor tools, you can globally edit the entire data file.  This is done because reading data into the Editor is expensive (memory and time).  If you really want to open large files into the Editor, you need to make sure your virtual memory is set fairly high. 
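If you would rather peek at the files with a script, the same "only look at the first 1 MB" idea is easy to approximate.  The Python sketch below is my own illustration (not how MarcEdit implements its preview mode): it reads the first chunk of a binary MARC file and keeps only the complete records, relying on the fact that every MARC record ends with the record terminator byte 0x1D.

```python
RECORD_TERMINATOR = b"\x1d"  # every binary MARC record ends with this byte


def preview_records(path, max_bytes=1024 * 1024):
    """Return the complete raw MARC records found in the first
    `max_bytes` of the file, dropping any partial record at the end."""
    with open(path, "rb") as fh:
        chunk = fh.read(max_bytes)
    end = chunk.rfind(RECORD_TERMINATOR)
    if end == -1:
        return []  # no complete record fits in the preview window
    return [raw + RECORD_TERMINATOR for raw in chunk[:end].split(RECORD_TERMINATOR)]
```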

So long as the folks at UM don’t ask me to take it down, I’ve posted these test files at: http://osulibrary.oregonstate.edu/techservices/marc/umich_books.zip for viewing and testing purposes (~62.7 MB), but I would recommend harvesting these records from http://quod.lib.umich.edu/cgi/o/oai/oai directly yourself if you want to use them since UM is adding new records all the time.  And remember, if you want to harvest them with MarcEdit, you’ll need to wait till I post the update on Sunday.

–TR



Jan 14 2008

My non-LITA top tech trends

(Note, I started this post last night, but had to put it away so I could get some rest before a 6 am flight.  I finished the remainder of this while waiting for my flight). 

So, after getting up way too early this morning, I staggered my way down to the LITA Top Tech Trends discussion.  Unfortunately, it seemed like a number of other folks did the same thing as well, so I only ended up hanging out for a little bit.  I just don’t have the stamina in the morning to live through cramped quarters, poor broadband and no caffeine.  I get enough of that when I fly (which I get to do tomorrow).  Fortunately, a number of folks who had been asked to provide tech trends have begun (or have been) posting their lists, and some folks who braved the early morning hours have started blogging their response (here).  I personally wasn’t asked to provide my list of tech trends, but I’m going to anyway, as well as comment on a few of the trends either posted or discussed during the meeting.  Remember, this is just one nut’s list, so take it for what it is.

  1. Ultra-light and small PCs (Referenced from Karen Coombs)
    Karen is one of a number of folks who have taken note of the wide range of low-cost computers currently being made available to the general public.  These machines, which run between $189 and $400, are portable and have the potential to bring computers to a wider audience.  I’ll have to admit, I’m personally not sold on these machines, in part because of the customer base that they are aiming for.  Vendors of machines such as the Eee PC note that they are primarily targeted at users looking for a portable second machine and at kids/elderly looking for a machine simply to surf the web.  A look at the specifications for many of these low-cost machines shows Celeron-class processors with 512 MB of RAM and poor graphics processing.  Is this good enough for browsing the web?  I’d argue, no.  The current and future web is a rich environment, built on CSS, XML, XSLT, Flash, Java, etc.  I think what people seem to forget is that this rich content takes a number of resources to simply view.  Case in point: I set up a copy of CentOS on a 1.2 GHz Centrino with 512 MB RAM and a generic graphics card (8 MB of shared memory), and while I could use this machine to browse the web and do office work with OpenOffice, I certainly wouldn’t want to.  Just running the Linux shell was painful, web browsing was clunky and office work was basically unusable, essentially surpassing the machine’s capabilities right out of the box.  Is this the type of resource I’d want to be lending to my patrons?  Probably not, since I wouldn’t want my patrons to associate my library’s technical expertise with sub-standard resources.  Does this mean that ultra-portables will not be in vogue this year and the next?  Well, I didn’t say that.  A look at the success the iPhone is having (a pocket PC retailing for close to $1500 without a contract) seems to indicate that users want portability and are willing to pay a premium price for it, so long as that portability doesn’t come at too high of a price.
  2. Branding outside services as our own (and branding in general)
    There was a little bit of talk about this: the idea of moving specific services outside the library to services like Google or Amazon and, essentially, rebranding them.  This makes some sense; however, I always cringe when we start talking about branding and how to make the library more visible.  From my perspective, the library is already too visible, i.e., intrusive into our users’ lives.  Libraries want to be noticed, and we want our patrons and organizations to see where the library gives them value.  It’s a necessary evil in times when competition for budget dollars is high.  However, I think it does our users a disservice.  Personally, I’d like to see the library become less visible, providing users direct access to information without the need to have the library’s fingerprints all over the process.  We can make services that are transparent (or mostly transparent), and we should.

    The same thing goes for our vendors.  I’ll use III as an example only because we are an Innovative library, so I’m more familiar with their software.  By all rights, Encore is a serviceable product that will likely make III a lot of money.  However, in the public instances currently available (Michigan State, Nashville Public Library), the III branding is actually larger than that of the library (if the library branding shows up at all).  And this is in no way unique to III.  Do patrons care what software is being used?  I doubt it.  Should they care?  No.  They should simply be concerned that it works, and works in a way that doesn’t get in their way.  From my perspective, branding is just one more thing that gets in the way.

  3. Collections as services will change the way libraries do collection development
    I’m surprised that we don’t hear more about this, but I’m honestly of the opinion that metadata portability and the ability for libraries to build their collections as web services will change the way libraries do collection development.  In the past, collection development was focused primarily on what could be physically or digitally acquired.  However, as more organizations move content online (particularly primary resources), libraries will be able to shift from an acquisitions model to a services model.  Protocols like OAI-PMH make it possible (and relatively simple) for libraries to actively “collect” content from their peer institutions in ways that were never possible in the past.
  4. Increased move to outside library IT and increased love for hosted services (whether we want them or not)
    While it has taken a great deal of time, I think it is fair to say that libraries are more open to the idea of using Open Source software than ever before.  In the short term, this has been a boon for library IT departments, which have seen an investment in hardware and programmer support.  I think this investment in programming support will be short-lived.  In some respects, I see libraries going through their own version of the .COM boom (just without all the money).  Open Source is suddenly in vogue.  Sexy programs like Evergreen have made a great deal of noise and inroads into a traditionally very vendor-oriented community.  People are excited, and that excitement is being made manifest by the growing number of software development positions being offered within libraries.  However, at some point, I see the bubble bursting.  And why?  Because most libraries will come to realize either 1) that having a programmer on staff is prohibitively expensive or 2) that the library will be bled dry by what I’ve heard Kyle Banerjee call vampire services.  What is a vampire service?  A vampire service is a service that consumes a disproportionate amount of resources but will not die (generally for political reasons).  One of the dangers for libraries employing developers is the inclination to develop services, as part of a grant or a grandiose vision, that eventually become vampire services.  They bleed an organization dry and build a culture that is distrustful of all in-house development (see our current caution in looking at open source ILS systems.  It wasn’t too long ago that a number of institutions used locally developed [or open] ILS systems, and the pain associated with those early products still affects our opinions of non-vendor ILS software today).

    But here’s the good news.  Will all software development positions within the library go away?  No.  In fact, I’d like to think that as positions within individual organizations become more scarce, consortia will move to step into this vacated space.  Like many of our other services moving to a network level, I think that the centralization of library development efforts would be a very positive outcome, in that it would help to increase collaboration between organizations and reduce the number of projects that are all trying to re-invent the same wheel.  I think of our own consortium in Oregon and Washington, Summit, and the dynamic organization it could become if only the institutions within it would be willing to give over some of their autonomy and funding to create a research and development branch within the consortium.  Much of the current development work (not all) could be moved up to the consortium level, allowing more members to directly benefit from the work done.

    At the same time, I see an increase in hosted services on the horizon.  I think that folks like LibLime really get it.  Their hosted services for small to medium size libraries presumably reduce LibLime’s costs to manage and maintain the software and free those hosted libraries from the need to worry about hardware and support issues.  When you look at the future of open source in libraries, I think that this is it.  For every one organization willing to run open source within their library, there will be 5 others that will only be able to feasibly support that infrastructure if it is outsourced as a hosted service.  We will see a number of open source projects move in this direction.  Hosted services for Dspace, Fedora, metasearch, and the ILS will all continue to emerge and grow throughout this year and into the next 5 years.  And we will see the vendor space start to react to this phenomenon as well.  A number of vendors, like III, already provide hosted services.  However, I see them making a much more aggressive push to compel their users (higher licensing fees, etc.) to move to a hosted service model.

  5. OCLC will continue down the path to becoming just another vendor
    I’d like nothing more than to be wrong, but I don’t think I am.  Whether it’s this year, the next or the year after that, OCLC will continue to alienate its member institutions, eventually losing the privileged status libraries have granted it throughout the years and becoming just another vendor (though a powerful one).  Over the last two years, we’ve seen a lot of happenings come from Dublin, Ohio.  There was the merger with RLG, the hiring of many talented librarians, WorldCat.org, WorldCat Local and OCLC’s newest initiatives centered on their grid services.  OCLC is amassing a great deal of capital (money, data, members), and I think we will see how they intend to leverage this capital this year and the next.  Now, how they leverage this capital will go a long way to deciding what type of company OCLC will be from here forward.  Already, grumblings are being heard within the library development community as OCLC continues to build new revenue streams from web services made possible only through the contribution of metadata records from member libraries.  As this process continues, I think you will continue to hear grumblings from libraries who believe that these services should be made freely available to members, since it was member dollars and time that provided OCLC exclusively with the data necessary to develop these services.  **Sidebar, this is something that we shouldn’t overlook.  If your library is an OCLC member, you should be paying close attention to how OCLC develops their grid services.  Remember, OCLC is supposed to be a member-driven organization.  It’s your organization.  Hold it accountable and make your voice heard when it comes to how these services are implemented.  Remember, OCLC only exists through the cooperative efforts of both OCLC and the thousands of member libraries that contribute metadata to the database.**  Unfortunately, I’m not sure what OCLC could do at this point to retain this position of privilege.  Already, too many people that I talk to see OCLC as just another vendor that doesn’t necessarily have the best interests of the library community at heart.  I’d like to think that they are wrong, and that OCLC still remains an organization dedicated to furthering libraries and not just OCLC.  But at this point, I’m not sure we know (or they know).  What we do know is that there are a number of dedicated individuals who came to OCLC because they wanted to help move libraries forward.  Let’s hope OCLC will continue to let them do so.  And we watch, and wait.
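
Going back to trend 3 for a moment: the OAI-PMH "collecting" described there can be sketched in a few lines.  This is only a rough illustration, not anyone's production harvester.  It builds a ListRecords request and pulls identifiers, titles and the resumption token out of a response.  The base URL, the record identifier and the sample response below are made up for the example; the verb, the oai_dc metadata prefix and the resumptionToken mechanics come from the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical response, trimmed to the parts the parser looks at.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1957/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, resumption_token=None, set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{OAI_NS}record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        records.append((identifier, title))
    # An empty or missing token means the list is complete.
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)
```

A harvester would fetch list_records_url(base), parse the response, and keep requesting with the returned token until it comes back empty.  Fetching, error handling and deleted-record handling are left out here on purpose.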

Anyway, that’s my list of trends.

–TR

 


Jan 2 2008

Using MarcEdit to reuse (and maybe import) items in Dspace

I’ve been thinking a little bit about some of the things that I use MarcEdit for and have been pushing some of this work off my desk to some of the staff in our technical services department.  We actually use MarcEdit quite a bit when it comes to sharing metadata from our Dspace instance with other systems, like OCLC’s WorldCat and our online catalog.  For example, we use MarcEdit to automatically generate MARC21 records for our theses submitted through Dspace.  The process seems to work fairly well, and has been very easy for our staff to learn.  I should write an article at some point documenting this process and how it’s working at OSU.

To that end, I’m writing a plug-in for MarcEdit that may enable me to mainstream the processing of web page archiving in Dspace.  At this point, the process is a bit too manual for my tastes.  Along with spidering a site (to whatever the chosen depth may be), there is this pesky manual step of flattening the site and making the URLs relative.  Not a big deal (unless there are file name collisions, which there always are when crawling more than one level deep), but it takes time.  So, I spent some time this afternoon and wrote a threaded web crawler.  Seems to work well.  At this point, I just need to add the logic to flatten all paths, and come up with a naming scheme to re-write all URLs to provide unique file names.  Once I get that down, building the batch import package for Dspace should be fairly trivial.  Not sure how much time I’ll have to work on this over the week/weekend, but it would be a pretty cool project to finish, I think.  It would certainly allow the library to provide site archiving as a Dspace option (at this point, it’s only done under very special circumstances) and should simplify the process enough that it could probably become a mainstream process.
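
To make the flattening step a little more concrete, here is a rough sketch of one possible naming scheme.  This is not the plug-in’s code; it is written in Python purely for illustration, and the function names (flat_name, build_name_map, rewrite_links) are made up.  The idea, assumed here rather than settled, is to keep the original file name and tack on a short hash of the full URL, so that two index.html files from different directories no longer collide once everything sits in one folder.

```python
import hashlib
import posixpath
import re
from urllib.parse import urlsplit


def flat_name(url):
    """Map a crawled URL to a flat file name that is unique per URL."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        # Directory-style URLs get a conventional default file name.
        path += "index.html"
    stem, ext = posixpath.splitext(posixpath.basename(path))
    # A short hash of host + path + query keeps same-named files apart.
    key = f"{parts.netloc}{path}?{parts.query}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    safe_stem = re.sub(r"[^A-Za-z0-9_-]", "_", stem) or "file"
    return f"{safe_stem}-{digest}{ext}"


def build_name_map(urls):
    """Return {original URL: flat file name} for every crawled URL."""
    return {url: flat_name(url) for url in urls}


def rewrite_links(html, name_map):
    """Point href/src attributes at the flat names; leave unknown URLs alone."""
    def swap(match):
        attr, quote, url = match.group(1), match.group(2), match.group(3)
        return f"{attr}={quote}{name_map.get(url, url)}{quote}"

    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', swap, html)
```

This sketch assumes the crawler has already resolved every link to an absolute URL before rewrite_links runs.  A regular expression is also a blunt tool for HTML, so a real version would probably want a proper parser.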

Anyway, if I do get a chance to get this finished, I’ll certainly make it available as a plug-in (with source).  Of course, if someone has already developed a simplified process that requires no manual processing after harvest, I would love to hear it.

–TR


Nov 29 2007

LibraryFind and Mobile Services

One of the things I was really impressed with while attending DLF was the presentation on the lightweight web platform being built at NCSU.  Leveraging their Endeca catalog, the folks at NCSU have been able to produce a set of REST-based APIs for querying the catalog.  With those services, they’ve designed a mobile interface and a Google widgets interface to their catalog.  It’s a good idea, and one that I’m working on moving into LF.  LibraryFind already supports a SOAP-based API (which is actually my preference), but I’m running into more and more cases where a lightweight REST-based API would be nice.  Fortunately, Ruby/Rails makes this possible.

Anyway, in the spirit of looking at building client services on top of LibraryFind, I’ve created a quick Facebook app using LibraryFind.  Whether we’ll actually use it (or be able to contribute it to the Facebook app list), who knows, but it was actually pretty simple to put together.  Here are a couple of quick screenshots (you’ll notice I’m using the iframe option, mostly because I was getting annoyed while testing by the app getting dropped out of Facebook; this way, the user stays in Facebook but can query the service).

Facebook LibraryFind app Front:

image

Facebook LibraryFind Search/Results:

image

 

At this point, the app is just rendering LF in the iframe, though I imagine that an interface developed for mobile users would also work well within this environment.  I guess time (and usability testing) will tell (if there is even any interest in having something like this at all).

–TR


May 1 2007

WorldCat Local — first questions

So OCLC and the UW finally pulled off the cover and unveiled WorldCat Local to the world. A number of folks will write about the things that this does very well. The faceting is nice and clean, as is the layout. Some of the ajaxy features (like holdings retrieval) are a little clunky, and I’ve found them a bit dodgy as I play with it on different browsers and operating systems, but for mainstream users, it will be fine. I like the integration with other OCLC services. This is one of the real value-added benefits that OCLC can offer that most other systems likely cannot. Also, I’m assuming that as OCLC continues to develop this service, API access will become available for member institutions, allowing institutions using WorldCat Local to embed their data into other systems.

However, looking at the beta, I also have a number of questions that I don’t have quick answers to. Here they are in no particular order.

  1. Localization: OCLC’s catalog model utilizes a single master record. This means that local information, relating to access restrictions, local notes, Call Numbers — become marginalized. At this point, the beta doesn’t appear to have a call number search, which makes sense. This is local information that OCLC’s database wouldn’t have. But there is other data — local subjects not found in the OCLC master record, enhanced notes…all things that catalogers create in MARC records to surface items queried via keyword, subject or title — all missing from WorldCat Local at this point. Do we need this local information? Maybe not — this is a good experiment to see if that’s so.
  2. OCLC’ization of libraries: At present, libraries job out a lot of services to OCLC. Cataloging, ILL, Online Reference: adding the catalog/federated search would seemingly follow this trend. But at what point does the library stop being an individual entity and simply become an OCLC reseller? This isn’t a hit at OCLC, just a question.
  3. Related to that, I have questions regarding scale of operations. Since OCLC’s move to a unified platform for FirstSearch and its Connexion Cataloging environment, I would characterize operations as…fragile. Rock solid services that seemed to never have down time now seem to experience problems on a pretty regular basis. In the grand scheme of things, not being able to catalog for a couple of hours isn’t that big of a deal. However, once that is moved to the Catalog level, well, then it is. If one’s library catalog is down for an extended period, people notice. If down time becomes systemic, people will complain. Centralized indexes like Google work because of the large data centers that they have around the country to deal with issues relating to scale. FirstSearch/Connexion are member-only services. Libraries traditionally throttle their access to their resources (either through purchasing a specific number of queries or ports, or simply by how they present the resource), so it doesn’t fill me with confidence that OCLC’s current services seem to have so many access problems. Again, in the large scheme of things, these services are almost always available, but it makes a big difference when we are talking about uptime for staff versus uptime for the public. Which brings me back to load. Were WorldCat Local to be used as the default catalog, suddenly, OCLC would need to handle a much larger number of queries. Rather than dealing with 1000 FirstSearch queries, OCLC would be required to handle hundreds of thousands of queries, done daily by multiple member institutions. At this point, I question their ability to work with the library community at that scale, but maybe OCLC has been creating data centers that we are not currently aware of.
  4. Widening the digital divide. One thing OCLC’s WorldCat Local does is present itself as an index for all library collections. The “find near you” feature suggests that they know if the title is in “your” library. Well, no. They know if the title exists in your library only if your library is an OCLC member and subscribes to FirstSearch. Essentially, this amounts to an OCLC tax (hey, just like Microsoft) for being in their index. For many institutions with funding woes, this tax could become too high, and the cost will be a widening gap between the haves and have-nots, as those that have can be a part of the OCLC index and those that have not are thrown to the wolves…I mean their ILS vendor.
  5. Cost: I’m curious to see how OCLC prices this service, since it doesn’t replace an already existing service but will be an add-on. How, you say? Well, most ILS vendors don’t price their webpacs as separate line items. This is just part of the system. So you can’t stop paying for this part of the system and shift those cost savings to OCLC. About the only thing that you can do is stop paying for system add-ons. Would this replace a federated search tool? Probably not. OCLC’s tool is searching brief records and abstracts for their article search. This is very different from the full text searches that you can get from most federated search tools. Plus, this search only encompasses things that currently exist in the OCLC universe. Until that universe expands, there will be too many items that it misses. The only things that I see it replacing are tools like III’s Encore…so maybe that’s enough, maybe not. I think OCLC will need to think very hard about how they charge for this service, in part because libraries already spend such a large percentage of their budgets on OCLC services. I’m not certain, but I would guess that we pay OCLC more than any other vendor, including our ILS provider, for services throughout the year. I’m wondering if adding WorldCat Local wouldn’t add to the current annual sticker shock associated with paying for these services.
  6. Library Brain Drain: I’ve talked with a few people about this. OCLC is collecting a lot of talented library project managers from the library community. While this gives me a lot of hope that many of the questions that I have will eventually have answers — it worries me for the library community. Top administrators are very important and there is a very shallow pool of library administrators that actually understand and can envision next generation digital library services. I worry what this poaching will mean for my profession.

Anyway, I guess that’s a long way to say that I’ll be interested in watching and seeing how this service continues to develop.

–TR