
 

Having gone 34 years without being a coffee drinker, I personally never got why people wanted coffee shops in libraries.  But over the last year, my wife and Greenhill Farms, a Kona coffee grower in Hawaii, convinced me that not all coffee is bad.  I’m so convinced that having a morning cup of coffee (black, no sugar – yuck) has become a bit of a habit.

Well, this morning, I was hanging out in Vancouver, BC killing time before heading to the airport.  Since I didn’t have anything to do, I grabbed my copy of Norman Mailer’s “The Castle in the Forest” and headed a couple blocks down the road to Tim Horton’s.  There, I grabbed a medium cup of black coffee, and found myself a quiet table to just sit and read.  And I have to admit, kicking back, nursing my cup of coffee and enjoying a good book was really appealing.  Without realizing it, I’d spent about an hour and a half in my little corner of the coffee shop.  I think I now definitely understand the draw.

Maybe now that I’ve had this breakthrough, I’ll be able to unwrap other mysteries – like why people enjoy watching talk shows and reality TV, who actually liked the show Firefly and why (because as a sci-fi fan, I don’t get it), the infatuation with Dr. Pepper, and why my cats always look like they are plotting to kill me in my sleep.

–tr

 

Last week, I was on a panel at the Oregon Library Association entitled, A Sense of Service.  The title of the panel and the impetus for the discussion came out of the Oregon Library Association’s Quarterly publication, or more specifically, Janet Webster’s article entitled, “A Sense of Service(1).”  The article, and the publication in general, focused on what it means as a librarian to serve.  As one might expect, most of the articles in the publication focused on how librarians ultimately work to serve their patrons – which I guess makes sense – it is our most visible role to the public, and one that takes on more importance as libraries struggle to demonstrate their value to their communities.  But it’s also a viewpoint that excludes the majority of people that work in libraries.  If we define service with such a narrow lens, what does that mean for technical services staff, support staff, administration…all of whom perform functions that help to meet their patrons’ needs?

I ask this question because this was the question posed to me.  I had been asked to participate on this panel as the sole librarian without direct contact with the library’s patrons, and for me, whom exactly I serve, how I serve, and how to tell whether that service has an impact aren’t necessarily straightforward questions.  But they’re interesting questions, and ones that I actually spent a good deal of time thinking about prior to speaking on the panel.

I guess like many people that work in non-public service positions, I tend to think very little about service – at least in the sense of making people happy.  And so in that respect, I was having a difficult time wondering what I might be able to add to the conversation.  The very talented librarians on my panel would be talking about how they are changing lives by embodying an ethic of service to ensure that their patrons (students, community members, etc.) have access to the information that they need.  These librarians can discuss how their service tangibly impacts their communities – from students who grow into life-long readers, or community members discovering new digital services, or students successfully traversing their library’s vast information resources, these librarians could see that they made a difference every day.  I may have been a panelist, but I was looking forward to hearing my colleagues share their excitement, because often times, the why we serve is very closely aligned to the why we work in libraries.

But as I said, I don’t work with patrons.  As the Gray Family Chair for Innovative Library Services, my job at OSU Libraries is to build bridges, understand how new technology will impact the library, develop experimental services, and help the library strategically align itself to meet future needs.  Sure, patrons are helped by the work that I do – but it’s service that’s one or two steps removed from those that use the library.  Which got me wondering again, what exactly do I have to offer to a panel on service or to those librarians that don’t work directly with the public? 

So what kinds of service could I point to?  My first thought was to look at my dossier.  I’ve been spending a lot of time looking at my dossier lately – mostly because I’m trying to decide when it will be strong enough to take a shot at full professor.  Given that this isn’t judged by the library, but by a group largely made up of research scientists at the University, the process can be a difficult one.  Anyway, one of the sections that make up our dossier and candidate statement is one on service.  But honestly, what I find there doesn’t inspire me.  Service in this case is service to the profession, to the university – i.e., committees served, leadership positions held.  Hardly inspirational material for an up and coming librarian.  I had visions of being the lone person on the panel trying to inspire librarians to better service through an adherence to parliamentary procedure.  (Actually, if you ever want to depress yourself, look at the committees your university happens to have.  I swear to God, the one that still makes me laugh every time I see it at OSU is the Committee on Committees.  I have a feeling that it must be important, but I get such an overwhelming sense of apathy when I see it.)

Certainly there must be more – so I started to think about why I work in a library.  Actually, not just a library, but why I work at a land grant institution, and an idea started to take shape – especially around my own sense of what service is.  And I came to an understanding that for me – there are two distinct ways in which I serve: professionally (which are things I have to do) and selfishly (things I have to do – but want to do).  Well, maybe selfishly isn’t the best way to describe it…no, maybe it is.

Sweeping away the professional service for the moment, I want to focus specifically on how I serve selfishly, and the philosophy that I apply.  As part of the panel, we were asked to address three questions:

  1. Whom do you serve?
  2. How do you serve?
  3. Why do you serve?

Honestly, I couldn’t cleanly answer those questions.  So, I wanted to describe how I serve and place it within the context of libraries. 

Service of Gaps

I think that the best way that I can describe my service and how I would encourage non-public services librarians to make a difference in their libraries, the library community and their local communities is to consider adopting a philosophy of a service of gaps.  It’s a philosophy of service that places an obligation on those people that see needs, to fill them.  And I think I have to explain this a little bit – so I’m going to explain it through three different examples. 

  1. The Libraries of Oregon
  2. MarcEdit
  3. Community Committee work (library board work)

The three examples above represent three different types of service that I participate in because we saw needs and had the capacity to fill the gaps.  The first is the Libraries of Oregon.  This project was born out of a partnership between the Oregon State Library, the Oregon State University Libraries (confusing) and the LSTA Board in Oregon.  For years, the state librarian has been supporting projects targeting the underserved and unserved residents of the state (which hovers around 20%).  The Libraries of Oregon is a project that we hope will begin to address some of the issues relating to access to information within the state.  And yet, this isn’t a project that OSU Libraries had to do (or was originally asked to do).  It doesn’t directly help the OSU community, it diverts some scarce resources to an outside project – and yet, this was a project OSU Libraries sought out and undertook because we should.  As the land grant institution, we had an obligation to fill this gap – and my position allows the library to do that.  I may never meet any of the residents or kids that we help through this program – but because of this work – they will have more access with fewer barriers than before.  And as a father who sees other kids struggling to do homework because they can’t get to a public library or they no longer have adequate school library resources, it’s work I believe very strongly in.  So, I serve selfishly.

For librarians and professionals that spend most of their time writing code, I think that there is an easy tendency to focus on the work that you are doing and forget about the larger impact of this work.  Sure – we understand that the coding we do makes information _____ (fill in the blank: easier to find, safer, richer, visible, etc.), but I also argue that it provides an important avenue for service as well.  Release your code and you contribute to the larger “discussion” of library science.  Develop something useful, and you can help to simplify someone’s work and help them provide better service for their patrons as well.  Engender a community and you can help users/practitioners share workflows and best practices, and build relationships.  I’d love to be able to say that I wrote MarcEdit with such noble thoughts in mind, but it’s the epitome of serving selfishly.  As a cataloger – I needed better tools – so I wrote one.  At the time, cataloging tools were expensive or worked primarily under DOS.  (Of course, there were the fabulous MARC.pm modules for PERL, but I’d rather have someone punch me in the face than have to write PERL [I’m just not that good at it.])  So I wrote MarcEdit to understand MARC and give myself a toolkit for developing other applications needing to access MARC data.  Making it available to the larger community was something done as an afterthought because a few folks (thanks Kyle Banerjee) thought the program might be useful to other people.  Today, I’m not a cataloger, but I continue to develop MarcEdit because it affords me the opportunity to work with catalogers and work with a lot of people doing a lot of very interesting things.  I’m certain I get a lot more out of those connections and relationships than the people that actually use the program – but again, it’s been a gap that I could fill and I’ve been happy to do it.

The last example is a very personal one to me.  While I joked about professional responsibilities and committee work, there are times when they can take on a special meaning.  Two years ago, I volunteered to serve on my city’s local library board.  Not because I had to or even because they couldn’t have found someone to do it – but because I owed the community and the library a debt and had an opportunity to repay it.  In my family, the public library is almost a second home.  My wife volunteers there, my kids love to visit there – the librarians are close to family (though not as crazy).  So, when an opportunity to serve opened up – I was happy to oblige – and through it, I’ve been able to find small ways to make contributions.  My hope is that I’ll be able to repay in small measure, what the library has personally provided for my family during the 9 years in the community.

These were the kinds of things I talked about during the panel presentation – this idea of a service of gaps and a notion that service is a fulfillment of an obligation to step in and fill needs.  It’s not that much different than what public librarians do – it’s just behind the curtain where people don’t always see it, and where maybe we don’t always recognize it.  I know I was happy to have been a part of the panel because it got me to think about service in a way that I hadn’t before.  It got me to broaden my lens and think about some of the reasons I enjoy working in a library.  And once I started down that path, answering those three questions in the beginning became a lot easier.

  • Whom do you serve?
    Those in my sphere of influence that have a need (be that my community, profession, etc.)
  • How do you serve?
    By filling in the gaps, and often times, selfishly
  • Why do you serve?
    Because it’s my obligation to fill the gaps that I can

 

–TR

 

One of the hats I wear is as a member of the Independence Library Board.  I love it because I don’t work with public libraries as often as I’d like to in my real job, and honestly, the Independence Public Library is the center of the community.  The Library is a center for adults looking for education opportunities, kids looking for resources, and home to a number of talented librarians that are dedicated to encouraging a love of reading in our community.  It’s one of the few libraries I’ve ever known to have both children’s and adult reading programs, and it takes advantage of that in the summer – by having the adults and kids compete against each other to see who logs the most pages (the kids always win).

Each board meeting is interesting, because as the economy became more difficult for people, more people turned to the library.  Every month, the library sees more circulations, more bodies in the building, more kids, more adults – just more.  And they do it on a budget that doesn’t accurately reflect the impact that they have on the community. 

Anyway, one of the things that the Library has going for it is a very active friends program – and through that group (and some grant funds), the library was able to purchase a number of laptop computers for circulation within the Library.  The Library currently has some 8-10 terminals that are always being used, and the laptops would provide additional seats and allow people to work anywhere within the library using the wifi.

The Library set up the laptops using the usual software – DeepFreeze, etc. – to provide a fairly locked down environment.  However, what was missing was a customizable timer on the machines.  Essentially, the staff was looking for a way to make it easier for patrons checking out the laptops to avoid fines.  The laptops circulate for a finite period of time within the building.  Once that time is over, the clock starts ticking for fines.  To avoid confusion, and help make it easier for patrons to know when the clock was running out – I’d offered to work on building a simplified timer/kiosk program.

The impetus for this work comes from Access 2007, I think.  I had attended the hackfest before the conference and one of the project ideas was an open source timing program.  I had worked on and developed a proof of concept that I passed on.  And while I haven’t worked on the code since – I kept a copy myself.  When we were talking about things that would be helpful, I was reminded of this work.

Now, unfortunately, I couldn’t use much of the old project at all.  The needs were slightly different – but it helped me have a place to start so that I wasn’t just looking at a blank screen.  So, with idea in hand, I decided to see how much time it would take to whip together an application that could meet the needs. 

I’ll admit, nights like tonight make me happy that I still do more than write code in scripting languages like python and ruby.  Taking about 3 hours, I put together a feature complete application that meets our specific needs.  I’ll be at the Oregon Library Association meeting this week, and if folks find this kind of work interesting, I’ll make it a bit more generic and post the source for anyone that wants to tinker with it.

So what does it do?  It’s pretty simple.  Basically, it’s an application that keeps time for the user and provides some built-in kiosk functionality to prevent the application from being disabled.

Here are a few of the screen shots:

image
When the program is running, you see the clock situated in the task tray

image
Click on the icon, and see the program menu

Preferences – password protected

image

image

image
Because we have a large Hispanic population, all the strings will need to be translatable.  For now, this is essentially just the locked message.  I’ll ensure the others are customizable as well – maybe with an option to just use Google Translate (even though it is far, far from perfect) if just getting the gist across is what matters most.

image
Run an action (both functions require a password)

image
Place your cursor over the icon to get the minutes

image
Information box letting you know you are running out of time

image
Sample lockout screen

In order to run any of the functions, you must authenticate yourself.  In order to disable the lockout screen, you must authenticate yourself.  What’s more, while the program is running, it creates a low-level keyboard hook to capture and pre-process all keystrokes, disabling things like escape keys, the windows key, ctrl+alt+del so that once this screen comes up – a user cannot break out of it without shutting off the computer (which would result in needing to log in).  Coupled with DeepFreeze and some group policy settings, my guess is that this will suffice.
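The time-keeping side of what the screenshots show (minutes on hover, a warning box, then the lockout screen) can be modeled in a few lines.  This is only a rough sketch in Python, not the application itself: the class name, the five-minute warning threshold and the state names are all assumptions made up for illustration, and the keyboard hook and password checks are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class SessionTimer:
    """Hypothetical model of a laptop checkout timer (not the real program)."""

    total_seconds: int        # length of the checkout period
    warn_seconds: int = 300   # assumed: warn when five minutes remain
    elapsed: int = 0

    def tick(self, seconds: int = 1) -> None:
        # Advance the clock, never running past the end of the session.
        self.elapsed = min(self.total_seconds, self.elapsed + seconds)

    @property
    def remaining(self) -> int:
        return self.total_seconds - self.elapsed

    @property
    def state(self) -> str:
        # "running" -> "warning" -> "locked" as the time runs out.
        if self.remaining == 0:
            return "locked"
        if self.remaining <= self.warn_seconds:
            return "warning"
        return "running"

    def tooltip(self) -> str:
        # Text a tray icon could show when the cursor hovers over it.
        minutes, seconds = divmod(self.remaining, 60)
        return f"{minutes} min {seconds:02d} sec remaining"
```

A UI thread would call `tick()` once a second and switch what it displays whenever `state` changes; keeping the clock logic separate from the UI like this also makes it easy to test without waiting for real time to pass.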

The source code itself is a few thousand lines of code, with maybe 1,000 to 1,500 lines of actual business logic and the remainder around the UI/threading components.  Short and simple.

Hopefully, I’ll get a chance to do a little testing and get some feedback later this week – but for now – I’m just happy that maybe I can give a little bit back to the community library that gives so much to my family.  And if I hear from anyone that this might be of interest outside my library – I’ll certainly post the code up to github.

–TR

 

I’ve posted a new update to MarcEdit.  Lots of changes.  I’ll post some further information on a few changes tomorrow – but in the meantime, here’s the update list:

  1. Bug Fix: Z39.50 — automatic format support when syntax isn’t directly coded as marc, but MARC data is returned (i.e., UNIMARC as an example)
  2. Bug Fix: Z39.50 — batch search would capture multiple records, but the variable reporting the query items that returned multiple items could sometimes be overwritten.
  3. Bug Fix: Field Count — Labels will no longer overlap when users have set system fonts to display above 100%
  4. Behavior Change: Find All Jump List — no longer will it minimize when an item is selected. Now, it will jump to the record, but the list will remain visible.
  5. Bug Fix: Export Tab Delimited — Enlarged the subfield box and enabled additional data to be attached to the subfield
  6. Bug Fix: Export Tab Delimited — when working from a set of loaded settings, only the headers would be printed.
  7. Bug Fix: Z39.50 Batch Search — Gracefully handle exception when error file cannot be created.
  8. Enhancement: Extract/Delete Selected Records — Multiple select
  9. Enhancement: Added find records without Field/Subfield to Edit Shortcuts
  10. Bug Fix: Export Tab Delimited Settings arguments list will clear when loading settings (clearing old items before adding new ones)
  11. Enhancement: Export Tab Delimited Settings — Added Clear Settings option.
  12. Behavior Change: Export Tab Delimited Settings — No longer have to select MARC/no MARC before processing
  13. Bug Fix: lcase option leaving a \ before the replaced text. This is corrected.
  14. Enhancement: MarcEngine loose algorithm wouldn’t ignore Byte Order Marker (BOM) data causing problems with character encodings. This has been updated so that BOM data is ignored and filtered out.
  15. Bug Fix: Delimited Text Translator Constant data would prefix a number before the added data. This has been corrected.
  16. Bug Fix: Delimited Text Translator ability to process Excel 2010 xml files was broken. This has been corrected.
  17. Enhancement: Classify Tool now has the ability to exclude dates from serials and integrated resources
  18. New Tool: MARC SQL Explorer is a new tool designed to provide the ability to import MARC data into a database (or read records from a database). The SQL Explorer supports SQLite and MySQL.
  19. Bug Fix: EAD => MARC XSLT Translation update
  20. Enhancement: Task Automation Editor keeps focus when moving specific tasks up and down the task list.
  21. Enhancement: Task Automation Editor inserts new tasks into the task list next to the nearest selected task.
  22. Enhancement: Task Automation Editor – if an item is added or deleted and the editor is closed without saving, a prompt will occur.
  23. Enhancement: De-duplication of records between multiple files allowing for the extraction of a file without duplication or just unique records.
  24. Enhancement: Support proxy detection when doing automated updates.
  25. Enhancement: MARCEngine COM update — previously, GetError returned errors associated with the XSLT processing. This function will now return errors for a number of other functions. I’ll continue to update this.
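Item 14 in the list above is about ignoring Byte Order Marker data before parsing.  As a minimal sketch of that general idea (written in Python, not taken from MarcEdit's code, and with a made-up function name), stripping a leading BOM from raw bytes looks like this:

```python
import codecs

# Byte order marks for the common Unicode encodings.  UTF-32 is checked
# before UTF-16 because the UTF-32 LE mark begins with the UTF-16 LE mark.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading byte order mark, if one is present."""
    for bom in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data
```

For example, `strip_bom(codecs.BOM_UTF8 + b"00714cam")` returns `b"00714cam"`, so the first bytes of the file are the record's leader again rather than the marker.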
 

I’ve been spending a lot of time working with the Windows 8 beta, partly because I wanted to make sure MarcEdit would work with the new system and partly because I know that at some point I’ll be seeing it.

First impressions:

  1. Not having the start button is a little jarring because you find and start programs differently (think search or tiles). 
  2. There are some cool new shortcuts that, once you get used to using the Windows key, actually make using the system fun.
  3. I like the mail program since it gives you Outlook calendar and email integration, while providing integration for my Google, Facebook and Windows Live calendars.
  4. I like the People tile (reminds me of my phone).
  5. I like the integration with SkyDrive.
  6. Not sure what I think yet about the way desktop and Metro are integrated.  I’m getting used to it, but it doesn’t always feel as slick as it should.  Though maybe that will get better with time.
  7. Love, love, love the integration with Xbox.  If you have one, that will be your killer feature in Windows 8, since you can stream your games from your Xbox to a Windows 8 device.
  8. It will be interesting to see if they create a Metro Office interface because I don’t like how Office 2010 integrates.  I’ve decided to use LibreOffice and Google Docs till I see what comes out.

However, the big question is how it works as a tablet OS.  Well, I got an Acer Iconia Tab and have installed Windows 8, and so far so good.  It’s both lightweight and has all the functionality of a desktop.  So I’ll periodically check in to let folks know how it goes.

TR

P.S. This was actually written on the tablet, using live writer.

 

I posted a small update to MarcEdit last night (or this morning depending on your perspective).  Changes were as follows:

* 5.7.10

** Bug Fix: Custom Swap Field Function — corrected an error that would cause the targeted field to be sorted out of order. 

** Bug Fix:  Replace All (RegEx) — Corrected a bug that caused non-matching regular expression items to drop data when using the /m switch for matching multiple data fields.

** Enhancement:  Generate Control Numbers:  Added the ability to set the field for the control number to be inserted.  Can be added to an existing field, be added as a repeatable field, or added as a control field.

** Enhancement:  Added Swap Title function to the Edit Shortcuts section.

** Enhancement:  When splitting data, the program will increment file names if like control numbers exist in a file (i.e., two records with the same control number will result in files called "controlnumber.mrc" and "controlnumber1.mrc").
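The file naming in that last enhancement can be illustrated with a short sketch.  This is Python rather than MarcEdit's own code, and the function name and the counter dictionary are assumptions for the example; only the "controlnumber.mrc" / "controlnumber1.mrc" pattern comes from the change note.

```python
from typing import Dict


def split_file_name(control_number: str, used: Dict[str, int]) -> str:
    """Return a file name for a record, numbering repeats of a control number.

    `used` maps each control number to how many files it has produced so far.
    """
    count = used.get(control_number, 0)
    used[control_number] = count + 1
    # The first record keeps the bare control number; later ones get 1, 2, ...
    suffix = "" if count == 0 else str(count)
    return f"{control_number}{suffix}.mrc"
```

Calling it for the control numbers `ocm123`, `ocm123`, `ocm456` with one shared dictionary yields `ocm123.mrc`, `ocm1231.mrc` and `ocm456.mrc`.  A plain numeric suffix can still collide with a real control number that happens to end in the same digit, which a fuller version would need to check for.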

 

The two most noticeable changes are with the Generate Control Numbers function and the Swap Title Function.

Generate Control Numbers

The generate control numbers function has always been a part of the application – but has always been limited to creating control number data in the 001 field.  Likewise, there was a limitation that only allowed data to be generated if an 001 wasn’t present.  The update makes the following changes:

  1. Ability to set the Field or Field/Subfield combination for the data to be inserted.
  2. An option to always insert (allowing for control numbers to be generated even if the field or field/subfield combination already exists in a file).  This would be good for adding new control numbers to repeatable fields, like the 035.
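To make the difference between the two behaviors concrete, here is a small sketch in Python.  It is not how MarcEdit implements the function: the record is simplified to a list of (tag, data) pairs that are assumed to be in tag order already, and the function and parameter names are invented for the example.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (tag, data), e.g. ("035", "$a(LOCAL)000001")


def add_control_number(record: List[Field], tag: str, value: str,
                       always_insert: bool = False) -> List[Field]:
    """Return a copy of record with a generated control number added.

    Without always_insert, an existing field with the same tag is left alone
    (the older behavior).  With it, a new repeat of the field is added.
    """
    has_field = any(t == tag for t, _ in record)
    if has_field and not always_insert:
        return list(record)
    out = list(record)
    # Insert after the last field whose tag sorts at or before the new tag,
    # which keeps an already-sorted record in tag order.
    position = sum(1 for t, _ in out if t <= tag)
    out.insert(position, (tag, value))
    return out
```

With `always_insert=True`, a record that already has one 035 ends up with two, the new one placed directly after the existing one.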

image

 

Swap Title Data

This request came up a week or two ago.  The question was related to swapping data in the 245$a into the 776$t.  MarcEdit has always included a swap field function which works if the user wants to move all the data in the $a into the new field/subfield pair.  However, in this case, the user wanted to move the title data without the non-filing characters.  The current swap field function does not support that option. 
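For anyone unfamiliar with non-filing characters: in MARC 21, the second indicator of the 245 field records how many leading characters (an initial article plus its trailing space, for instance) should be skipped when filing.  A minimal sketch of dropping them, in Python and with a hypothetical function name, could look like this; it is an illustration of the idea, not the code behind the MarcEdit function.

```python
def title_without_nonfiling(indicator2: str, subfield_a: str) -> str:
    """Drop the leading non-filing characters from a 245$a value.

    indicator2 is the 245 second indicator ("0" through "9"); anything that
    is not a digit (a blank, say) is treated as zero characters to skip.
    """
    skip = int(indicator2) if indicator2.isdigit() else 0
    return subfield_a[skip:]
```

So a 245 with second indicator 4 and $a "The castle in the forest /" would contribute "castle in the forest /" to the 776$t.  Note that this counts Python characters; a real implementation has to decide how to count in records that are not Unicode.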

In considering how to make this change, I ran into a problem – the swap field function is one of the oldest functions in the MarcEdit toolbox and the code is showing its age.  It’s also quite complicated, because of the many different types of options available to it.  Originally, I’d planned on just adding to this master function – but the complexity involved in updating the function became too unwieldy.

 

**Side note**

One of the dangers of working on a large project for so long is that bits of old code that are no longer used tend to take up life as part of the coded undead.  Zombie bytes that simply take up compiler space and occasionally screw with your mind when they rise back to life due to a missed case statement or commented line of code.  While working with the swap field function, it also became apparent that I need to refactor the present function (something I’m starting).  This happens all the time.  Whenever I do significant work on an area of the program, I take that as an opportunity to refactor old code.  In fact, not too long ago, I streamlined all the code in the primary editing library – except the swap field function.  It kind of looks like code that was written by two monkeys pounding on a keyboard.  It’s long (somewhere between 10-15,000 lines of code), redundant in parts, and ugly.  Yet, it works and works well and in the real world, working code wins.  And yet, I’ve started the refactoring process because it’s time for this function to go on a diet….

 

**off Side Note**

 

In the end, what I decided to do was add this as a macro of sorts.  In the last update, I added a new section called Edit Shortcuts.  These are shortcuts to what are actually complex regular expressions that no person should ever have to work out.  So I added it here.  The tool expects that data will be extracted from the 245$a, and the user determines what field and subfield they will place the data into.  This is a first pass on the macro; I’ll be updating it soon to allow indicator values to be added as well as part of the field creation process.

 

To use it, look in the MarcEditor under Edit/Edit ShortCuts/Field Edits/Title Swap.

image

 

–TR

 

So, this isn’t even 1/2 baked yet, but I’ve been working on a project to move cataloging onto the mobile phone.  Is there a practical application here – maybe.  I’ve got a couple of ideas where something like this might be useful…maybe in helping with recon projects where catalog cards or shelves of uncataloged materials are present.  Likewise, I think that there could be an application for gifts processing – maybe two.  For gift processing, I could see subject selectors given the ability to essentially scan and “catalog” a work if a record is found, essentially speeding up the process of getting an item to the shelf.  Additionally, I could see this process being expanded to aid in doing valuation of gifts – where a selector could scan books and have the utility go out to Amazon and keep a running valuation.  So, there could be some interesting applications.

The demo -

  • Written in C#, borrowing components from MarcEdit
  • Runs on Windows Phone 7.5 or possibly Android with Mono for Android support (though I haven’t tried it yet)
  • Supports scanning and searching of ISBN data, OCRing card catalogs and reading/acquiring barcoded materials.
  • Can download records from Z39.50 targets and upload records using TCP to an ILS system

A couple of notes:

  • ISBD…while you would think that this would make scanning and reading catalog cards easier – it doesn’t.  Using both Google’s free OCR services and Microsoft’s OCR products, I’m finding that the punctuation on the catalog cards is fouling the OCR.  I think the reason is that OCR systems are trained to look for words, and the punctuation, especially in the subject blocks, doesn’t make sense.  So, it makes the OCR basically worthless.  I can parse parts of things – generally titles, authors and ISBN data, which is good enough to search for a record – but at this point, generating a record straight from a catalog card seems unlikely unless a better OCR service is found.
  • Programming this was actually really easy.  On the windows phone, it’s a combination of C# and Silverlight.  This was admittedly easier because I could reuse some of the MarcEdit codebase in doing this, but I think you could do this on other systems as well by simply moving some of the data processing off the device and to a web-service.  It means more data is moving between the device and the web, but ease of development may be worth the speed trade-off.
  • Reactions to this type of tool are interesting.  I gave a brief (and choppy) demo of this process at a local technology conference, Online NW.  Unfortunately, the computer wouldn’t cooperate to show the demo video till the end, so the lightning talk was abbreviated – but I was talking to a colleague after the conference and it was interesting that people in the crowd (I’m assuming catalogers) gasped in horror at the idea that someone, other than a cataloger, might actually go and download records and put them into the ILS (which reminds me – I need to delete the record I downloaded into our ILS :) )
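On the point about pulling ISBN data out of noisy OCR output: one reason ISBNs survive bad OCR reasonably well is that they carry a check digit, so candidates can be validated before being used as a search key.  The following is a minimal sketch in Python (the demo itself is C#); the function names and the candidate pattern are assumptions for illustration, while the check-digit arithmetic is the standard ISBN-10 and ISBN-13 calculation.

```python
import re
from typing import List


def is_valid_isbn10(isbn: str) -> bool:
    # Nine digits plus a check character (digit or X), weighted 10 down to 1.
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    # Thirteen digits with alternating weights of 1 and 3.
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0


def find_isbns(ocr_text: str) -> List[str]:
    """Return check-digit-valid ISBNs found in OCR text, in order, no repeats."""
    found: List[str] = []
    # Candidate runs of digits, hyphens and X; spaces inside a number are
    # not handled in this sketch.
    for match in re.finditer(r"[0-9Xx-]{10,17}", ocr_text):
        candidate = match.group().replace("-", "").upper()
        if is_valid_isbn10(candidate) or is_valid_isbn13(candidate):
            if candidate not in found:
                found.append(candidate)
    return found
```

Anything that fails the check digit is simply dropped, which filters out most misreads, call numbers and dates before a search is ever sent to a Z39.50 target.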

 

Anyway – for those folks interested in seeing a 1/2 baked demo, feel free to watch this video placed on youtube this afternoon.

QuickMARC Proof of Concept

 

–TR

 

So I’ve been spending my time making a few changes to my proof of concept cataloging application using my phone.  A couple of things that I’ve learned along the way:

  1. No matter how good the OCR is, I’m not sure it ever gets to a point where you can just happily scan a catalog card and get all the data perfectly.  You can thank ISBD punctuation for that.
  2. Setting holdings data in OCLC is much easier than you’d think it would be, thanks to the Z39.50 Extended properties.
  3. Adding a barcode reader really was easier than I thought it would be

Right now, the proof of concept allows users to search OCLC (and set holdings) using their login credentials, or to search and download records from US LC.  You can scan a barcode to get the record, or you can scan a catalog card and allow the program to attempt to disassemble the metadata to determine the best search profile.  Obviously, of the methods, this one is the most dodgy, but it’s interesting to see how it works and how OCR incrementally improves.

As I’ve been working on this, it’s been making me wonder what the real-life implications are for a project like this.  Obviously, one of the goals was to make catalog cards easier to recon.  But the ability to use the phone as a barcode scanner and catalog on the fly also makes me wonder if a tool like this could be used while shelf reading or at point of acquisition of a text, or at a circulation desk when working with a book without a record.

One benefit of this work as well is that since this code is being written in C#, I’m starting to think about how I might co-opt some of this work in MarcEdit.  The idea being that a user could upload a set of images to a folder and then MarcEdit could OCR those images and utilize the data from those images to automatically retrieve records for that content.  I’m not quite sure how reasonable an idea this really is at this point due to limitations with OCR, but from a technical standpoint, I have all the components I would need to make this happen.  So who knows, maybe this work will spawn something new and innovative yet.  We’ll see.

 

–tr

 

The latest MarcEdit update has been baked and pushed out the door.  If you are running a current version of MarcEdit, you can expect to see the program prompt you for an update (unless you’ve disabled that functionality).  Otherwise, you can find the update at: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html.  Originally, this update was planned to be primarily cosmetic, with two small bug fixes.  However, after working with a colleague playing with some large Hathi Trust metadata files, a few other updates ended up squeezing in.  So what’s changed?  See below:

  1. Enhancement: MARCXML => MARC enhancements.  When translating from MARCXML
    to MARC, MarcEdit now checks whether the record data is too long (over
    99,999 bytes) or the field data is too long (over 9,999 bytes).  MarcEdit
    will truncate records that are too long and split field data that is too
    long.  If either operation occurs, MarcEdit will recode the 008/38 to an
    "s".  This enhancement only affects the MARCXML=>MARC conversion function;
    however, that means that any function that converts data to MARC through
    MARCXML is affected by this change. 

    I discussed this change at more length in an earlier post, but essentially, this change was necessitated because I’m occasionally running into XML data that I’d like to translate into MARC, but that is simply too large.  The changes here allow MarcEdit, when translating data through the MARCXML=>MARC process, to automatically adjust records that would otherwise be generated as invalid (as currently happens).  If you’d like to see how MarcEdit handles these types of errors, you can look at a sample file at: http://people.oregonstate.edu/~reeset/marcedit/anonymous/long_xml.xml.  This file has 3 MARCXML records.  The first one is roughly 3 times too large for a traditional MARC record thanks to the many 9xx fields in the record.  Prior to this update, MarcEdit would generate a record but calculate its length incorrectly (it would calculate the length, then keep only the first 5 digits of the value; since the length has more than 5 digits, the recorded length would be wrong).  After this update, MarcEdit will truncate fields once the record limit has been reached and notify the user through the UI that the truncation took place, in addition to the 008 modifications mentioned above.

  2. Bug Fix: Swap Field function:  Under certain rare conditions, moving data
    from a control field to a variable field resulted in the delimiter value
    being dropped from the swapped data.
  3. Bug Fix: Set Font function — when the function fails, the program will now exit the function gracefully and render the font in its default state.
  4. Enhancement: The Validator has been augmented so that invalid records
    in .mrk format can be identified outside of the
    MarcEditor.
  5. Enhancement: Added a new Change Case shortcut that allows users to set the
    initial character in a field to upper case, without modifying the case of any
    other characters in the subfield.
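
To make the record-length problem from item 1 concrete: the MARC leader stores the record length in five digits, so a length over 99,999 cannot be represented.  The Python sketch below is only an illustration of the failure described above, not MarcEdit's code, and the function names are hypothetical.

```python
MAX_RECORD_LENGTH = 99_999  # largest value that fits in the 5-digit leader length


def leader_length_naive(record_length: int) -> str:
    """Mimic the old behavior: format the length, then keep only the first five digits."""
    return str(record_length).zfill(5)[:5]


def fits_in_leader(record_length: int) -> bool:
    """True when the length can be written to the leader without losing digits."""
    return 0 <= record_length <= MAX_RECORD_LENGTH


# A record roughly three times too large: the stored length no longer matches.
actual = 312_345
stored = leader_length_naive(actual)
print(stored, int(stored) == actual, fits_in_leader(actual))
```

Any reader that trusts the leader will stop reading far short of the real end of such a record, which is why the record has to be shortened rather than just relabeled.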

So that’s it for the updates.  The MARCXML=>MARC changes were very significant changes, but hopefully they will be useful ones.  I know that they will be welcomed at OSU since we occasionally run into issues of fields being too long when harvesting our ETD records from DSpace to generate our MARC records for the catalog.

–TR

 

One of the benefits of moving the MARCXML=>MARC translation algorithm away from XSLT to an inline function is the ability to provide some sanity checking beyond simple XML validation.  One of the issues that I see periodically when working with XML conversions is the need to code data truncation into my XSLT stylesheets.  For example, the ETD process that we use with DSpace looks for the abstract and makes sure that the data in the abstract doesn’t exceed the 9,999-byte limit for a MARC field. 
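
The kind of field-level truncation described above can be sketched in a few lines of Python.  This is a minimal illustration, assuming UTF-8 data and a hypothetical helper name; it is not the XSLT used in the ETD process.  The one subtlety is that the limit is in bytes, so the cut must not land in the middle of a multi-byte character.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten `text` so its UTF-8 encoding is at most `max_bytes` long."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

In practice the budget passed in would need to be somewhat less than 9,999, since the field length also has to cover the indicators, subfield codes, and field terminator.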

Recently, however, I found a different problem that I don’t run into often, but that showed up when working with some data provided by the Hathi Trust.  Some colleagues were given a large sample of data (32 GB of MARCXML) to do some research into providing better identification of government documents records.  The new MarcEdit MARCXML process is able to make short work of this 32 GB file, translating the data into MARC in ~20 minutes.  The problem that arises, however, is that some of these records are too long.  For reasons I cannot understand, the Hathi Trust data includes a local 9xx field that, from the context, appears to be item information.  Unfortunately, some records include thousands of items, meaning that when the data is translated, the resulting record is too large (it exceeds the total length of 99,999 bytes). 

However, because of the new MARCXML process, I’ve been able to create a workaround for situations like this.  When processing MARCXML data, MarcEdit will internally track the record length of a translated record.  If that record would exceed the maximum record length, MarcEdit will truncate the record by dropping fields off the end of the record.  The program will also modify the 008/38 byte, setting the value to “s” (meaning the record was shortened), and will visually notify the user that a truncation occurred by turning the results panel purple.
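
The workaround can be sketched roughly as follows.  This is a simplified Python illustration under stated assumptions, not MarcEdit's implementation: fields are modeled as hypothetical (tag, data) pairs with data already encoded as bytes, and the length is counted the way the MARC structure lays a record out (24-byte leader, 12 bytes of directory per field, one terminator per field, plus the directory and record terminators).

```python
LEADER_LEN = 24          # fixed-length leader
DIR_ENTRY_LEN = 12       # directory entry per field (tag + length + offset)
MAX_RECORD_LEN = 99_999  # largest length the leader can hold


def mark_shortened(data_008: bytes) -> bytes:
    """Set 008/38 to 's'; leave malformed (too short) 008 data untouched."""
    if len(data_008) <= 38:
        return data_008
    return data_008[:38] + b"s" + data_008[39:]


def truncate_record(fields):
    """Keep fields in order until the next one would push the record past
    the limit, then drop the rest.  Returns (kept_fields, truncated)."""
    kept = []
    length = LEADER_LEN + 1 + 1  # leader + directory terminator + record terminator
    for tag, data in fields:
        cost = DIR_ENTRY_LEN + len(data) + 1  # +1 for the field terminator
        if length + cost > MAX_RECORD_LEN:
            break
        kept.append((tag, data))
        length += cost
    truncated = len(kept) < len(fields)
    if truncated:
        # Flag the change in the 008 so downstream users can tell.
        kept = [(t, mark_shortened(d)) if t == "008" else (t, d) for t, d in kept]
    return kept, truncated
```

For example, a record with a 40-byte 008 followed by 200 item fields of 1,000 bytes each keeps the 008 and the first 98 item fields; everything after that is dropped and 008/38 becomes "s".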


While I generally take a hands-off approach to modifying MARC data through the translation process, this seems to be a good compromise for dealing with what is now a rare situation, but what I predict will become an all too common one as more data is created in systems without the MARC record limitations.

These changes to the translation engine will arrive in the next MarcEdit update (scheduled for 1/23/2012), when I’ll post an announcement and include a small record set that demonstrates the new functionality.  Hopefully, folks will find these changes useful, especially as technical services departments find themselves having to deal with more and more non-MARC metadata.

–TR

© 2011 Terry's Worklog