Mobile MARC Cataloging demo Posted: 2012-02-10 23:37:42
So, this isn’t even 1/2 baked yet, but I’ve been working on a project to move cataloging onto the mobile phone. Is there a practical application here? Maybe. I’ve got a couple of ideas where something like this might be useful…maybe in helping with recon projects where catalog cards or shelves of uncataloged materials are present. Likewise, I think that there could be an application for gifts processing – maybe two. For gift processing, I could see subject selectors given the ability to essentially scan and “catalog” a work if a record is found, speeding up the process of getting an item to the shelf. Additionally, I could see this process being expanded to aid in doing valuation of gifts – where a selector could scan books and have the utility go out to Amazon and keep a running valuation. So, there could be some interesting applications.
The demo:
- Written in C#, borrowing components from MarcEdit
- Runs on Windows Phone 7.5 or possibly Android with the Mono Touch support (though I haven’t tried it yet)
- Supports scanning and searching of ISBN data, OCRing card catalogs and reading/acquiring barcoded materials.
- Can download records from Z39.50 targets and upload records using TCP to an ILS
A couple of notes:
- ISBD…while you would think that this would make scanning and reading catalog cards easier – it doesn’t. Using both Google’s free OCR services and Microsoft’s OCR products, I’m finding that the punctuation on the catalog cards is fouling the OCR. I think the reason is that OCR systems are trained to look for words, and the punctuation, especially in the subject blocks, doesn’t make sense to them. So, it makes the OCR output basically worthless for building a full record. I can parse parts of things – generally titles, authors and ISBN data, which is good enough to search for a record – but at this point, generating a record straight from a catalog card seems unlikely unless a better OCR service is found.
- Programming this was actually really easy. On the Windows Phone, it’s a combination of C# and Silverlight. This was admittedly easier because I could reuse some of the MarcEdit codebase, but I think you could do this on other systems as well by simply moving some of the data processing off the device and to a web service. It means more data is moving between the device and the web, but ease of development may be worth the speed trade-off.
- Reactions to this type of tool are interesting. I gave a brief (and choppy) demo of this process at a local technology conference, Online NW. Unfortunately, the computer wouldn’t cooperate to show the demo video till the end, so the lightning talk was abbreviated – but I was talking to a colleague after the conference and it was interesting that people in the crowd (I’m assuming catalogers) gasped in horror at the idea that someone other than a cataloger might actually go and download records and put them into the ILS (which reminds me – I need to delete the record I downloaded into our ILS).
Anyway – for those folks interested in seeing a 1/2 baked demo, feel free to watch this video placed on YouTube this afternoon. QuickMARC Proof of Concept --TR
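The note above says that ISBN data can usually be pulled out of noisy OCR text well enough to drive a search. As a rough illustration of that idea (a Python sketch, not the demo's actual C# code), the snippet below scans OCR output for ISBN-shaped strings and keeps only those whose check digit validates. The function names and the sample text are made up for this example.

```python
import re

# Candidate pattern: optional 978/979 prefix, then ten ISBN characters,
# with optional spaces or hyphens between them (assumed OCR-friendly form).
ISBN_CANDIDATE = re.compile(r"\b(?:97[89][\s-]?)?(?:\d[\s-]?){9}[\dXx]\b")


def is_valid_isbn10(value: str) -> bool:
    """ISBN-10 check: weighted sum (weights 10..1) must be divisible by 11."""
    if len(value) != 10 or not value[:9].isdigit() or value[9] not in "0123456789X":
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(value))
    return total % 11 == 0


def is_valid_isbn13(value: str) -> bool:
    """ISBN-13 check: alternating weights 1 and 3, sum divisible by 10."""
    if len(value) != 13 or not value.isdigit():
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(value))
    return total % 10 == 0


def find_isbns(ocr_text: str) -> list[str]:
    """Return validated ISBNs (separators stripped) in order of appearance."""
    found = []
    for match in ISBN_CANDIDATE.finditer(ocr_text):
        digits = re.sub(r"[\s-]", "", match.group()).upper()
        if (is_valid_isbn10(digits) or is_valid_isbn13(digits)) and digits not in found:
            found.append(digits)
    return found


# Hypothetical OCR output from a catalog card.
SAMPLE = (
    "Title of work / by Some Author. -- 1st ed.\n"
    "ISBN 0-306-40615-2 : $25.00\n"
    "ISBN 978-0-306-40615-7 (pbk.)\n"
    "LC 1234567890"
)
print(find_isbns(SAMPLE))
```

Validating the check digit is what makes this usable on OCR output: a misread digit will usually fail the checksum, so the search is only attempted with numbers that are at least internally consistent.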
Proof of concept redux Posted: 2012-01-29 08:44:23
So I’ve been spending my time making a few changes to my proof of concept cataloging application using my phone. A couple of things that I’ve learned along the way:
- No matter how good the OCR is, I’m not sure it ever gets to a point where you can just happily scan a catalog card and get all the data perfectly. You can thank ISBD punctuation for that.
- Setting holdings data in OCLC is much easier than you’d think it would be, thanks to the Z39.50 Extended properties.
- Adding a barcode reader really was easier than I thought it would be
Right now, the proof of concept allows users to search (and set holdings) in OCLC (using their login credentials) or search and download records from US LC. You can scan a barcode to get the record, or you can scan a catalog card and allow the program to attempt to disassemble the metadata to determine the best search profile. Obviously, of the methods, this one is the most dodgy, but it’s interesting to see how it works and how OCR incrementally improves.
As I’ve been working on this, it’s been making me wonder what the real-life implications are for a project like this. Obviously, one of the goals was to make catalog cards easier to recon. But the ability to use the phone as a barcode scanner and catalog on the fly also makes me wonder if a tool like this could be used while shelf reading, at point of acquisition of a text, or at a circulation desk when working with a book without a record.
One other benefit of this work is that, since this code is being written in C#, I’m starting to think about how I might co-opt some of it into MarcEdit. The idea being that a user could upload a set of images to a folder and then MarcEdit could OCR those images and utilize the data from those images to automatically retrieve records for that content. I’m not quite sure how reasonable an idea this really is at this point due to limitations with OCR, but from a technical standpoint, I have all the components I would need to make this happen. So who knows, maybe this work will spawn something new and innovative yet. We’ll see. --tr
MarcEdit 5.7 update posted Posted: 2012-01-22 22:46:46
The latest MarcEdit update has been baked and pushed out the door. If you are running a current version of MarcEdit, you can expect to see the program prompt you to update (unless you’ve disabled that functionality). Otherwise, you can find the update at: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html. Originally, this update was planned to be primarily cosmetic, with two small bug fixes. However, after working with a colleague playing with some large Hathi Trust metadata files, a few other updates ended up squeezing in. So what’s changed? See below:
- Enhancement: MARCXML => MARC enhancements. When translating from MARCXML to MARC, MarcEdit will truncate the record if the record data is too long (over 99,999 bytes) and split the field data if a field is too long (over 9,999 bytes). If either operation occurs, MarcEdit will recode the 008/38 to an "s". This enhancement only affects the MARCXML=>MARC conversion function -- however, that means that any function that converts data to MARC through MARCXML is affected by this change. I discussed this change at more length here, but essentially, it was necessitated because I’m occasionally running into XML data that I’d like to translate into MARC but that is simply too large. The changes here allow MarcEdit, when translating data through the MARCXML=>MARC process, to automatically adjust records that would otherwise be generated as invalid (as currently happens). If you’d like to see how MarcEdit handles these types of errors, you can look at a sample file at: http://people.oregonstate.edu/~reeset/marcedit/anonymous/long_xml.xml. This file has 3 MARCXML records. The first one is roughly 3 times too large for a traditional MARC record thanks to the many 9xx fields in the record. Prior to this update, MarcEdit would generate a record, calculating the length of the record incorrectly (it would calculate the length, then take the first 5 digits of the value – since the length is longer than 5 digits, the recorded length would be incorrect). After this update, MarcEdit will truncate fields once the record limit has been reached and notify the user through the UI that the truncation took place, in addition to the 008 modifications mentioned above.
- Bug Fix: Swap Field function: Under certain rare conditions, moving data from a control field to a variable field results in the delimiter value being dropped on the swapped data.
- Bug Fix: Set Font function -- when the function fails, the program will now exit the function gracefully and render the font in its default state.
- Enhancement: Validator has been augmented so that invalid record identification of records in .mrk format can be done outside of the MarcEditor.
- Enhancement: Added a new Change Case shortcut that allows users to set the initial character in a field to upper case, without modifying the case of any other characters in the subfield.
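To make the record-length problem in the first enhancement concrete, here is a small Python sketch (not MarcEdit's code; the function names are invented for illustration). A MARC record stores its own length in the first five characters of the leader, and each directory entry stores a field's length in four digits, which is where the 99,999 and 9,999 byte ceilings come from. The sketch shows how keeping only the first five digits of a longer length yields a wrong value, and how the two limits can be checked up front.

```python
MAX_RECORD_LENGTH = 99_999  # leader positions 00-04 hold five digits
MAX_FIELD_LENGTH = 9_999    # a directory entry's length slot holds four digits


def leader_length_naive(record_length: int) -> str:
    """Mimic the faulty behaviour described above: zero-pad the length,
    then keep only the first five digits, even if there are more."""
    return str(record_length).zfill(5)[:5]


def check_limits(record_length: int, field_lengths: list[int]) -> list[str]:
    """Return a list of human-readable problems; empty means the record fits."""
    problems = []
    if record_length > MAX_RECORD_LENGTH:
        problems.append(f"record is {record_length} bytes (limit {MAX_RECORD_LENGTH})")
    for position, length in enumerate(field_lengths):
        if length > MAX_FIELD_LENGTH:
            problems.append(f"field {position} is {length} bytes (limit {MAX_FIELD_LENGTH})")
    return problems


# A record of 312,450 bytes would be labelled as 31,245 bytes long.
print(leader_length_naive(312_450))
print(check_limits(312_450, [120, 10_500]))
```

A reader that trusts the leader would stop reading such a record far too early, which is why a record like this is effectively invalid rather than merely oversized.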
So that’s it for the updates. The MARCXML=>MARC changes were very significant changes, but hopefully they will be useful ones. I know that they will be welcomed at OSU since we occasionally run into issues of fields being too long when harvesting our ETD records from DSpace to generate our MARC records for the catalog. --TR
MARCEngine MARCXML translation changes coming this weekend Posted: 2012-01-21 01:10:30
One of the benefits of moving the MARCXML=>MARC translation algorithm away from XSLT to an inline function is the ability to provide some sanity checking beyond simple XML validation. One of the issues that I see periodically when working with XML conversions is the need to code data truncation into my XSLT stylesheets. For example, the ETD process that we use with DSpace looks for the abstract and makes sure that the data in the abstract doesn’t exceed the 9,999-byte limit for a MARC field. Recently, however, I found a different problem that I don’t run into often, but that showed up when working with some data provided by the Hathi Trust. Some colleagues were given a large sample of data (32 GB of MARCXML) to do some research into providing better identification of government documents records. The new MarcEdit MARCXML process is able to make short work of this 32 GB file, translating the data into MARC in ~20 minutes. The problem that arises, however, is that some of these records are too long. For reasons I cannot understand, the Hathi Trust data includes a local 9xx field that, from the context, appears to be item information. Unfortunately, some records include thousands of items, meaning that when the data is translated, the resulting record is too large (it exceeds the maximum total length of 99,999 bytes). However, because of the new MARCXML process, I’ve been able to create a workaround for situations like this. When processing MARCXML data, MarcEdit will internally track the record length of a translated record. If that record would exceed the maximum record length, MarcEdit will truncate the record by dropping fields off the end of the record. The program will also modify the 008/38 byte (Modified record), setting the value to “s” (shortened), and will visually notify the user that a truncation occurred by turning the results panel purple.
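The truncation steps described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not MarcEdit's implementation: a record is modelled as a list of `(tag, data)` pairs with `data` as bytes, the length is computed from the standard MARC layout (24-byte leader, 12-byte directory entries, one-byte field and record terminators), and the function names are hypothetical.

```python
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12
MAX_RECORD_LENGTH = 99_999


def record_length(fields):
    """Length in bytes of the serialized record.

    fields: list of (tag, data) pairs; data excludes the field terminator.
    """
    directory = DIRECTORY_ENTRY_LENGTH * len(fields) + 1   # entries + terminator
    body = sum(len(data) + 1 for _, data in fields)        # each field + terminator
    return LEADER_LENGTH + directory + body + 1            # + record terminator


def mark_shortened(data):
    """Set 008 position 38 to 's'; leave malformed (short) 008 data alone."""
    if len(data) < 40:
        return data
    return data[:38] + b"s" + data[39:]


def truncate_record(fields, limit=MAX_RECORD_LENGTH):
    """Drop fields from the end until the record fits, then flag the 008.

    Returns (kept_fields, was_truncated).
    """
    kept = list(fields)
    truncated = False
    while kept and record_length(kept) > limit:
        kept.pop()
        truncated = True
    if truncated:
        kept = [(tag, mark_shortened(data)) if tag == "008" else (tag, data)
                for tag, data in kept]
    return kept, truncated


# Hypothetical oversized record: thirty 5,000-byte local item fields.
record = [("008", b" " * 40), ("245", b"10\x1faTitle")]
record += [("974", b"x" * 5_000) for _ in range(30)]
kept, truncated = truncate_record(record)
print(truncated, len(kept), record_length(kept))
```

Dropping from the end works in the scenario described because the bulky local 9xx fields sort last; a record whose oversized data sits earlier would need a different strategy, which this sketch does not attempt.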
While I generally take a hands-off approach to modifying MARC data through the translation process, this seems to be a good compromise for dealing with what is now a rare situation, but what I predict will become an all too common one as more data is created in systems without the MARC record limitations. These changes to the translation engine will appear in the next MarcEdit update (scheduled for 1/23/2012), when I’ll post an announcement and include a small record set that can demonstrate the new functionality. Hopefully, folks will find these changes useful, especially as technical services departments find themselves having to deal with more and more non-MARC metadata. --TR
Turning your phones into Cataloging clients Posted: 2012-01-08 17:29:54
Over the past few years, I’ve owned a number of different smart phones. I’ve had an iPhone, an Android (the first, in fact) and now a Windows Phone 7. I have to admit, they are all great, especially when I compare them to my old Blackberry. What you can do with each of these devices is quite cool. One of my favorite aspects of these phones is how easy it is to hack on them. When I had my iPhone, I spent some time learning Objective-C and writing a few simple iOS apps. I did the same thing with Android and Java. However, now that I have a Windows Phone, I find that I have many more opportunities to write applications for it because there’s no learning curve…I already use both Silverlight and C# in some personal coding projects.
So, why do I bring this up? Well, one of the things I’ve been thinking about is how these little micro computers that fit in our pockets can potentially be used in libraries. There are some obvious uses (making our catalogs more mobile, using geolocation within a building to help users navigate to a book, etc.), but what I’m more interested in is how we can make staff life a little easier with these devices. Looking around our library, one area where I can definitely see these kinds of devices making a big impact is in cataloging and technical services – well, more specifically, eliminating the need to perform recon within cataloging and technical services. Travelling around a few libraries in my immediate area, one thing that I’ve found is that many libraries still have small card catalogs. They often are of materials that have yet to be reconned and represent older journal titles and monographs. Many libraries also have large gift shelves, and areas in the stacks themselves, that remain uncataloged. It would be nice if we could take these micro computers, fully equipped with a digital camera, and photograph ourselves out of this problem.
The difficulties of course relate to OCR and the conversion of this data into MARC itself…or maybe it’s not a difficulty. I’ve been doing a little bit of playing around (well, more than a little bit) and here’s what I’ve found. It’s easy to do OCR on the web (free OCR). Folks may not realize it, but the Google Docs API provides a free OCR service. So does Microsoft. By working with the camera on a smart phone, it’s easy to send a snapshot of a book title page or catalog card to one of these OCR services and return the results back to the phone. Using MarcEdit (being written in C#, MarcEdit can be compiled to run on a Windows Phone; I’ve done it), I’m able to utilize the MARCEngine to take that OCR data and retrieve data from Amazon, the Library of Congress, or another library catalog – massage the data, and upload it to my catalog – all from my phone. Pretty cool stuff.
Right now, this work all remains in the research stages…it’s rough. The UI is sad, and the parsing of the OCR’d data could be much better. But the interesting thing is that it does work. Does it have real applicability in the library world – maybe, maybe not. I’m just not sure if enough material still needing recon exists for this type of application to be needed. But what this type of experimentation does show is that libraries probably should be looking at these little micro computers as more than consumer devices (i.e., how they change the way our users interact with our services) and consider how these devices may change the way libraries perform their own work.
BTW, if folks are interested in this recon project – my intention is to talk about it at C4L this Feb during a lightning talk. Ideally, I’ll have it cleaned up enough to show it off, and maybe, if there is interest, talk to some folks about how they can run something like this on their own Windows Phone 7. --TR
Merry Christmas–MarcEdit 5.7 Available Posted: 2011-12-25 02:11:16
Merry Christmas everyone. I hope that everyone has a safe and happy holiday with their family and their friends. In what has become a bit of a holiday tradition, I’m releasing an update to MarcEdit, MarcEdit 5.7. Yep, this shifts the version number from 5.6 to 5.7, and there are some pretty good reasons why – so let’s get to it.
Updates
Native MARCXML Processing
I’ve talked about this change at length in an earlier post, but in order to facilitate some of the work that I’m interested in doing around MarcEdit and Linked Data, I had to improve the XML processing related to MARCXML. Previously, MarcEdit utilized XSLT processing for all XML conversions. This works great and provides a lot of flexibility, but has a fairly substantial memory footprint with visible performance issues when dealing with larger (500 MB+) MARCXML file sets. To deal with these issues, I’ve updated MarcEdit to include native processing of MARCXML data using a SAX-style XML processor. This means validation of the document happens as the document is processed, but the takeaway is that MarcEdit’s MARCXML process has nearly no additional memory footprint and processes data approximately 190 times faster than the previous XSLT process. Of course, some people may have good reason to want to continue to use the XSLT-style processing (for example, they may have customized the MARCXML=>MARC XSLT), so I’ve also maintained the ability for users to continue to use the previous XSLT-style MARCXML processing (though the new method is the default). You can modify the MARCXML processing preferences within the Application Preferences window. Users wanting to disable the native processing function and utilize the previous XSLT process simply need to uncheck the Use Native Option (Non-XSLT Process). When this option is unchecked, the XSLT process will be used. This change has an impact in other parts of the program as well.
If you use the MarcEdit COM-based API or .NET API to access the MARCEngine, API calls to the engine for MARCXML=>MARC processing will utilize the XSLT translation process if an XSLT is passed into the function. If you want to use the native process, simply pass an empty string (or null value) to the function. The same applies to individuals using the cmarcedit.exe program (MarcEdit’s Console Program): if you want to use the native process, simply do not provide an XSLT when calling the MARCXML=>MARC translation.
UTF8=>MARC8 conversion updates
The UTF8=>MARC8 character conversion process wasn’t treating combining characters for diacritics represented as {dotb} or {commab} correctly. These diacritics were recognized, but the combining byte wasn’t being moved properly within the string, causing the diacritic to modify the wrong character. I’d like to thank Joe Altimus at Arizona State University for bringing this to my attention this week.
Multiple File Record Deduplication Utility
One of the feature requests that I get every now and again is a request to update the MarcEdit duplicate record function found in the MarcEditor. Very often, users want to run this tool over multiple files, rather than find duplicate records in a single source file. So, I’ve modified the existing function so that you can now run it outside of the MarcEditor, and over multiple files. You’ll find this function on the main MarcEdit window, under Tools/Find Duplicate Records. When you run this function, a new window opens. Simply click on the Open folder icon and select a file. To add another file, simply select the open icon again and select another file. You’ll see selected files added to the dropdown list. MarcEdit will then utilize the files in this list to perform the stated operation. At this point, this function is an extension of the existing deduplication tool.
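As a rough picture of what deduplicating across several files involves, here is a minimal Python sketch. It is not how MarcEdit does it: the post does not say which fields the tool compares, so this example assumes duplicates are identified by a single match key (the 001 control number by default), models each record as a plain dict of tag to value, and normalizes case and whitespace, all of which are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicates(files, key_tag="001"):
    """Find records that share a match key across any of the given files.

    files: dict mapping a file name to a list of records, where each record
           is a dict of MARC tag -> value (a simplified, hypothetical model).
    Returns a dict mapping each duplicated key to the list of
    (file name, record index) locations where it occurs.
    """
    seen = defaultdict(list)
    for name, records in files.items():
        for index, record in enumerate(records):
            # Normalizing case and whitespace is an assumption of this sketch.
            key = record.get(key_tag, "").strip().lower()
            if key:
                seen[key].append((name, index))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


# Two hypothetical files that share one control number.
batch = {
    "a.mrc": [{"001": "ocm123"}, {"001": "ocm456"}],
    "b.mrc": [{"001": "OCM123 "}, {"001": "ocm789"}],
}
print(find_duplicates(batch))
```

Because the index is built across all files before anything is reported, the same pass finds duplicates within a single file and between files, which is the difference between this and running a single-file tool repeatedly.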
I was considering making a tool that did a more heuristic analysis of the records to determine duplicate records, but at this point, I’m going to wait for users to give this a try and provide some feedback so I can target my development accordingly.
Introduction of MarcEditor Editing Shortcuts
I was spending some time looking through the MarcEdit listserv the past few weeks, and one of the things that I have noticed is that a lot of questions to the listserv revolve around regular expressions. Generally, these are questions from catalogers that have used regular expressions in the past, but just need a little nudge to solve a problem. That’s great…but I also notice that there are a few questions that come up a lot. One of these questions revolves around the character case within records (specifically titles). So what I’ve done (and if it’s useful we’ll keep it, if it’s not, I can retire it quickly) is add a new menu entry in the MarcEditor/Tools menu called Edit Shortcuts. The first set of Edit Shortcuts that I’ve added to the program deals with changing character case. Essentially, these are shortcuts that initialize specific regular expressions for you over a defined set of MARC data (field/subfield combination). My hope is that people will find these shortcuts useful, and will suggest additional shortcuts that I can add to the program. Note that, at this point, you cannot add these shortcuts to an Automation Task. This is primarily because these shortcuts are virtual placeholders within the program – they are only meta-functions. However, if people think that this would be useful, I’m certainly happy to go back and figure out a way to make these a part of the task automation function.
MODS=>RDF XSLT stylesheet added to the XSLT repository
I’ve started to look at ways to:
- Make the generation of linked data easier
- Provide tangible linked data examples from MARC
As part of that work, I’ve been working with a MODS=>RDF (linked data) example created by Stefano Mazzocchi in 2006, with edits. Users interested in following that work, or playing with it themselves, can download the stylesheet from the MarcEdit XSLT repository. As part of this work, I’ve found one enhancement that I’ve started working on – and that is the ability to chain XSLT processes together. Currently, if you want to use this stylesheet from MARC, you will need to translate the data from MARC=>MODS, and then run a second process translating the data from MODS to RDF triples. Ideally, I’d like to make that one step – so I’ll be spending some time looking at how that might be accomplished.
Getting the update
In addition to the updates listed above, I made a handful of minor changes to the program. The majority of these changes represent usability or code optimizations, but they are there nevertheless. If you want to get the update and you currently have MarcEdit, you can download the updated application through the automated updater found within MarcEdit, or you can get the update from:
- MarcEdit Website: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html
- Windows 32-bit download: MarcEdit_Setup.msi
- Windows 64-bit download: MarcEdit_Setup64.msi
- Alternative Windows/Linux/Mac Download: marcedit.zip
Again, have a safe and merry Christmas everybody, --TR
MarcEdit 5.6 available Posted: 2011-10-02 20:39:41
MarcEdit 5.6 has been posted for download. If you are currently running MarcEdit via Windows and have the automatic updating function enabled, you’ll be notified of the update automatically. The current changes to the application are as follows:
- Bug Fix: Swap fields tool -- when moving data from a variable field to a control field using find criteria, the newly created control field would inherit the indicators and subfields of the swapped variable field.
- Bug Fix: Screen resolution -- When using MarcEdit with the DPI resolution set above 100%, the controls were not redrawing properly
- New Feature: MarcEdit includes a tool for querying registered xslt crosswalks and downloading them for use with MarcEdit (see: http://people.oregonstate.edu/~reeset/blog/archives/964)
- Enhancement: MARCEngine diacritics expansion -- MarcEdit's diacritics engine has been expanded to recognize ~70 new diacritic mnemonic codes. (http://people.oregonstate.edu/~reeset/blog/archives/966)
- Enhancement: MarcEditor Intellisense -- when placing a { in the MarcEditor, MarcEdit will now display a list of available diacritic mnemonic codes. (http://people.oregonstate.edu/~reeset/blog/archives/968)
- Enhancement: MarcEdit Options: Users can disable the MarcEditor Intellisense by modifying the value found in the MarcEditor tab.
- Enhancement: MarcEdit MARCValidator: When validating data from within the MarcEditor, a new option (identify invalid records) has been made available. This function specifically looks for structural issues with the mnemonic file format that can cause issues related to record compilation or to use of the global functions. Errors are reported and ranked based on their likelihood of causing record compiling issues, using the scale of Warning, Serious, Critical.
- Enhancement: Updated Help documentation.
- Enhancement: Users can now place data marked for the 260c into the 008 when using the delimited text translator.
- Informative: MarcEdit version number has changed to 5.6
For information about individual updates, please view subsequent posts. --TR
MarcEdit 5.6’s XSLT Registry Posted: 2011-10-02 20:05:31
MarcEdit’s XML Functions tool gives users the ability to create and register new XSLT functions with the MarcEdit application. However, one of the questions that occasionally comes up is how to share or find new XSLTs that may work with MarcEdit. When a user installs MarcEdit, I provide as part of the installation package a set of XSLTs for users doing common XSLT translations to and from MARC. However, I write many more that I don’t share and am occasionally sent XSLTs by users interested in making their crosswalks available to a larger audience. So, I’ve added to MarcEdit the first phase of an XSLT registry. The first part of this update will allow users to search and download XSLTs that I have added to the registry. The next enhancement related to this feature will allow users to contribute their own XSLTs to the application. So how does it work – well, it’s pretty simple. I’ve uploaded a YouTube video here: XSLT Crosswalk Repository. Hopefully this will answer questions – however, if not, feel free to post them here. --TR