Jan 16 2006

MarcEdit 5.0 refresh

I fixed two issues this morning:

  1. When using the Extract Selected Records or Delete Selected Records utility, if the display field appeared in a record multiple times (e.g., multiple 090 fields), the field count was skewed and the export would grab the wrong records.  This has been corrected.
  2. When opening a large file in preview mode and then selecting a new record (clearing the screen), the program would assume that it was still in preview mode and an error would occur when saving.  This has been corrected.
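
The first fix can be pictured with a small Python sketch.  It is not MarcEdit's code: the record layout and function names are hypothetical, and it assumes the selection list shows one row per matching field, so each row has to remember which record it came from instead of relying on its position in the list.

```python
def build_display_rows(records, tag):
    """Return (record_index, field_text) rows, one per matching field.

    `records` is a list of records, each a list of (tag, text) pairs.
    This layout is a hypothetical stand-in, not MarcEdit's data model.
    """
    rows = []
    for record_index, fields in enumerate(records):
        for field_tag, text in fields:
            if field_tag == tag:
                rows.append((record_index, text))
    return rows


def records_for_selection(rows, selected_row_numbers):
    """Map selected display rows back to unique record indexes, in order."""
    seen = []
    for row_number in selected_row_numbers:
        record_index = rows[row_number][0]
        if record_index not in seen:
            seen.append(record_index)
    return seen


# The first record carries two 090 fields, so it produces two display rows.
records = [
    [("245", "First title"), ("090", "QA76 .A1"), ("090", "QA76 .A2")],
    [("245", "Second title"), ("090", "Z699 .B1")],
]
rows = build_display_rows(records, "090")
```

With this layout, picking the third display row resolves to the second record, even though a naive "row number equals record number" count would point past the end of the file.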

As always, the download can be picked up from: MarcEdit50_Setup.exe


Jan 13 2006

Some people….

I was riding my bike home this evening and I’m not quite sure what gets into some people.  When I first started riding between Independence and Corvallis 3 years ago, I used to get a lot of kids in their big trucks honking or yelling at me on my ride, particularly in the winter when it was dark and rainy.  It’s been quite some time since that’s happened…till today.  I was coasting home this evening and a truck full of teenagers driving in the opposite lane decided that it was worth their time to slow down so that they could shout obscenities before squealing their tires and driving away.  I’ve never figured out what these types of folks get out of this, but hopefully they enjoyed it. :)  Then, after riding about 3 more miles, just outside of Independence, I ran over a string of barbed wire that was strung across the bike lane.  I’m not sure if it was placed there purposefully (you’d be surprised how often this happens, particularly on some roads or on days of club bike rides…) but I figured out what caused my flat tire yesterday (and I was wondering why it looked like my tire had been shredded).  Fortunately, it doesn’t look like it punctured my tire today.

[Updated Jan. 13, 2006]

Well, I’m pretty sure the barbed wire was deliberate.  I was riding in this morning and found a 6-foot-long string lying across the other side of the road as well.  Hopefully this was a one-time stunt, but if I see them return, I guess I’ll have to let the good folks in the Polk County sheriff’s office know about it.  (The county sheriff’s office and ODOT, for that matter, are actually really great about looking out for bikers.  I’ve had few problems on my ride, but when I’ve run into issues with debris or road tacking, these two agencies have been fantastic.)

–Terry  

 

 

 


Jan 13 2006

ETDs and automatic MARC generation

Since September, OSU has been accepting theses and dissertations in electronic format (ETDs).  The materials are stored in DSpace, and this has been a bit of a boon for us.  I’ve spent a little time this week putting together an XSLT stylesheet that can be used, in conjunction with MarcEdit 5.0’s Metadata Harvester, to harvest the metadata from DSpace and automatically generate MARC records for these items.  It’s a pretty slick process; the only part that I can’t quite figure out is our keywords/subjects.  Since we are getting unqualified DC out of DSpace, I can’t determine which dc:subject is a keyword and which is an LC term, so for now we are just generating all of them as LC subjects and having the keywords removed at the point of final review.

Anyway, the process is a simple one.  In MarcEdit 5.0, there is a function in the MarcEditor called Metadata Harvester.  This is an OAI harvester that supports a number of different metadata types.  It looks something like this:

[Screenshot: Metadata Harvester]

Anyway, setting the config properties and harvesting the site produces very good MARC records…here’s an example:

[Screenshot: marc_rec.PNG]

Some of these fields are generated via the XSLT, some are taken from the DSpace metadata, and some are configured from the DSpace data.  Of interest, the subjects are encoded correctly by a special template that examines the structure of each subject and then determines the type of each part of the field.  Unfortunately, not all data encoded in the 650 fields are actually LC subjects.  We mix student-provided keywords with our LCSH subjects within multiple subject fields, and you can’t tell the difference between the two types when harvesting via OAI.  I’m hoping that when I look at DSpace I can have it suppress export of specific fields within a collection.  Pretty interesting.
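
To make the subject handling concrete, here is a rough Python sketch of the idea (the real work is done in the XSLT, which isn't shown here).  The function name, the "--" separator, and the rule that a part containing digits is chronological are all assumptions made for illustration.

```python
import re


def subject_to_650(subject):
    """Turn one dc:subject string into a MarcEdit-style mnemonic 650 line.

    Assumptions (not taken from the actual stylesheet): subdivisions are
    separated by "--", a part containing a digit is treated as chronological
    ($y), and every other subdivision is treated as topical ($x).
    """
    parts = [p.strip() for p in re.split(r"\s*--\s*", subject) if p.strip()]
    if not parts:
        return ""
    # Blank first indicator (written as a backslash), second indicator 0.
    line = "=650  \\0$a" + parts[0]
    for part in parts[1:]:
        code = "y" if re.search(r"\d", part) else "x"
        line += "$" + code + part
    return line


lcsh_line = subject_to_650("Salmon -- History -- 20th century")
keyword_line = subject_to_650("water quality")
```

Both calls produce a 650 with second indicator 0, which is exactly the problem described above: nothing in the unqualified DC tells the code that "water quality" was a student keyword rather than an LC heading.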

I’ll let folks know how this works out as we start to implement this new workflow.

–Terry

 


Jan 13 2006

MarcEdit 5.0 Refresh (Jan. 13, 2006)

How exciting…I hope folks don’t mind the regularity of the updates recently, but I’ve been having a great deal of fun lately scrubbing the code and putting the final touches on the application.  It’s made for some longer nights this week but I definitely think that it’s been worth it.

I’ll update this message tomorrow morning once I can get the formal changelist that I’d written, but I wanted to post a quick note to let folks know that I’ve posted a new update.  The cleaning of code and filling in of places that I’d skipped continues.  This week has been a great week for me.  Larry Dixon of the U.S. Library of Congress has been putting MarcEdit’s new UTF-8 conversion function through its paces and has come across a number of places where functionality that should work (i.e., it works fine when using the default editing functions) doesn’t.  That’s what has spawned this most recent list of updates (six in all).  Here’s a short list of what’s been fixed.  As noted, I’ll provide expanded descriptions tomorrow.

I’ve dealt with the following issues:

1) SRU client: validation error from the resulting file in XML Spy.  The schema location was incorrect when saving records retrieved in MARC21XML.  I’ve corrected the location so this should validate now.

2) When translating from MARCXML to MARC, spacing in the 010 was lost, making it incompatible with the pre-2001 LCCNs.  I’ve corrected this…the XSLT file that handles the final conversion has been modified to keep all characters except characters 10, 13 and 9 (decimal notation).

3) Change the leader byte 09 from ‘a’ to blank when converting from UTF-8 to MARC-8.  Done.

4) When starting up, receiving the following exception:  Object reference not set to an instance of an object.  Done.  I modified the XML config save code so that it will handle situations where the parent node isn’t present within the config file.  So when the program runs, if the key isn’t available when saving, it will generate it for you.

5) Swapping double ligs for double tildes when records use deprecated codepoints.  Done.  The currently defined encoding worked correctly, but when I added support for the previous encoding format…well, I’d just flipped the values.  Easy to fix.

6) Character conversion tools UTF8 to MARC8 not working correctly:  The file stream was being opened in an incorrect character encoding.  It’s what I thought the problem would be since the actual functions themselves (which the Maker and Breaker use) work fine.

As always, the program can be downloaded from: MarcEdit50_Setup.exe
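
Fixes 2 and 3 are small enough to sketch in a few lines of Python.  This is only an illustration with hypothetical function names; the actual changes live in the XSLT and the conversion code, neither of which is shown in this post.

```python
# Characters 9, 10 and 13 (tab, line feed, carriage return) are dropped;
# ordinary spaces (character 32) are deliberately left alone.
STRIPPED = {9: None, 10: None, 13: None}


def clean_field_text(text):
    """Remove tabs and line breaks but keep every space, leading or trailing."""
    return text.translate(STRIPPED)


def leader_for_marc8(leader):
    """Return the 24-character leader with position 09 set to blank.

    In MARC 21, leader position 09 is 'a' for UCS/Unicode records and
    blank for MARC-8 records.
    """
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    return leader[:9] + " " + leader[10:]


# A 010 $a value with three leading blanks and one trailing blank,
# followed by a stray line break picked up from the XML.
lccn = clean_field_text("   68004897 \r\n")
leader = leader_for_marc8("00714cam a2200205 a 4500")
```

The point of keeping spaces in the first function is that the blanks in an older LCCN are part of the data, so a plain trim of the field would damage it.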

–Terry

 

 


Jan 12 2006

Rivers still rising

So I’ve taken to keeping an eye on the Luckiamute River near Suver, OR (http://ahps2.wrh.noaa.gov/ahps2/hydrograph.php?wfo=pqr&gage=suvo3&view=1,1,1,1,1,1).  Hwy 99 crosses this river in a low spot and I’ve been watching this river go up and up and up.  Fortunately, I don’t think I’m going to see this cover the road.  It might flood just about everything else around it, but there’s so much low-lying farm land that essentially a new 12-foot-deep lake would need to develop before water hits the highway.  So I think I’m fine…I hope we’ll be fine.  I’m not sure I’d like to have to wade across a flooded road with my bike…whew, I’m getting the chills just thinking about it.

–Terry


Jan 11 2006

MarcEdit 5.0 Update

Alrighty,

I’ve posted the update that corrects the two bugs noted earlier.  The first correction had to do with translating UTF-8 encoded MARC records into MARCXML and then back into MARC-8.  This was an easy fix and should be complete.  I’ve verified it against the 102 test records that had been set for review.  The second correction had to do with the translation of some Cyrillic characters from UTF-8 MARC to MARC-8.  I’ve tested this against about 30 Cyrillic records in OCLC and it appears to be working correctly.

As always, the program is ready for download at: MarcEdit50_Setup.exe

–Terry


Jan 11 2006

MarcEdit 5.0 bug report fyi

Just an fyi –

I’d received some test files today for a small set of records where some diacritics were being mangled when translating data from MARCXML to MARC in MARC-8.  I investigated, and there is a very narrow bug in the application that I’ve corrected and will post as an update this evening.  The problem only exists when doing the following:

  1. Have a MARC record already in UTF-8
  2. Convert the UTF-8 MARC record to MARCXML when it contains one of a narrow band of characters (about 3 combining ANSEL characters, and the character must be the last one in the byte stream)

The problem was easy to correct but should never have shown up in the first place.  It was spawned from a lack of test data (we don’t have a Unicode database, so all my tests generating MARCXML records have been with MARC-8 records that are converted to UTF-8 on the fly during the conversion).

So here’s why it happened…because MarcEdit must support both the MARC-8 and UTF-8 character sets, the program includes code to handle data at a byte level (in the case of MARC data) and at a character level (for XML data).  Rather than forcing users to specify the character set used within a set of records, I’ve made the MARCEngine smart enough to detect which character set is in use (a real pain, when you consider that MARC records cannot use a BOM to designate the character set, so instead a byte’s characteristics must be evaluated to determine its character set).  Yes, there are MARC fields for setting the character set, but I find that they are unreliable (i.e., unused by most systems).  So, the program uses a custom algorithm that can read a set of bytes and determine if those bytes make up a UTF-8 character.  Within the part of the MARCEngine that handles MARC data processing (the Maker and Breaker), this algorithm works fine (which is why the error doesn’t affect this part of the program).  However, in the XML API, I tried to create a “lite” algorithm that wasn’t quite robust enough.  So, I’ve taken the lite algorithm out and inserted the robust algorithm, and everything works again.
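
For readers curious what "evaluating a byte's characteristics" can look like, here is a minimal Python sketch of a structural UTF-8 check.  It is not the MARCEngine algorithm (which isn't published in this post); the function names are hypothetical, and it skips the finer points of the UTF-8 definition such as rejecting every overlong or surrogate encoding.

```python
def utf8_sequence_length(lead):
    """Expected length of a UTF-8 sequence given its lead byte, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # a continuation byte or a value never used as a lead byte


def looks_like_utf8(data):
    """True if `data` is structurally valid UTF-8 with at least one multi-byte sequence."""
    i = 0
    saw_multibyte = False
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0:
            return False
        if n > 1:
            tail = data[i + 1:i + n]
            # A sequence cut off by the end of the buffer is rejected too.
            if len(tail) != n - 1 or any((b & 0xC0) != 0x80 for b in tail):
                return False
            saw_multibyte = True
        i += n
    return saw_multibyte
```

A MARC-8 diacritic is a single high byte placed before an ASCII base letter, so a sequence like `b"\xe1a"` fails the continuation-byte test and is not mistaken for UTF-8.  Pure ASCII returns False here only because it is valid in both encodings and so tells you nothing either way.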

In the process of testing the algorithms, I’ve found one other error that I’ve corrected as well.  One of the difficulties with doing UTF-8 translation is that there are multiple code points for some characters.  If they are translated from MARC-8 to UTF-8 using the LC-defined tables, they will be one code point; if they are entered from a keyboard, they will be another code point.  To accommodate this, I’d created a catch function to handle these characters, and it ended up trapping some characters that shouldn’t have been trapped.  This affects any translation from UTF-8 to MARC-8, though it was limited primarily to Cyrillic scripts (since their translated characters matched the extra codepoints).  I’ve corrected this as well and this too will be in tonight’s update.
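
One common way a character ends up with more than one representation is the precomposed versus decomposed split in Unicode.  The post doesn't say which code points the catch function dealt with, so the Python sketch below is only an illustration of the general problem, using the standard `unicodedata` module and a hypothetical helper name.

```python
import unicodedata


def same_character(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Two ways to write the Cyrillic letter "short i".
precomposed = "\u0439"        # CYRILLIC SMALL LETTER SHORT I, one code point
decomposed = "\u0438\u0306"   # CYRILLIC SMALL LETTER I + COMBINING BREVE

raw_equal = precomposed == decomposed                   # compared code point by code point
normalized_equal = same_character(precomposed, decomposed)
```

A converter that only knows one of the two forms will miss the other, and a catch-all that is too generous will grab characters it shouldn't, which is roughly the trade-off described above.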

Anyway, sorry for missing these two (I guess I’m fortunate that most systems still don’t export and import Unicode data).  I will have the fixes uploaded tonight.  I’m also going to spend some time downloading some records from OCLC to get 880s so that I can test other character sets as well.

Cheers,

 –Terry


Jan 11 2006

MarcEdit 5.0 components updates

New updates already…I was playing with MarcEdit this afternoon and ran into a couple of small bugs that I’ve since squashed.  I made the following changes:

  1. MarcEditor Printing:  Oops, I made a change last night and broke the MarcEditor printing.  To make it so that the program could print fonts correctly, I’d modified the print code, which made it so that blank lines stopped printing.  Well, that’s been corrected.
  2. MarcValidator:  Another oops.  When records had an 880, the program crashed because I hadn’t defined indicators for it.  Well, I shouldn’t have to for checking (that’s how the program was designed); everything is supposed to be customizable, so I corrected the program so that this won’t be a problem any longer.
  3. MarcEditor:  In preview mode, if you tried to compile the file, it would throw an error because the file pointer wasn’t being freed.  This has been corrected.

Finally, I think I’m going to remove the Beta tag from the program at the end of this month if I finish the last pass of Help documentation that I have planned.  It won’t complete the help, but the program isn’t really a “beta” program any longer and I’d like to start seeing folks migrate away from the 4.6 version if possible, simply because most of the functionality that folks ask about is generally found in the 5.0 version at this time.

As always, the new update is at: MarcEdit50_Setup.exe

–Terry


Jan 10 2006

Cycling in the rain

It’s still storming and raining here in Oregon.  Today was great.  This morning, I rode into work (25 miles) and battled with wind all the way in.  Fortunately, this is what I wanted today, because coming home the wind was gusting up to 35 mph with a constant 13-15 mph wind…I was fast.  25 miles in just under 55 minutes, which in the rain and the dark was a little faster than I usually like to ride.

 –Terry


Jan 10 2006

MarcEdit 5.0 Update

Alright, I’ve been doing a little bit of interface work on the 5.0 MarcEditor to make editing 006/008 fields easier for users.  So, I’ve made the following changes:

  1. 006/008 Editing Windows.  These are found under the Edit menu in the MarcEditor.  They will edit an existing field or generate a new field.  A couple of notes: this isn’t a batch utility.  It is for editing individual records.  Because of this, the function is really only useful on a record set of up to about 3 MB.  Larger than that, and you will likely see a noticeable flicker if the program has to turn off the WordWrapping option (required to use this function).
  2. Generate Control Numbers:  The program can now be set up to generate a control number each time a new record is generated.
  3. New option:  You can set an option in the MarcEditor so that new records are appended to the current file rather than opening a new file.
  4. Metadata Harvester update:  I found that when dealing with CONTENTdm servers, something about the resumption token wasn’t working with the server.  I’ve corrected the code so that it does a better job handling variations in server-supplied metadata.
  5. Metadata Harvester update:  You can now translate directly to MARC-8 from the harvester (rather than just to UTF-8).
  6. I made a small change to the install file to see if I can do a better job finding the version of MDAC installed on one’s computer.  The program essentially looks at a registry key; the thing that is a pain is that Microsoft doesn’t make it easy to find the version number of this component.  In fact, their own tools cannot definitively tell you what version you have; they only compare multiple file points to come up with a best guess.  One note, however: since the program reads two registry keys (to discover the .NET Framework version number and the MDAC version number), you need to make sure that you have permission to read registry keys.  What I’m thinking of doing in the future is checking the user’s permissions and, if they cannot read the registry, simply adding a message letting folks know what the requirements are and then allowing installation anyway (rather than simply failing during the install process, which happens now).
  7. XSLT file edits.  I’ve edited the primary MARCMnemonic XSLT (which is used in every translation into MARC) to remove the leading and trailing spaces from a field, since there aren’t really cases where these should be present and they tend to cause issues when translating into MARC.  Also, I’ve modified the OAIDCtoMARCXML XSLT.
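
For anyone writing their own harvester, the resumption token handling in item 4 can be sketched roughly like this in Python.  This is not the MarcEdit harvester code: the function name and sample responses are made up, and the tolerant handling (trimming whitespace, treating an empty or missing token as "no more pages") is an assumption about what variations a server might send.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_resumption_token(response_xml):
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(".//" + OAI_NS + "resumptionToken")
    if element is None or element.text is None:
        return None
    token = element.text.strip()
    return token or None


# Hypothetical responses: one with padding around the token, one final page.
more_pages = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<resumptionToken cursor="0">  abc123\n</resumptionToken>'
    '</ListRecords></OAI-PMH>'
)
last_page = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<resumptionToken/></ListRecords></OAI-PMH>'
)

token = next_resumption_token(more_pages)
done = next_resumption_token(last_page)
```

A harvest loop would keep requesting pages with the returned token until the function returns None.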

As always, the program is available for download at: MarcEdit50_Setup.exe

 –Terry