Jul 13 2006

MarcEdit 5.0 updates (preupdate moved into download area)

Ok — I ran the preupdate through some more extensive testing and I’m happy with it.  I’ve also made a few other changes.  The biggest changes:

  1. Entity resolution — ability to set entity resolution by translation.
  2. Installation improvements: I’ve made some changes to the program so that MarcEdit 5.0 can now be installed on machines where the user only has Guest access.  Basically, so long as the machine has the .NET Framework installed and the user can create a directory, the user can install MarcEdit 5.0.  How does this work considering that MarcEdit uses the GAC?  Well, I’ve configured the program so that if the GAC isn’t available, it falls back to the copy of the library in the application directory.  It seems to be working well in my testing, and others have confirmed that it works for them too.
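
For anyone curious what a fallback like that can look like, here is a minimal sketch in C#.  It is not the actual MarcEdit code: the class and method names (LibraryLoaderSketch, LoadWithFallback) are made up for illustration, and the CLR’s normal probing already covers much of this on its own.  The idea is simply to try the fully qualified assembly name first (which is how a GAC copy would be found) and, if that fails, load the file sitting next to the executable.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper, for illustration only (not MarcEdit's real loader).
public static class LibraryLoaderSketch
{
    public static Assembly LoadWithFallback(string fullName, string fileName)
    {
        try
        {
            // Resolving by full name lets the runtime find a GAC-installed copy.
            return Assembly.Load(fullName);
        }
        catch (FileNotFoundException) { /* not registered; fall through */ }
        catch (FileLoadException) { /* found but could not be loaded; fall through */ }

        // Fall back to the copy shipped in the application directory.
        string localPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
        return Assembly.LoadFrom(localPath);
    }
}
```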

I’ve got a few other updates in the hopper: the Saxon.NET code, the addition of a form of the RLG report card (for EAD validation), Python generation in the Script Maker, etc.

As always, the download is from: MarcEdit50_Setup.exe

–TR


Jul 12 2006

Character conversion in .NET

I had someone who uses UNIMARC ask me about some problems they were having with the conversion from UNIMARC to Dublin Core.  Some of the characters were being mangled when the translation occurred.  The problem, of course, relates to the character set that the UNIMARC records are encoded in.  Because a number of MARC formats use the MARC8 character set, I’ve coded MarcEdit to expect either MARC8 or UTF8 when moving to XML.  The reason for this is that the MARC8 character set overlaps with the ISO 8859-1 (and the closely related Windows code page 1252) character map.  Since the UNIMARC data was in the ISO 8859-1 encoding, the overlapping elements were translated into Unicode as if they were in MARC8.  Ugh.

Fortunately, MarcEdit provides a facility that lets users migrate their data between other encodings and UTF8.  This layer lets users move data from any supported character set (any available Windows code page) to UTF8, and then back to MARC8 if necessary.  So for this example, I was able to recommend that the user run this character tool to convert the data into UTF8 and then process the file into XML.  It adds one extra step, but it works for now.  I’m thinking that in the near future I’ll add an option in the Preferences to let users set a default character set.  This will allow the MarcEngine to handle these problems internally, and more easily, when moving between MARC and XML.

One of the things I’ve been happy with in .NET has been the ease of moving between character sets.  As some of you may know, the MarcEngine in MarcEdit has traditionally been written in assembler.  This meant that I wrote my own character conversions for the most part, which made the process fairly tedious.  In C#, however, this is handled in a couple of lines of code.  So for example, if I was opening a file in Windows code page 1252 and needed to convert it to code page 1250 or even UTF8:

string s = "";
byte[] inBytes;   // "in" and "out" are reserved words in C#, so use other names
byte[] outBytes;
System.Text.Encoding cp1252 = System.Text.Encoding.GetEncoding(1252);
System.IO.StreamReader reader = new System.IO.StreamReader(@"c:\test1252.txt", cp1252);
System.IO.StreamWriter writer = new System.IO.StreamWriter(@"c:\testutf8.txt", false, System.Text.Encoding.UTF8);

// Read the file in, decoding it as code page 1252
s = reader.ReadToEnd();
// Get the 1252 bytes back, then convert those bytes to UTF8
inBytes = cp1252.GetBytes(s);
outBytes = System.Text.Encoding.Convert(cp1252, System.Text.Encoding.UTF8, inBytes);
writer.Write(System.Text.Encoding.UTF8.GetString(outBytes));

reader.Close();
writer.Close();
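
The explicit byte conversion above shows what is happening under the hood, but the same round trip can also be sketched more compactly by letting the file APIs do the decoding and encoding.  This is just an illustration: the EncodingSketch class and ConvertFile method are names I made up, and the only framework calls used are File.ReadAllText and File.WriteAllText.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingSketch
{
    // Decode the source file with one encoding and write it back out with another.
    public static void ConvertFile(string sourcePath, Encoding sourceEncoding,
                                   string targetPath, Encoding targetEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, targetEncoding);
    }
}
```

For the 1252-to-UTF8 case above, the call would look something like EncodingSketch.ConvertFile(@"c:\test1252.txt", Encoding.GetEncoding(1252), @"c:\testutf8.txt", Encoding.UTF8).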


Jul 11 2006

Has anyone ever used Saxon.NET?

Within MarcEdit, the MarcEngine makes use of the .NET System.Xml libraries.  This has been a very good thing, in that Microsoft has a class set up specifically for XSLT transformations, and by adding some additional code and extending the class, I’ve been able to create a very fast XSLT processor.  The downside, however, is that it has the same limitation as the rest of the System.Xml library, i.e., it only supports XSLT 1.0.  For a while, I’ve been evaluating the Saxon.NET library and it looks very promising.  I’m going to do some testing with the library to see how it compares speed-wise, but I really like the fact that it supports version 2.0 of many XML processing languages.  I use Saxon on Unix all the time, but if anyone has any particular experience working with the .NET version of the Saxon library, drop me a line and give me your thoughts.  I’ll post my impressions later on.
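
For reference, a bare-bones XSLT 1.0 transformation with the built-in System.Xml.Xsl classes looks roughly like the sketch below.  This is not the MarcEngine code (which adds its own extensions); it assumes the XslCompiledTransform class, and the XsltSketch and Transform names are mine.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, for illustration only.
public static class XsltSketch
{
    // Apply an XSLT 1.0 stylesheet (given as a string) to an XML string.
    public static string Transform(string xml, string xslt)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (StringWriter output = new StringWriter())
        {
            // No stylesheet parameters are passed in this sketch.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

A stylesheet that declares version="2.0" features will not work here, which is exactly the limitation that makes Saxon interesting.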

–TR


Jul 11 2006

MARC21XML=>Static OAI XSLT Stylesheet

So I noticed David Bigwood mention on his blog (http://catalogablog.blogspot.com/2006/07/static-oai.html) that he wished that there existed an XSLT stylesheet that moved data from MARC21XML to Static OAI.  Well, ask and you shall receive.  While I doubt that David will be able to make use of this (it sounds like he’s finished with his project), maybe someone down the line will find it useful.  I’ll eventually fold this into the MarcEdit XSLT sample repository.  Until then, you can download it from here: http://oregonstate.edu/~reeset/marcedit/xslt/MARC21slim2StaticOAIDC.xsl

–tr


Jul 11 2006

MarcEdit 5.0 pre-update

There are a few folks who have been requesting this, so I’ll make it available for those who want to try it.  I’ll formalize the update later this week.  So what does it do?  The change is in how the MarcEngine handles DTDs.  By default, I’ve turned DTD validation off in order to make it easier to share files that have DTDs attached.  Unfortunately, this means that for users who rely on entities, those entities won’t be resolved.  So, I’ve modified the MarcEngine so that users can specify whether a particular transformation should resolve remote entities.  I’ve tested it briefly and it seems to work fine.  However, I’ll be having a few folks test it over the week (and I’ll keep testing too), so folks can expect a formal, tested update later this week.
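
To give a rough idea of what a per-transformation switch like this involves, here is a small C# sketch.  It is an assumption-laden illustration, not the MarcEngine implementation: it uses the XmlReaderSettings.DtdProcessing property from newer versions of the framework (older versions exposed a ProhibitDtd flag instead), and the EntityResolutionSketch and ReadText names are made up.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class EntityResolutionSketch
{
    // Read the text of the root element, optionally processing the DTD
    // so that declared entities are expanded.
    public static string ReadText(string xml, bool resolveEntities)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = resolveEntities ? DtdProcessing.Parse : DtdProcessing.Ignore;
        // A resolver is only needed to fetch external (remote) DTDs and entities.
        settings.XmlResolver = resolveEntities ? new XmlUrlResolver() : null;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }
}
```

With the switch off, a document that actually uses a declared entity will generally fail to parse, which is the trade-off described above.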

Here’s a PowerPoint showing exactly how to turn this on: Resolving DTD/Entities in MarcEdit

The pre-update can be found at: MarcEdit50_preSetup.exe

–tr


Jul 10 2006

Coding block?

I’ve heard of writer’s block, but coder’s block?  Who knows.  All I know is that I’ve been in a bit of a funk lately.  After pushing out the first version of our metasearch tool, our group sat back and looked at retooling the program in Ruby, partly to make R&D prototyping faster, partly to hide the XML backend a little bit more from the UI designers.  Great.  I spent 3 days recoding much of the backend components.  I actually found I like Ruby quite a bit.  Then I started travelling.  I had a couple of speaking engagements, then ALA, and next thing I know, I’m having a really hard time getting started again.  And not just on this project, but on almost all of them.  So what to do?

Well, I took some time over the 4th to take a break with family and then went back to working on MarcEdit.  This is one of my favorite hobbies (working with MarcEdit and crunching metadata), and I think it worked.  I spent a number of days last week engrossed in finishing the Script Maker and, after finding a stopping point, started working in Ruby again.  Last night, I spent a good couple of hours on Ruby.  I’ve been needing to write a wrapper class around the protocol classes that I’ve been developing, and was actually able to make quite a bit of progress, so woohoo.

It’s odd though: I’ve never had the interruptions of travel throw me out of whack like they did this time.  And it was funny, because I really wanted to do a lot of coding, but each time I sat down to do it, I just drew a blank.

–TR


Jul 10 2006

Qwest customer service — the 8th circle of hell

Agh, I thought I’d taken care of this last month.  Earlier this year, I’d signed up for Qwest DSL.  It was great: we finally had high speed and for the ~8 months I was with them, their service was exactly what I paid for.  However, our little city of Independence, OR laid its own fiber optic cable and is providing a minimum 5 MB line for less than what I was paying Qwest at the time.  Given that I like to buy local whenever possible, I made the switch last June.  That was when I discovered that Qwest must be cutting its customer service department, because I was met with a voice-directed customer service nightmare.  It was like the 8th circle of hell.  This stupid little recording asking me, “What would you like to do today?”, followed by, “I’m sorry, but I don’t understand your request”.  Followed by a list of requests, followed by another list of requests, but with numbers.  This maze of frustration lasted for ~15 minutes before I finally got to talk to a living, breathing person.

Fast forward to July: this month’s Qwest bill arrives and I’m still seeing DSL service on it.  So, I go and get myself a couple of ibuprofen and a double hot chocolate, and again tackle the Qwest customer service machine.  The annoying voice is back on the line, asking me, again, questions that I know it can’t answer.  Why won’t it stop…  Another 15 minutes of navigating through this mess and I get a person, who is able to quickly help me.  I wonder how many of the calls Qwest gets end in hang-ups?  The only reason I can think of for having such a painful system is to reduce the number of calls to customer service through attrition.  Dante would be proud.

–Tr


Jul 8 2006

MarcEdit 5.0 Update

OK, so I finished testing some of the generated code snippets and everything looks like it’s working.  So I’ve posted the update.  The program includes the changes noted in this post: http://oregonstate.edu/~reeset/blog/archives/292.  It appears that the Perl generation is working well, and I corrected a small error with the VBScript generation.  So download it and give it a try.

As always, you can get the update from: MarcEdit50_Setup.exe

–TR


Jul 8 2006

MarcEdit 5.0: Generate Perl example

So I’m wrapping this up as we speak.  While testing some of the cases, I noticed that I’d left out a small block of code in the modified subfield section, so I’ll be finishing that tonight in the hope of posting tonight.  But I thought I’d include a sample Perl script generated by the wizard.  This one adds a single field to a MARC file.  As with the VBScript wizard, I think that the most useful function of the script wizard will be to provide a set of template code for processing MARC records in a number of different languages, but it also fills a gap for folks with very little scripting knowledge, so hopefully it will continue to fulfill this need.  Anyway, here’s the example: Generated PERL script

–tr


Jul 8 2006

MarcEdit 5.0 Updates

I won’t be posting these till tomorrow or Sunday, but here’s what’s coming:

  1. Script Maker:  Script Maker will now support the generation of Perl scripts.  I’ve added the code to the interpreter and it’s now generating Perl scripts.  I still need to test them.  Very likely, I’ll be fixing a few problems with them once I make this available (I’m not a particular fan of Perl, so it doesn’t just come to me), but that will be OK.  I figure once folks get a chance to start trying this out and generating some scripts, I’ll get a bit of useful feedback.
  2. MarcEngine:  As noted in an earlier post (http://oregonstate.edu/~reeset/blog/archives/289), I made a change to how regular expressions are treated in the MarcEngine.  This has made the COM component much more responsive and in sync with the rest of the application.
  3. MarcEditor changes:
    • Modification to the Swap fields function: Made the function more like 4.6.
    • Modification to the Edit Subfield function:  Enhanced the conditional subfield insertion functionality.
    • Added an additional check when closing the file to make sure that all changes done in a session create a prompt when exiting the application.
    • Added a close button to the file menu.  This will close the current document while keeping the editor open.
  4. MarcValidator changes:  In response to Bryan Baldus’s comment regarding validation at the subfield rather than the field level, I’ve added the ability to do either.  If you want to validate an entire field, you use the valid syntax.  If you wish to validate a specific subfield within a field, you use valid[subfield] (e.g., valida for subfield a), followed by the same syntax as the valid tag.  So for example, expanding the 020 example:

In the example below, only subfield a and subfield z will be validated.

020 R INTERNATIONAL STANDARD BOOK NUMBER
valida [^0-9x] Valid Characters
validz [^0-9x] Valid Characters
ind1 blank Undefined
ind2 blank Undefined
a NR International Standard Book Number
c NR Terms of availability
z R Canceled/invalid ISBN
6 NR Linkage
8 R Field link and sequence number
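
To make the subfield-level rules a little more concrete, here is a rough C# sketch of how a check like valida [^0-9x] could be applied.  It is not the MarcValidator code: the names (SubfieldValidationSketch, FindInvalidSubfields) are made up, and it assumes the pattern describes characters that should not appear, so a subfield is flagged when the pattern matches anywhere in its value.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SubfieldValidationSketch
{
    // subfields: subfield code -> value for one field (e.g. "a" -> "0395957761")
    // rules:     subfield code -> pattern of disallowed characters (assumed semantics)
    // Returns the codes of the subfields that break their rule.
    public static List<string> FindInvalidSubfields(
        IDictionary<string, string> subfields,
        IDictionary<string, string> rules)
    {
        List<string> invalid = new List<string>();
        foreach (KeyValuePair<string, string> rule in rules)
        {
            string value;
            // Subfields without a rule (like c here) are never checked.
            if (subfields.TryGetValue(rule.Key, out value) && Regex.IsMatch(value, rule.Value))
            {
                invalid.Add(rule.Key);
            }
        }
        return invalid;
    }
}
```

With rules for a and z only, a stray character in subfield c would be ignored, while the same character in subfield z would be reported.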

Anyway, I have to test the Perl changes a bit more before putting them out in the wild, but expect to see them late Saturday or early Sunday.

–TR