Mar 19 2010

MarcEdit 5.2 update

I’ve been busy this week working on a few more updates. I was able to squash a couple of bugs, fix a few UI issues and add a couple of enhancements. Here are the highlights of what’s been corrected:

Bug Fixes:

  • MARCEngine: When converting to UTF-8 using the MarcMaker – the leader byte indicating that data is UTF8 is now being set.
  • MARCEngine: If you broke a file with the UTF8 conversion checked and then made the file with the UTF8 conversion checked again, the diacritics would get mangled. When you perform this process in MarcEdit, you only have to convert once (so convert on break and don’t do it again, or convert on make and don’t do it again, because MarcEdit will always treat that file as UTF8 afterwards). However, many people check the box twice and this was causing a problem. This has been corrected.
  • Task Editor – Attempting to edit a delete field task would raise an error message. This has been corrected.
  • MarcEditor Tools Menu – Defined Tasks: If you opened Manage Task and then closed it, the Editor would duplicate the defined tasks. This has been corrected.
  • Help button in the Edit Subfield tool when editing control fields has been linked.
  • MARC Spy link has been added (under the MARC Tools FAQ)
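
To illustrate the two MARCEngine fixes above: in MARC 21, leader position 09 holds the character coding scheme, where `a` means UCS/Unicode and a blank means MARC-8. The Python sketch below is not MarcEdit's code. The function names are hypothetical, and Latin-1 stands in for the legacy character set purely to keep the example short (real MARC-8 conversion needs a dedicated converter). It shows setting the leader flag and why converting an already converted file a second time mangles diacritics.

```python
def mark_leader_utf8(leader: str) -> str:
    """Return a copy of a 24-character MARC leader with position 09 set to 'a'."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    return leader[:9] + "a" + leader[10:]


def to_utf8_once(data: bytes, already_utf8: bool) -> bytes:
    """Convert legacy bytes to UTF-8, skipping data that is already converted.

    Latin-1 is an assumption standing in for the legacy encoding.
    """
    if already_utf8:
        return data
    return data.decode("latin-1").encode("utf-8")


leader = "00714cam  2200205 a 4500"
flagged = mark_leader_utf8(leader)

original = "é".encode("latin-1")
converted = to_utf8_once(original, already_utf8=False)

# Converting again treats each UTF-8 byte as a separate legacy character.
mangled = to_utf8_once(converted, already_utf8=False)

# Checking the leader flag first makes the second conversion a no-op.
safe = to_utf8_once(converted, already_utf8=(flagged[9] == "a"))
```

Here `mangled` decodes to two characters instead of one, while `safe` is unchanged, which is the behaviour the fix aims for when the box is checked twice.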

Enhancements:

  • Task Manager and Edit windows have been configured to be resizable.
  • Swap Field Function has been added to the Task Automation tool.
  • Windows Character Map has been linked to the MarcEditor – under the Edit Menu Item.

These are the highlights. As always – if you have questions, comments or suggestions for enhancements, let me know.

For those of you running a current version of MarcEdit, your program should prompt you to update the next time you start the program. Otherwise, you can download the updates from:

· Windows Installer: MarcEdit_Setup.msi

· Zip File: marcedit.zip

–tr


Mar 14 2010

MarcEdit 5.2 Updates

First, a big thank you to folks that sent me feedback on the new Task Automation tool. I’ve taken that feedback and put together an update to address some of the usability questions. Changes made for this update:

  1. Ability to clone existing tasks
  2. Ability to rename tasks
  3. Ability to reorder tasks
  4. Ability to Edit Task steps
  5. Bug Fix: Regular Expressions in tasks were being escaped and not processing correctly. This has been fixed.
  6. MarcEditor – Tools Menu. New option added as a child to assigned Tasks – this allows users to see not only tasks with associated shortcuts, but also to see and run any defined task.
  7. Export Settings – a new little function found on the main window, under file – that allows users to export their user settings. This will package items together for re-import into MarcEdit. This function is useful when users wish to share configuration files, tasks, macros, etc. between machines.
  8. Import Settings – imports a settings package back into MarcEdit. When you import settings – these overwrite your existing configuration settings. So, if you import tasks, macros, XSLT functions – these will replace any locally defined functions.
  9. There were a handful of UI changes primarily to the task window.

I’ve added a couple of additional tutorials to YouTube to address the changes mentioned in this update. Links to these:

  1. MarcEdit Task Automation Maintenance: http://www.youtube.com/watch?v=fnorN0MFFN0
  2. MarcEdit: Exporting local Settings to another computer: http://www.youtube.com/watch?v=uR8pcpQ8IUs
  3. MarcEdit: Importing settings to your local computer: http://www.youtube.com/watch?v=248dgSgHlmg

As always, updates can be found at:

–TR


Mar 8 2010

MarcEdit 5.2 Update

Hi all,

I have just uploaded a new version of MarcEdit to the website.  This version specifically addresses four bugs and introduces the Task Automation function into the application. 

Bug Fixes:

  • Changes not saved in the MarcEditor:
    Under certain conditions, MarcEdit would lose track of changes made within the program.  This occurred primarily after changes had been made and then the Find All Function was used.  This has been corrected as of this version.
  • Save As…file not found error:
    When using Save As to Save data, MarcEdit would throw an error if the file being saved did not previously exist.  This has been corrected as of this version.
  • Invalid prompts to save a file before closing:
    MarcEdit tended to err on the side of always asking users to save their data before closing.   However, even if a user had saved their data, the message may still have popped up.  This has been corrected as of this version.
  • Find/Replace – replacing without using Match Case:
    While it is always recommended when doing global replacements to do them using the Match Case option – prior to this version, unchecking the match case option would cause the replacement to take too many characters.  This has been corrected as of this version.

Enhancements:

  • Task Automation tool:
    The task automation tool is a recorder that allows users to chain together replacement functions.  Sadly, I haven’t been able to add information to the help file yet; however, I did record the following video tutorial to provide some initial background to get started using the function.  One quick note – this is a new tool, so when using it, please verify changes.  Moreover, feedback is definitely appreciated.
    Video Tutorial:   http://www.youtube.com/watch?v=gmqTGfTubU4

You can download the updated version of MarcEdit at: MarcEdit_Setup.msi or the Linux/Mac/Other version build at: marcedit.zip.

–TR


Feb 21 2010

MarcEdit 5.2 update

Thanks for the feedback everyone. I spent some time this week trying to see how many of the requests I could accommodate. So, I posted a new update which includes the following changes:

Bug Fixes:

1) MarcValidator bug has been squashed – this bug would cause items validated from within the MarcEditor to be incomplete (some records would be skipped).

2) Delimited Text Translator – when translating from Excel, if the first cell in the last column is empty, the column won’t show up in the field definition window. This has been corrected.

3) Field Count – when generating the report on an invalid formatted file and an exception is thrown, it wouldn’t be trapped correctly. That shouldn’t be the case any longer.

4) Installer updated – there has been a hard-to-locate issue that would occasionally lock files when updating MarcEdit. Generally, this problem bit people who had installed a previous, non-MSI installer based version of MarcEdit. I believe that I’ve corrected (or pretty much suppressed) the conditions in which this error would occur. Essentially, I’ve re-written the bootloader that the installer uses and moved all helper code into an MSI code library. Since this is the officially recommended method for injecting processes into an install using the MSI engine – I think that this should help matters. However, I’ve never been able to recreate the install problems – so when you try this update – if you get an error – you can correct the problem by:

1. Uninstalling MarcEdit

2. Removing all files (not folders) from the MarcEdit program directory (generally c:\program files\marcedit 5.0)

3. Reinstalling MarcEdit

4. And then let me know what error you get so that I can continue to work on isolating the rare install issues when they occur.

Enhancements:

1) MarcValidator has been updated to check for dangling or incomplete subfields. This means that a field like this:
=500 \\$$aTest or =500 \\$aTest$ will be flagged as an error.

2) MarcValidator has been updated to check for invalid punctuation/spacing at the beginning of subfields. This means that a field like:
=500 \\.$aTest will be flagged.

3) MarcValidator has been updated to add an option to note if a field is obsolete. You can turn this on by adding: obsolete to the field block codes. Please see the help file for an example, or the master rules file in the MarcEdit program directory.

4) MarcValidator has been updated to allow fields to be paired. For example, you can pair a 490 with an 830 – and if one is not present, have the validator generate a warning.

5) Updated the help file to include full specs on the MarcValidator rules file formatting options.

6) Find All has been updated so that, if you are in Preview mode, the program automatically switches to Paging mode. The Find All function was designed to work in Paging Mode. If you are in Preview mode and using Find All, it will load the correct snippets but the window isn’t editable. So when you use Find All, the program now turns Preview mode off for the duration of the editing session.

7) Delimited Text Translator has been updated to support Excel XML and Access XML (2007) formats (.xlsx & .accdb formats). You must have either MS Excel 2007 or MS Access 2007 (or the runtimes) installed.

8) MARC Spy – When you open MARC Spy, the program automatically prompts you to open the file for examination.

9) And in case you missed it – in the last update, I added an option to search all record data in the Extract/Delete Selected records utility. This option should allow folks to do much more robust queries against record files.
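
For anyone curious what the new MarcValidator checks in enhancements 1, 2 and 4 amount to, here is a rough Python sketch. It is not MarcEdit's implementation and it does not use the rules file syntax; the function names and messages are made up for illustration, and it assumes mnemonic lines of the form `=TAG  IIdata`, where the two indicator characters follow the tag.

```python
import re

# "=TAG", whitespace, then indicators and subfield data.
FIELD_RE = re.compile(r"^=([0-9A-Z]{3})\s+(.*)$")


def check_field(line: str) -> list[str]:
    """Return the problems found in one mnemonic field line."""
    match = FIELD_RE.match(line)
    if match is None:
        return ["not a mnemonic field line"]
    tag, body = match.groups()
    if not tag.isdigit() or tag < "010":
        return []  # leader and control fields carry no subfields
    problems = []
    data = body[2:]  # skip the two indicator characters
    if not data.startswith("$"):
        problems.append("unexpected text before the first subfield")
    # A '$' followed by another '$' or by the end of the line is dangling.
    if re.search(r"\$(?=\$|$)", data):
        problems.append("dangling or incomplete subfield")
    return problems


def check_pairs(tags: set[str], pairs: list[tuple[str, str]]) -> list[str]:
    """Warn when only one field of a configured pair is present."""
    warnings = []
    for first, second in pairs:
        if (first in tags) != (second in tags):
            missing = second if first in tags else first
            warnings.append(f"{missing} expected alongside its paired field")
    return warnings


clean = check_field(r"=500  \\$aTest")
doubled = check_field(r"=500  \\$$aTest")
trailing = check_field(r"=500  \\$aTest$")
leading = check_field(r"=500  \\.$aTest")
unpaired = check_pairs({"245", "490"}, [("490", "830")])
```

The first line passes cleanly, the next two are flagged as dangling subfields, the fourth is flagged for the stray period before `$a`, and the last call warns that the 830 is missing.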

You can pick up the update from:

· MSI Installer: MarcEdit_Setup.msi

· Zip file: marcedit.zip

As always, if you run into problems, let me know.

–TR


Feb 15 2010

MarcEdit 5.2 Update

I’ve posted a wide-ranging update to MarcEdit that includes a number of new features, enhancements and fixes.  If there was something that you had requested and I said that I would get to it in the next update and you don’t see it on this list, let me know.  I might have added it but not noted it in my update list (or, I dropped the ball, in which case, I’ll make sure it makes the next round).  If you have a current version of 5.2, you should be prompted for the update automatically.  If you have an older version of MarcEdit, you can install this version over the top – but it may be easier to simply uninstall the previous version and install the new version of 5.2.  Your config settings will be preserved on uninstall.  As always – if you run into any problems – give me a holler. 

You can download from the following:

· Windows Download: http://oregonstate.edu/~reeset/marcedit/software/development/MarcEdit_Setup.msi

· Linux/Mac download: http://oregonstate.edu/~reeset/marcedit/software/development/marcedit.zip

Bug Fixes:

1) Preview Mode/Paging file pointer loss – I’ve corrected this.  You can now turn on preview mode with the new paging again if you would like.

2) Script Wizard – When using conditions – the “=” sign was being left out of the generated script for field comparison.  That’s been corrected.

New Features:

1) Merge Records Tool:  The MarcEdit Merge Records tool allows users to take two different MARC record sets and merge the contents of one into the other.  The tool right now only works on two sets of mnemonic records.  At the moment, if you have a set of data from a delimited file that you’d like to merge – you will need to run the Tab Delimited Records tool to generate a MARC records file – and then merge the results.  I’ve added the following enhancements for this function to my update list:  Allow processing of MARC formatted data (data not broken into the mnemonic format) and processing Tab Delimited Data. 

2) Mac UI corrections:  I spent a lot of time testing and working on a Mac mini this month and in general – the program is now usable.  Not perfect…there are still UI issues.  But there are few if you use the older Mono 2.2.5.  If you use a newer version of Mono – you have to run Mono through the X11 server.  I’ve included the Mac install instructions below if you are interested.  As I say – your mileage may vary here.

Enhancements:

1. HTML entity conversion – I’ve added html entity conversion to the UTF8 marcedit record conversion algorithm.  While HTML entities are *not* valid within MARC data – I’ve heard from a few folks that they are present enough to make it worth my while to add them to the translation table.  The translations that will be supported are the most common – the list is below:

o (non-breaking space, &nbsp;)

o ¡

o £

o ¤

o ¥

o ¦

o ¨

o ª

o «

o ¬

o (soft hyphen, &shy;)

o ®

o °

o ±

o ³

o ´

o µ

o ·

o ¸

o ¹

o º

o »

o ¼

o ½

o ¾

o ¿

o À

o Á

o Â

o Ã

o Ä

o Å

o Æ

o Ç

o È

o É

o Ê

o Ë

o Ì

o Í

o Î

o Ï

o Ð

o Ñ

o Ò

o Ó

o Ô

o Õ

o Ö

o ×

o Ø

o Ù

o Ú

o Û

o Ü

o Ý

o Þ

o ß

o à

o á

o â

o ã

o ä

o å

o æ

o ç

o è

o é

o ê

o ë

o ì

o í

o î

o ï

o ð

o ñ

o ò

o ó

o ô

o õ

o ö

o ÷

o ø

o ù

o ú

o û

o ü

o ý

o þ

o ÿ

o "

o &

o <

o >

o Œ

o œ

o Š

o š

o Ÿ

o ˆ

o ˜

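o (en space, &ensp;)

o (em space, &emsp;)

o (thin space, &thinsp;)

o (zero-width non-joiner, &zwnj;)

o (zero-width joiner, &zwj;)

o (left-to-right mark, &lrm;)

o (right-to-left mark, &rlm;)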

o –

o —

o ‘

o ’

o ‚

o “

o ”

o „

o †

o ‡

o ‰

o ‹

o ›

o €

2. Replace Tool – MarcEditor – MarcEdit will now save the last 10 Find/Replace criteria.  This was done to make it easier for folks using Regular Expressions.  The program will remember the last 10 so, if you use a regular expression or replacement string often, it should be available to you.  I’ve been thinking about allowing folks to simply save groupings that they can then call up at a later date – but not knowing how useful or interested folks might be in something like this – I decided to implement a “lite” version of this to test the waters.

3. Find All Function

  • Window is now resizeable
  • New button that allows you to go back to your current Find All Query with saved query in textbox.

4. Normalized Naming conventions – MarcEdit menus now all say Preferences rather than the cornucopia of text that may have existed previously.
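
As a rough illustration of the HTML entity conversion described in the first enhancement above, here is a small Python sketch. It is not the MarcEdit algorithm: it leans on Python's standard `html.unescape`, which knows the full HTML5 entity set rather than only the entities listed above, and the function name and sample field are invented for the example.

```python
import html
import re

# Named entities only, for example &eacute; or &amp;
NAMED_ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def decode_entities(text: str) -> str:
    """Replace named HTML entities with the characters they stand for."""
    return NAMED_ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


# Hypothetical mnemonic field containing entities that do not belong in MARC data.
field = "=245  10$aCaf&eacute; &amp; th&eacute;&acirc;tre"
decoded = decode_entities(field)

# Once decoded, the text can be written out as UTF-8 like any other record data.
utf8_bytes = decoded.encode("utf-8")
```

Entity names that are not recognized are left untouched, so ordinary text containing an ampersand and a semicolon is not damaged.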

MAC OSX INSTALLATION PROCEDURE:

1.2 INSTALLATION FROM ZIP

a) Ensure that the dependencies have been installed
   1) Dependency list:
     a) Using X11 Server
      i) MONO 2.4+ (Runtime plus the System.Windows.Forms library [these are sometimes separate downloads])
     ii) Install X11 Server (on the Mac Install Disk)
     b) Using Native Carbon Interface
      i) Install MONO 2.2.5 (skip version 2.2.6-2.6.1 — A major bug was introduced crashing any program that uses messageboxes)
     c) For Yaz Installation:
         i) Install X11 Server (on the Mac Install Disk; this is used by MacPorts, but can also be used in rendering MarcEdit if the native Carbon driver causes problems)
        ii) Install Xcode Developer Tools (current version).
            Xcode tools are found at the Apple Developer Connection site (http://developer.apple.com/tools/xcode/) or on your Mac Install Disk.  The Xcode tools are required by MacPorts.
       iii) Install MacPorts (http://www.macports.org/); follow the instructions here: http://www.macports.org/install.php
            Make sure that you run the selfupdate on MacPorts (sudo port -d selfupdate)
        iv) Once MacPorts is installed, install yaz:  sudo port install yaz (this will take some time as it will install both your dependencies and yaz to your /opt/local/lib/ directory).
b) Unzip marcedit.zip
c) Navigate to the MarcEdit program directory and run linux_bootloader.exe (example: mono linux_bootloader.exe)
d) Yaz.Sharp.dll.config — You need to map the yaz3.dll file to the correct Mac equivalent version.  On a Mac, if you install Yaz through the MacPorts, you would set the dllmap to the following:
    i) <dllmap dll="yaz3.dll" target="/opt/local/lib/libyaz.3.dylib" />
e) How do you setup an icon for Mac?
f) On first run: 
   a) To run with X11 server (often times, this is faster and works a bit better than the native Carbon Driver)
      i) cd [marcedit program directory]
     ii) MONO_MWF_MAC_FORCE_X11=1 mono MarcEdit.exe
   b) To run with Carbon Driver:
      i) cd [marcedit program directory]
     ii) mono MarcEdit.exe
   c) Preferences tab will open, click on Other tab, and set the following two values:
      i) Temp Path: /tmp/ (or a defined user folder)
     ii) Mono Path: [to your full mono path; generally /usr/bin/mono]

Congratulations — MarcEdit 5.2 should now be working on the Mac.  There are admittedly still some UI issues that I’ll be working on correcting — but at this point, MarcEdit should be fully functional for Mac Users.

–tr


Dec 29 2009

MarcEdit 5.2 update/Linux download

For some time, I’ve been working on a development branch of MarcEdit to provide better (i.e., official) support for the application on Linux.  This was actually one of the big reasons behind the change in program languages from a mix of Assembly/Delphi to C#, as well as my interest and periodic activity with the Mono project. 

Over the past few months, I’ve been working on trying to smooth out the wrinkles in the Linux version – as well as work around a few known issues in the Mono runtime.  I’d honestly hoped to have this work finished back in November, but it never happened, well, it did happen, but then I ran into a new problem – a forked version of MarcEdit.  In November, I’d created a version of MarcEdit that worked great under Linux, but was pretty different in some significant ways (code-wise) from the Windows version, and that simply wouldn’t do. 

So, since November, I’ve been reconciling the two versions – making tweaks to the Windows code to make it more Linux friendly, and working with the Linux code to make it less Linux specific.  It’s taken some time, but I finally finished the process last night.  So, it is with some trepidation, that I can finally provide an official download, complete with Install instructions and support, for the Linux platform.

To that end, I’m providing an update for all versions of MarcEdit.  The Windows version is being updated to complete the reconciliation process.  As of today, the branches have been merged and there is now a single codebase for MarcEdit.  So, going forward, I will ensure that I test MarcEdit on both the Windows and Linux platforms (specifically, Ubuntu). 

Now, to the next question…what about Mac?  Well, what about the Mac?  Long-term, I hope to eventually have Mac support as well, but in many respects, when Mac support happens will be dependent on the Mono Runtime, and the Mono project finding Mac developers interested in doing that work.  I had hoped, some time ago, that Mac switching to a Unix infrastructure would help with compatibility – but the UI used on the Mac has actually made that process much more difficult in many ways (at least, that’s been the case for projects I’ve been associated with).  So, while I hope to extend support to the Mac platform at some point, I am not there yet.  While you can certainly run MarcEdit on a Mac (especially the command line version of MarcEdit [cmarcedit.exe]), you will find that there are severe UI issues when running the MarcEdit GUI.  As the Mono runtime moves to 3.0, maybe some of these issues will disappear.  I’ll keep watching, testing, and maybe even simplifying the MarcEdit UI to mitigate some of these problems – but that’s where we stand right now.

*************

For MarcEdit users, the Linux/Other distribution package will be provided as a simple zip file.  Within the Zip file, you will find an Install.txt file.  This file provides install information for Windows users that don’t want to use the MSI installer, as well as Linux users that want to run MarcEdit.  As always, MarcEdit can be downloaded from: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html or directly, from:

*************

The MarcEdit install instructions have been kept fairly simple…Maybe too simple.  I’m hoping to hear back from users – if you have trouble, find steps missing, etc., I would love to hear from you.  Below, I’ll recreate the install instructions for those interested – otherwise, you can find these in the Zip archive.

–TR

*************

*****************************************
MarcEdit Installation Instructions
MarcEdit 5.x
Author: Terry Reese
Install.txt Last Modified: 12/28/2009
*****************************************

Contents:
  1) Windows Users
     a) Installation using Supported Installer
     b) Installation using Zip File
  2) Linux/Other Users

WINDOWS INSTALLATION PROCEDURE:

1.1 WINDOWS Installer

MarcEdit’s supported Windows installation program can be found at: http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html.

1.2 WINDOWS INSTALLATION FROM ZIP

For those users wanting to utilize MarcEdit but cannot run the installer due to permissions restrictions, MarcEdit can be installed from the Zip file.  Installation of MarcEdit on Windows from the Zip file will remove the ability to utilize MarcEdit’s scripting capabilities — but otherwise, the program will function normally.

Instructions:

a) Unzip marcedit.zip.
b) Open the command console (cmd)
c) Navigate to the MarcEdit program directory, and run bootloader.exe

CONGRATULATIONS, MarcEdit should now be ready to run on your PC.

LINUX/OTHER INSTALLATION PROCEDURE:

1.1  INSTALLATION FROM ZIP

a) Ensure that the dependencies have been installed
   1) Dependency list:
      i) MONO 2.4+ (Runtime plus the System.Windows.Forms library [these are sometimes separate])
     ii) YAZ 3 + YAZ 3 develop Libraries + YAZ++ ZOOM bindings
    iii) ZLIBC libraries
     iv) libxml2/libxslt libraries
b) Unzip marcedit.zip
c) Navigate to the MarcEdit program directory and run linux_bootloader.exe (example, mono linux_bootloader.exe)
d) Yaz.Sharp.dll.config — ensure that the dllmap points to the correct version of the shared libyaz object.
e) main_icon.bmp can be used for a desktop icon
f) On first run:
   a) mono MarcEdit.exe
   b) Preferences tab will open, click on other, and set the following two values:
      i) Temp path: /tmp/
     ii) MONO path: [to your full mono path; likely /usr/bin/mono]

If you encounter problems with these instructions, please contact: <Terry Reese> terry.reese@oregonstate.edu.


Dec 24 2009

MarcEdit 5.2 Update

I just wanted to let everyone know that I posted an update to MarcEdit.  These are the highlights:

  • Z39.50 — Updating UTF conversion so source character set can be specified
  • Swap Field Function — updating the function to allow multiple subfields to link to a single subfield
  • Sorting Fix
  • Find All Update — Fixed some focus issues — hopefully this will make the jumping smoother
  • Export Tab Delimited — included the ability to set one’s own delimiters
  • Installer Updates

If you have MarcEdit 5.2 already installed, you should be prompted for an automatic update. 

You can download directly from:
http://people.oregonstate.edu/~reeset/marcedit/software/development/MarcEdit_Setup.msi

Anyway, I just wanted to wish everyone a Merry Christmas and a safe, happy holidays.

–TR


Nov 18 2009

MarcEdit 5.2 Available

It’s been a lot of work, but I’m finally ready to officially release MarcEdit 5.2, though with some caveats.  This is the first public 5.2 build (there have been some private builds that have been available to some folks on the MarcEdit ListServ) – and as far as I can tell, potential problems related to the new features have been shaken out.  However, there have been a lot of changes and additions.  Few that affect the MARCEngine itself – but many affecting the MarcEditor, the largest being the new Paging structure.  While I’ve had this version available to people for testing for about 3 weeks with few reported problems – I’m not naive enough to think that I’ve caught everything.  So my advice to people – if you want to try out the new features, work with the new version of MarcEdit – great.  I’ll be keeping the current 5.1 build available for download for a short period just in case there is a show stopper in this build that requires someone to regress to a previous version.  At the same time – I will be monitoring the bug reports closely for a while – so the more people willing to use the updated version – the faster we shake it out.  With all that said, I’d recommend anyone that would usually be interested in testing Beta/RC quality code to jump right in.  For those squeamish of the bleeding edge, I’d recommend hanging back – maybe until Dec., to see what issues, if any, shake out.  On the bright side, updates should be much easier going forward.  For those that have permission to install applications on their machines – MarcEdit provides an automated update tool (noted below) – so if there is a need for updates – I should be able to distribute them quickly.

So what has changed in MarcEdit 5.2?  Quite a bit actually.  Let me highlight the most noticeable changes.

1) UI changes:  Right off the bat, if you’ve used MarcEdit, you will notice that it is different.  The main screen has been updated to include icons and reduce some of the functional choices to make entrance into the program a little less confusing. 

image

Additionally, shortcuts and relationships to functionality have been better defined.  For example, in the MARC Tools window, there was previously no access to the Validator, the Split or the Join tools.  Now, that has been rectified. 

image

You will find that these types of changes have been made throughout the program.

2) Arabic Right to Left support:  In the MarcEditor – you now have the option to support Arabic Right to Left Rendering and input.  You access this function through the context menu (right click on the MarcEditor when a file is loaded) or by pressing CTRL+SHIFT+R.  I’m considering this Experimental at the moment.  I’ve been working with a few folks in the Middle East and am very pleased with the feedback I’m receiving – so as they continue to work with this feature – I very likely will be making additional changes.  Also, one additional note, Arabic rendering disables the paging at this time.  It was just easier to do it that way.

image

3) File Paging:  While MarcEdit will continue to utilize the Page Preview function – when one loads the Full file into MarcEdit, the program now utilizes a Paging approach to render.  Files are pre-processed and output as pages – with a specified number of records being displayed per page.  One thing of note: general Find/Replace operations only occur over the page that is displayed.  However, all batch editing functions (Replace All, Add/Delete Field, etc.) occur over the entire file, not just the currently displayed page.

image

4) Jump Lists (Find All): In order to make finding items easier while using the new Paging Mode, a new Find All and Jump List function has been added to the tool.  This allows users to query the entire file, and then jump to individual records for edit.  Within the new Paging model, using regular Find will find only items in the current page.  Using Find All allows the user to query data across pages.  The Jump List results display the searched result within the context of the field that it was found in – as well as a record number.

image

image

5) Automatic Updates:  I’ve added an automatic update tool.  If allowed, MarcEdit will query the central download server and track when changes to the program have taken place.   This is implemented one of two ways:
a) Automatic Updates – which are set in the Preferences area.  This works a lot like the Firefox updater.  When MarcEdit is updated, you will be prompted with a note that an update is available.  If you choose to download the update, MarcEdit will download the new installer, close MarcEdit, and then run the installer.  Like any program – you will need to have permission to run the update – but this should make it easier to determine when changes to the program have been made.
image

b) From the Help Menu on the main MarcEdit window – you can find a Check for MarcEdit menu entry.
image

When you are prompted for an update – you will see the following:
image

Hopefully, this will make managing the program easier for individuals.

6) Official Linux Install:  I have a few people shaking this out and I’m working out an automated build process for my linux version and documenting install instructions – but will have a Linux tarball ready for download no later than Dec. 1st.  Does this mean a Mac version is coming?  Hopefully yes – though a lot of it will depend on the next Mono runtime refresh and whether they fix some of the rendering issues with some of the panel/group controls.  If I automate the build process earlier than that – I’ll post a Linux Preview sooner as well. 

7)  Setup program changes:  I’ve updated the installer to do some additional install checks, clean up the 5.1 icons, etc.  I’m thinking about adding some additional switches to the installer to allow administrators – specifically those using software like Novell Groupwise to distribute the application – the ability to set some of the configuration options.  I’m curious if folks have suggestions related to the types of options that you’d like to be able to set on install.  Also, a note – on the roadmap for a future point release of 5.2 is the simplification of the installer.  I’m slowly moving installation code out of the bootloader – I’ll continue that process.

8)  And lastly, I want to thank George Mason University again, for being willing to host a MarcEdit ListServ.  This is one of those things that I’ve always wanted to setup – but I honestly just haven’t had the time or desire to be a list administrator.  Having someone step up and fill that void could have big benefits for the user community – so, if you haven’t signed up for the listserv, you can find it here: http://www.lsoft.com/scripts/wl.exe?SL1=MARCEDIT-L&H=MAIL04.GMU.EDU.  The list has a searchable archive – so any questions asked to the list will become part of the larger MarcEdit knowledge-base.

As you look through this release, you will find a number of other changes (addition of indicator counting in the field reports, additional options in some of the batch tools) – but the above are the items that have occupied the vast majority of my time. 

Finally, I’d love to update the MarcEdit icon – but I have very little artistic talent.  If there is a user out there that has a great desire to make a contribution and has some artistic sensibilities – I’d love to get some samples of potential updated MarcEdit icons.  This icon: image  has represented the MarcEdit application for nearly 10 years (I can’t believe it’s been that long) – I’d like to refresh it.  So, if anyone wants to make some suggestions, I’d appreciate it. 

Download URLs:

 

–TR


Oct 20 2009

MarcEdit Paging approach

I’m just about to the point where I have this work completed and will be ready to send it out to a few people for testing.  However, I want to provide some background so folks have an idea how this will work (even if you’re not that interested).

Paging:

The idea here is that loading the entire data file into an edit window is a big waste of resources and a performance killer.  So, rather than load all the data, we load small snippets of data, but allow users to search the entire file or page through it.  At this point, here’s what this looks like:

image

This is a sample using a 109 MB file.  Previously, this would have consumed over 450 MB of virtual memory to open, and editing would be limited.  Using the paging approach, memory allocation is down to 37 MB – essentially the memory allocated when the program opens (thanks to the need to initialize the .NET framework)

image

This is a big difference and it shows.  But how does this actually work exactly so that as you page through files, performance doesn’t suffer?

Well, here’s the process when paging. 

  1. The user selects a file to open
  2. MarcEdit opens the file, and does the following preprocessing steps
    1. Is Preview mode selected –> If yes, open in Preview mode
    2. Is Preview mode turned off –> If yes, continue to paging
      1. Pull the configuration option that defines number of records per page (found on the preferences dialog)
      2. Pre-process the file.  Preprocessing does the following
        1. Determine number of records in the file
        2. Determine number of pages to display
        3. Create an internal memory map of the file, capturing a structure of start and end positions within the file for a set of pages.

 

The most important part of the paging process is the pre-processing that occurs on the file. In order to do paging (at the record level), MarcEdit must read the file and determine how many records are in it. This means that when you open a large file, there will be an initial pause while the file is pre-processed – but once this pre-processing is done, the program should not need to do it again unless the file is reloaded (through a global edit, etc.). How long will it take? This is hard to say. The process that I use is fairly optimized, uses buffers, etc. So, for example, on the 109 MB file above, pre-processing took approximately 2 seconds. I think that this is fair. Once the pre-processing is done, each page, no matter where it is in the file, should be addressable in under a second (or right at 1 second for allocation and render). For my 109 MB test file, page rendering averages 0.7 seconds. I’m happy with this.

Saving/edits:

I knew when doing this that saving and handling edits on paged data would be one of the biggest issues with this method. The primary reason is that in most cases, the method used would be to create a shadow copy (memory mapped file) of the original and save changes to it as the user paged through and made edits. The problem with this approach is that, since we are dealing with records (not characters), each edit would need to be saved, re-preprocessed (because file positions would change) and then re-rendered. When I attempted to use this approach on my 109 MB test file, paging jumped to nearly 6 seconds to render a page because of all the work being done to save and reprocess the file. Obviously, that’s not acceptable. So, I’ve decided to use a different approach. Internally, I’ve added an enumerated structure that stores a page number and a file pointer. As pages are changed, a temporary file is created that stores just that modified page. As you page through the file, MarcEdit checks the enumerator to see if a modified copy of the page exists before pulling the page from the source. This way, if you change page 1, then move to page 2 and go back to page 1, you’d see your changes – which would be pulled directly from the shadow buffer. These temp files will be kept and then reconciled with the source file when:

  1. The user saves a file
  2. The user completes a global edit function (because these always require a full save – even if it is to an internal shadow file).
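The edit-tracking idea above can be sketched roughly as follows. This is an illustrative Python sketch, not MarcEdit’s implementation: the class name `PagedFile`, its methods, and the use of a plain dictionary in place of the “enumerated structure” are assumptions made for the example. Edited pages live in their own temporary files and are only merged back when `save` is called.

```python
import os
import tempfile


class PagedFile:
    """Hypothetical sketch: a paged source file plus an overlay of edited pages."""

    def __init__(self, source_path, pages):
        self.source_path = source_path
        self.pages = pages    # list of (start, end) byte offsets, one per page
        self.edited = {}      # page number -> path of the temp file holding the edit

    def get_page(self, number):
        # Prefer the edited copy, if one exists, over the original bytes.
        if number in self.edited:
            with open(self.edited[number], "rb") as handle:
                return handle.read()
        start, end = self.pages[number]
        with open(self.source_path, "rb") as handle:
            handle.seek(start)
            return handle.read(end - start)

    def set_page(self, number, data):
        # Store only the modified page; the source file is left untouched.
        if number not in self.edited:
            fd, temp_path = tempfile.mkstemp(suffix=".page")
            os.close(fd)
            self.edited[number] = temp_path
        with open(self.edited[number], "wb") as handle:
            handle.write(data)

    def save(self, target_path):
        # Merge pages in order into a new file, then swap it into place.
        directory = os.path.dirname(os.path.abspath(target_path))
        fd, merged_path = tempfile.mkstemp(dir=directory)
        new_pages = []
        position = 0
        with os.fdopen(fd, "wb") as out:
            for number in range(len(self.pages)):
                data = self.get_page(number)
                out.write(data)
                new_pages.append((position, position + len(data)))
                position += len(data)
        os.replace(merged_path, target_path)
        # Edits are now part of the file: drop the temp files and re-index.
        for temp_path in self.edited.values():
            os.remove(temp_path)
        self.edited.clear()
        self.source_path = target_path
        self.pages = new_pages
```

Paging stays cheap in this sketch because an edit never forces the whole file to be rewritten or re-indexed; that cost is paid once, at save time.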

Using this approach, paging isn’t affected by edits to pages, and saving appears to work fine. 

Anyway, that’s the approach that I’m working with right now. As I say, I’m hoping to wrap up this work tonight/tomorrow and, assuming that happens, I’ll be posting a test version for those brave souls who want to give this a whirl and give me feedback. This may also let folks see one more tool – I’m going to add a debugger switch which will allow you to capture a log file that stores variable states at critical moments. This is something that I’ve been wanting – as it should help me when people ask for debugging help.

 

–TR


Oct 15 2009

MarcEdit design question/advice

I asked this question on the MarcEdit Listserv, but will post it here as well. Below is the message, along with images of the wireframes that are mentioned. If you have an opinion – feel free to join the list and let me know, or if you like, you can contact me directly at: terry.reese@oregonstate.edu

 

******* Forwarded Message from the MarcEdit-L Archive **********

I have a question and I’m hoping that the collective wisdom of the MarcEdit-L list can help me solve it. I’ve got an update for MarcEdit that I’ve been sitting on for about a month because I have a specific issue (usability mostly) that I’m trying to solve. I have an idea of how to do it, but it will change the way that you edit MARC records in the editor (at least, how they are displayed), and before I go forward, I wanted to quickly take the community’s pulse on this.

The problem

So let’s start with an explanation of the problem. As folks who have worked with both MarcEdit 4.x and MarcEdit 5.x know, the Editor’s ability to load a lot of data differs considerably between the two. In MarcEdit 4.x, the application utilized a custom edit control written in assembly for loading and editing records in the MarcEditor. This allowed users to load very large files (150 MB or so) into the editor without a noticeable change in speed when adding new data to the editor, resizing windows, etc. In MarcEdit 5.x, I made a conscious decision to utilize all .NET components to preserve the ability to port MarcEdit to the Linux and Mac platforms (Linux will be officially completed at the next release, btw) – however, this had implications for the editor in two ways: 1) loading rich content into the editor has a much higher memory cost, and 2) this higher memory cost has a definite effect on performance (loading and editing). This is why I introduced the preview mode – a read-only mode that allows users to load a snippet of the file and then make their global edits. For my usage of MarcEdit, this worked beautifully – but I’m finding that a number of users have workflows that require them to load the entire file and perform single record edits, which is, I’ll admit, painful when files start to get close to 8-10 MB in size – changes in the editing window are often made with a delay (i.e., you type a word, there is a pause, then the data catches up). This also affects screen resizing, etc. Tied to this problem are the various character encodings that MarcEdit supports (it goes beyond MARC8 and UTF8). These also affect memory usage depending on the encoding in use – and honestly, this is one of the big reasons for the change away from the assembly component in MarcEdit 4.x – that component simply didn’t do Unicode well, and that’s the future of MARC. The current component in MarcEdit does Unicode very well, but certain scripts give Windows fits when rendering (performance-wise) – so it’s a problem – one that I’d like to solve.

Solutions

Anyway, that’s the problem I’m looking to solve. I’m looking for a solution that will allow users to make individual record changes on large datasets within the MarcEditor, and to do so in a way that allows the editor to gracefully handle memory management and performance. The present solution, the one that is completely untenable, is to load all the data into an edit control. On my test machines, I can load files up to ~150 MB in size into the control (your mileage will vary due to virtual memory restrictions and available RAM), but it comes at a huge cost. In Windows (and especially in managed runtimes like .NET), rendering content virtually is expensive. Memory consumed is roughly 4x the source – so, rendering 150 MB of data costs my system ~600 MB of virtual RAM. Painful, and performance shows it. This is why the preview mode is there. But let’s say you are dealing with a smaller dataset, something in the 8-10 MB range. You are still consuming close to 40 MB to render the data – and performance can suffer depending on hardware and memory available. If you need to make individual record changes on a batch in that size range, making these changes may be frustrating, as you may indeed have to deal with a delay in entering data while the system re-buffers available memory to handle the work. I’m pretty sure that everyone who has had this happen agrees that this needs to change (I’ve heard from 3 people recently who have been experiencing this problem and are trying to figure out how to make it work within existing workflows), and I’m sure there are others who have not spoken up or may still use MarcEdit 4.x for very specific tasks simply because the handling of larger files for individual record editing was better (which is fair, but becomes less and less of a reliable solution as more data becomes available in UTF8).

So I’ve been thinking about this a lot over the past month, writing some test code and developing some wireframes, and I want to present some options and get some feedback. Essentially, there are two ways that I think I can deal with this issue. One is to provide real-time random access to large files (not my preferred option), so that the only data loaded into the editor is what is held in the memory buffer. This would likely be the ideal solution, but it is also the most difficult to write, simply because all data would need to be mapped to temporary buffers, tracked, etc. Also, when dealing with really large files, the random access will not be immediate, meaning that as you move further down the file, paging down may become more labored. The benefit, however, is that the memory footprint would be much, much lower, so performance for general, individual record editing should improve greatly. It would also most closely resemble the current way that MarcEdit provides editing within the MarcEditor. All data would appear to be loaded in a Notepad-like interface – you’d page down and scroll down just as you do now. I’m not sure how this would affect Find and Replace – but I’m sure we could make it work.

And while the above may be the more ideal, it’s not the one that I’m leaning towards (hence this message). I’ve been thinking a lot about how MARC records are represented in MarcEdit, how they are edited, etc., and I’m beginning to believe that when working with a large set of MARC records, the best solution wouldn’t be to simply provide a complete picture of all loaded records, but to display groups of records, with the ability to page through a recordset. I’ve illustrated this point with some wireframes in the attached PowerPoint. In slide 1, I’ve provided a demo of how I think the editing may look (ignore the menus and icons – these are just part of my test code). Essentially, users would define how many records they want to display per “page”. I’m thinking that the sweet spot would likely be about 500 – but I’d make this user defined. MarcEdit can then, very quickly, determine how many records are in the file and break up the record set into pages. MarcEdit would then only load one page of records at a time. This lets users quickly do individual edits of records, reduces the memory footprint and greatly improves the overall experience of using large data files. It also takes system memory limitations completely out of the equation, as only a small block of records will be displayed at any given time.

Using this system would also let me rethink how we do finds within a recordset. At present, when you use the find tool, MarcEdit has to enumerate over the entire record set and this is, for all intents and purposes, a very memory-intensive operation. Slow too, if you have a lot of records. In this new model, I’d add a new button to the Find dialog – Find All (see slide 2). When Find All is used, it would generate a report of all occurrences of the needle found within the record set. The report would show each match in context, with the ability to jump to the specific page where the text was found. Personally, I think that this could be a big improvement over the current find, as users would immediately be able to see all the cases in which a search term occurs without having to jump through the entire file. Additionally, this type of design would allow me to start rethinking the MarcEditor itself, so that record set editing could be done with pages (so you could, for example, spawn a new page within a new MarcEditor tab so pages could be compared [see slide 3]). I think that this type of design could eventually lead to some fairly interesting enhancements – but I also recognize that it will be different. It represents a different way to view and edit records in MarcEdit – though this change really only affects how you edit records individually (since global editing is done differently).
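To make the Find All idea concrete, here is a small Python sketch of such a report. It is only an illustration under stated assumptions, not MarcEdit’s design: the function name `find_all`, the tuple layout of the results and the default of 20 characters of context are all hypothetical. Pages are passed in as an iterable of strings, so only one page needs to be in memory at a time.

```python
def find_all(pages, needle, context=20):
    """Return (page_number, offset_in_page, snippet) for every match.

    pages is any iterable of page texts; page numbers are 1-based so they
    can be shown to the user and used to jump to the matching page.
    """
    results = []
    if not needle:
        return results
    for page_number, text in enumerate(pages, start=1):
        start = text.find(needle)
        while start != -1:
            # Cut a window around the match to show it in context.
            lo = max(0, start - context)
            hi = min(len(text), start + len(needle) + context)
            snippet = text[lo:hi].replace("\n", " ")
            results.append((page_number, start, snippet))
            start = text.find(needle, start + 1)
    return results


if __name__ == "__main__":
    sample_pages = [
        "=245 10$aDogs and cats\n=650  0$aDogs",
        "=245 10$aBirds",
        "=245 10$aMore Dogs",
    ]
    for page, offset, snippet in find_all(sample_pages, "Dogs", context=5):
        print(f"page {page}, offset {offset}: ...{snippet}...")
```

The page number in each result is what a “jump to page” link in the report would need.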

Finally, implementation – if I move down the above path – I can integrate the current test code into the existing MarcEdit application with little work.  I could wrap up my update and not have to really worry about introducing regression errors.  If I try to implement the first solution, all bets are off in terms of when it would be done.  It would represent a major change to how data is handled within the program and I’d have to step back, re-write a lot of code and then find some willing users to try  it because there would be a significant chance for regression errors.

Anyway, that’s my idea. I think it addresses a known weakness in the program and makes individual record editing better, and does so without causing too much interruption to the user. And, if successful, it may allow me to slowly remove the preview mode from the MarcEditor, as it would no longer be needed.

How can you help

If you stayed with me this long and looked at the wireframes, you are probably wondering how you can help. Well, I’m looking for comments and ideas on this. MarcEdit is a very community-oriented project. I’d say that over 90% of the work that goes into the program is done at the community’s request. This is an issue that I know has been raised by members of the user community, and I really want to get the community involved in the decision. I’m definitely open to other suggestions, including suggestions for how to tweak the wireframes (since I recognize that there are many places where usability could be improved) – but that’s kind of where I’m at right now.

Thanks everyone who made it this far,

–TR

********************************
Terry Reese
Gray Family Chair
for Innovative Library Services
121 Valley Libraries
Corvallis, Or 97331
tel: 541.737.6384
********************************

 

Wireframes:

Slide 1: [image: Slide1]

Slide 2: [image: Slide2]

Slide 3: [image: Slide3]