Nov 18 2009

MarcEdit 5.2 Available

It’s been a lot of work, but I’m finally ready to officially release MarcEdit 5.2, though with some caveats. This is the first public 5.2 build (some private builds have been available to folks on the MarcEdit ListServ), and as far as I can tell, potential problems related to the new features have been shaken out. However, there have been a lot of changes and additions. Few affect the MARCEngine itself, but many affect the MarcEditor, the largest being the new Paging structure. While I’ve had this version available to people for testing for about 3 weeks with few reported problems, I’m not naive enough to think that I’ve caught everything. So my advice to people: if you want to try out the new features and work with the new version of MarcEdit, great. I’ll be keeping the current 5.1 build available for download for a short period, just in case there is a show stopper in this build that requires someone to go back to a previous version. At the same time, I will be monitoring the bug reports closely for a while, so the more people willing to use the updated version, the faster we shake it out. With all that said, I’d recommend that anyone who would usually be interested in testing Beta/RC quality code jump right in. For those squeamish of the bleeding edge, I’d recommend hanging back, maybe until Dec., to see what issues, if any, shake out. On the bright side, updates should be much easier going forward. For those that have permission to install applications on their machines, MarcEdit provides an automated update tool (noted below), so if there is a need for updates, I should be able to distribute them quickly.

So what has changed in MarcEdit 5.2?  Quite a bit actually.  Let me highlight the most noticeable changes.

1) UI changes:  Right off the bat, if you’ve used MarcEdit, you will notice that it is different.  The main screen has been updated to include icons and reduce some of the functional choices to make entrance into the program a little less confusing. 

image

Additionally, shortcuts and relationships to functionality have been better defined. For example, the MARC Tools window previously had no access to the Validator, the Split or the Join tools. Now, that has been rectified.

image

You will find that these types of changes have been made throughout the program.

2) Arabic Right to Left support: In the MarcEditor, you now have the option to turn on Arabic Right to Left rendering and input. You access this function through the context menu (right click on the MarcEditor when a file is loaded) or by pressing CTRL+SHIFT+R. I’m considering this Experimental at the moment. I’ve been working with a few folks in the Middle East and am very pleased with the feedback I’m receiving, so as they continue to work with this feature, I very likely will be making additional changes. One additional note: Arabic rendering disables the paging at this time. It was just easier to do it that way.

image

3) File Paging: While MarcEdit will continue to offer the Page Preview function, when one loads the full file into MarcEdit, the program now uses a Paging approach to render it. Files are pre-processed and output as pages, with a specified number of records being displayed per page. One thing of note: general Find/Replace operations only occur over the page that is displayed. All batch editing functions (Replace All, Add/Delete Field, etc.), however, occur over the entire file, not just the currently displayed page.

image

4) Jump Lists (Find All): In order to make finding items easier while using the new Paging Mode, a new Find All and Jump List function has been added to the tool. This allows users to query the entire file, and then jump to individual records for edit. Within the new Paging model, using regular Find will find only items in the current page. Using Find All allows the user to query data across pages. The Jump List displays each result within the context of the field in which it was found, along with a record number.
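To make the Find All idea concrete, here is a minimal Python sketch. It is an illustration only, not MarcEdit's code (MarcEdit is a .NET application): the names `Hit` and `find_all` are hypothetical, records are assumed to be already available as mnemonic-style text, and the matching is a plain substring test.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    record_number: int  # 1-based position of the record in the file
    page_number: int    # 1-based page that holds that record
    field: str          # the whole field line in which the text was found


def find_all(records, needle, records_per_page=100):
    """Search every record (not just the current page) and build a jump list."""
    hits = []
    for index, record in enumerate(records):
        for line in record.splitlines():
            if needle in line:
                page = index // records_per_page + 1
                hits.append(Hit(index + 1, page, line))
    return hits


# Three tiny made-up records, two records per page.
records = [
    "=245  10$aFirst title\n=260  \\\\$c1999",
    "=245  10$aSecond title\n=260  \\\\$c2005",
    "=245  10$aThird title\n=260  \\\\$c1999",
]
for hit in find_all(records, "1999", records_per_page=2):
    print(hit.record_number, hit.page_number, hit.field)
```

Each hit carries enough information to jump: the page to load and the record within it, plus the field line to show as context.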

image

image

5) Automatic Updates:  I’ve added an automatic update tool.  If allowed, MarcEdit will query the central download server and track when changes to the program have taken place.   This is implemented one of two ways:
a) Automatic Updates, which are set in the Preferences area. This works a lot like the Firefox updater. When MarcEdit is updated, you will be prompted with a note that an update is available. If you choose to download the update, MarcEdit will download the new installer, close MarcEdit, and then run the installer. Like any program, you will need to have permission to run the update, but this should make it easier to determine when changes to the program have been made.
image

b) From the Help Menu on the main MarcEdit window – you can find a Check for MarcEdit menu entry.
image

When you are prompted for an update – you will see the following:
image

Hopefully, this will make managing the program easier for individuals.

6) Official Linux Install: I have a few people shaking this out, and I’m working out an automated build process for my Linux version and documenting install instructions, but I will have a Linux tarball ready for download no later than Dec. 1st. If I automate the build process earlier than that, I’ll post a Linux Preview sooner as well. Does this mean a Mac version is coming? Hopefully yes, though a lot of it will depend on the next Mono runtime refresh and whether they fix some of the rendering issues with some of the panel/group controls.

7) Setup program changes: I’ve updated the installer to do some additional install checks, clean up the 5.1 icons, etc. I’m thinking about adding some additional switches to the installer to give administrators, specifically those using software like Novell Groupwise to distribute the application, the ability to set some of the configuration options. I’m curious if folks have suggestions related to the types of options that you’d like to be able to set on install. Also, a note: on the roadmap for a future point release of 5.2 is the simplification of the installer. I’m slowly moving installation code out of the bootloader, and I’ll continue that process.

8) And lastly, I want to thank George Mason University again for being willing to host a MarcEdit ListServ. This is one of those things that I’ve always wanted to set up, but I honestly just haven’t had the time or desire to be a list administrator. Having someone step up and fill that void could have big benefits for the user community. So, if you haven’t signed up for the listserv, you can find it here: http://www.lsoft.com/scripts/wl.exe?SL1=MARCEDIT-L&H=MAIL04.GMU.EDU. The list has a searchable archive, so any questions asked on the list will become part of the larger MarcEdit knowledge-base.

As you look through this release, you will find a number of other changes (addition of indicator counting in the field reports, additional options in some of the batch tools) – but the above are the items that have occupied the vast majority of my time. 

Finally, I’d love to update the MarcEdit icon – but I have very little artistic talent.  If there is a user out there that has a great desire to make a contribution and has some artistic sensibilities – I’d love to get some samples of potential updated MarcEdit icons.  This icon: image  has represented the MarcEdit application for nearly 10 years (I can’t believe it’s been that long) – I’d like to refresh it.  So, if anyone wants to make some suggestions, I’d appreciate it. 

Download URLs:

 

–TR


Oct 20 2009

MarcEdit Paging approach

I’m just about to the point where I have this work completed and will be ready to send it out to a few people for testing. However, I want to provide some background so folks have an idea how this will work (even if you’re not that interested).

Paging:

The idea here is that loading the entire data file into an edit window is a big waste of resources and a performance killer.  So, rather than load all the data, we load small snippets of data, but allow users to search the entire file or page through it.  At this point, here’s what this looks like:

image

This is a sample using a 109 MB file.  Previously, this would have consumed over 450 MB of virtual memory to open, and editing would be limited.  Using the paging approach, memory allocation is down to 37 MB – essentially the memory allocated when the program opens (thanks to the need to initialize the .NET framework)

image

This is a big difference and it shows.  But how does this actually work exactly so that as you page through files, performance doesn’t suffer?

Well, here’s the process when paging. 

  1. The user selects a file to open
  2. MarcEdit opens the file, and does the following preprocessing steps
    1. Is Preview mode selected –> If yes, open in Preview mode
    2. Is Preview mode turned off –> If yes, continue to paging
      1. Pull the configuration option that defines number of records per page (found on the preferences dialog)
      2. Pre-process the file.  Preprocessing does the following
        1. Determine number of records in the file
        2. Determine number of pages to display
        3. Create an internal memory map of the file, capturing a structure of start and end positions within the file for a set of pages.

 

The most important part of the paging process is the pre-processing that occurs on the file. In order to do paging (at the record level), MarcEdit must read the file and determine how many records are in the file. This means that when you open a large file, there will be an initial pause while the file is pre-processed, but once this preprocessing is done, the program should not need to do it again unless the file is reloaded (through a global edit, etc). How long will it take? This is hard to say. The process that I use is fairly optimized, uses buffers, etc. So, for example, on the 109 MB file example above, preprocessing took approximately 2 seconds. I think that this is fair. Once the preprocessing is done, each page, no matter where in the file, should be addressable in under a second (or right at 1 second for allocation and render). For my 109 MB test file, page rendering averages 0.7 seconds. I’m happy with this.
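For readers who want to see the shape of that pre-processing step, here is a small Python sketch. It is not MarcEdit's code (which is .NET), and it assumes the file is binary MARC, where each record ends with the 0x1D record terminator byte. The names `build_page_index` and `read_page` are made up for this example.

```python
import io

RECORD_TERMINATOR = b"\x1d"  # end-of-record byte in binary MARC (ISO 2709)


def build_page_index(stream, records_per_page, chunk_size=65536):
    """Scan the file once, in buffered chunks.

    Returns (record_count, pages), where pages is a list of
    (start, end) byte offsets, one entry per page.
    """
    pages = []
    record_count = 0
    page_start = 0
    offset = 0  # file offset of the first byte of the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pos = chunk.find(RECORD_TERMINATOR)
        while pos != -1:
            record_count += 1
            if record_count % records_per_page == 0:
                end = offset + pos + 1
                pages.append((page_start, end))
                page_start = end
            pos = chunk.find(RECORD_TERMINATOR, pos + 1)
        offset += len(chunk)
    if page_start < offset:  # last, partially filled page
        pages.append((page_start, offset))
    return record_count, pages


def read_page(stream, pages, page_number):
    """Read one page (1-based) using only its stored offsets."""
    start, end = pages[page_number - 1]
    stream.seek(start)
    return stream.read(end - start)


# Five fake "records", two per page, with a tiny chunk size to exercise
# records that straddle chunk boundaries.
data = io.BytesIO(b"aaa\x1dbb\x1dc\x1ddddd\x1dee\x1d")
count, pages = build_page_index(data, records_per_page=2, chunk_size=4)
print(count, pages)
print(read_page(data, pages, 2))
```

The point of the index is that opening any page afterwards is a single seek and read, regardless of where the page sits in the file.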

Saving/edits:

I knew when doing this that saving and handling edits on paged data would be one of the biggest issues with this method. The primary reason is that in most cases, the method used would be to create a shadow copy (memory mapped file) of the original and save changes to it as the user paged through and made edits. The problem with this approach is that, since we are dealing with records (not characters), each edit would need to be saved, re-preprocessed (because file positions would change) and then re-rendered. When I attempted to use this approach on my 109 MB test file, paging jumped to nearly 6 seconds to render a page because of all the work being done to save and reprocess the file. Obviously, that’s not acceptable. So, I’ve decided to use a different approach. Internally, I’ve added an enumerated structure that stores a page number and a file pointer. As pages are changed, a temporary file is created that stores just that modified page. As the user pages through MarcEdit, it checks the enumerator to see if a modified copy of a page exists before pulling the page from the source. This way, if you change page 1, then move to page 2 and go back to page 1, you’d see your changes, which would be pulled directly from the shadow buffer. These temp files will be stored and will then be reconciled with the source file when:

  1. The user saves a file
  2. The user completes a global edit function (because these always require a full save – even if it is to an internal shadow file).

Using this approach, paging isn’t affected by edits to pages, and saving appears to work fine. 
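Here is a rough Python sketch of that bookkeeping, under the assumption that a page index (a list of start/end offsets) already exists. The class name `ShadowPages` and its methods are invented for illustration, and the real structure inside MarcEdit may well differ.

```python
import io
import os
import tempfile


class ShadowPages:
    """Keep edited pages in temp files; untouched pages come from the source."""

    def __init__(self, source, pages):
        self.source = source  # seekable binary stream of the original file
        self.pages = pages    # list of (start, end) offsets, one per page
        self.modified = {}    # page number -> path of temp file with the edit

    def get_page(self, number):
        path = self.modified.get(number)
        if path is not None:  # an edited copy exists, so use it
            with open(path, "rb") as handle:
                return handle.read()
        start, end = self.pages[number - 1]
        self.source.seek(start)
        return self.source.read(end - start)

    def put_page(self, number, data):
        path = self.modified.get(number)
        if path is None:
            descriptor, path = tempfile.mkstemp(suffix=".page")
            os.close(descriptor)
            self.modified[number] = path
        with open(path, "wb") as handle:
            handle.write(data)

    def save(self, out):
        """Write all pages in order, preferring edited copies, then clean up."""
        for number in range(1, len(self.pages) + 1):
            out.write(self.get_page(number))
        for path in self.modified.values():
            os.remove(path)
        self.modified.clear()


source = io.BytesIO(b"AAAABBBBCCCC")
shadow = ShadowPages(source, [(0, 4), (4, 8), (8, 12)])
shadow.put_page(2, b"xx")    # edit page 2 only
print(shadow.get_page(2))    # comes from the temp file
saved = io.BytesIO()
shadow.save(saved)
print(saved.getvalue())
```

Note that after a save the old offsets no longer describe the new file, so in this sketch the saved file would have to be pre-processed again before paging continues.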

Anyway, that’s the approach that I’m working with right now. As I say, I’m hoping to wrap up this work tonight/tomorrow, and assuming that happens, I’ll be posting a test version for those brave souls who want to give this a whirl and give me feedback. This may also let folks see one more tool: I’m going to add a debugger switch which will allow you to capture a log file that stores variable states at critical moments. This is something that I’ve been wanting, as it should help me when people ask for debugging help.

 

–TR


Oct 15 2009

MarcEdit design question/advice

I asked this question on the MarcEdit Listserv, but will post it here as well.  Below, is the message and images of the wireframes that are mentioned.  If you have an opinion – feel free to join the list and let me know, or if you like, you can contact me directly at: terry.reese@oregonstate.edu

 

******* Forwarded Message from the MarcEdit-L Archive **********

I have a question and I’m hoping that the collective wisdom of the MarcEdit-L list can help me solve it. I’ve got an update for MarcEdit that I’ve been sitting on for about a month because I have a specific issue (usability mostly) that I’m trying to solve. I have an idea how to do it, but it will change the way that you edit MARC records in the editor (at least, how they are displayed), and before I go forward, I wanted to quickly take the community’s pulse on this.

The problem

So let’s start with an explanation of the problem. As folks that have worked with both MarcEdit 4.x and MarcEdit 5.x know, the Editor’s ability to load a lot of data differs greatly between the two. In MarcEdit 4.x, the application used a custom edit control written in assembly for loading and editing records in the MarcEditor. This allowed users to load very large files (150 MB or so) into the editor without a noticeable change in speed when adding new data to the editor, resizing windows, etc. In MarcEdit 5.x, I made a conscious decision to use all .NET components to preserve the ability to port MarcEdit to the Linux and Mac platforms (Linux will be officially completed at the next release, btw). However, this had implications for the editor in two ways: 1) loading rich content into the editor has a much higher memory cost, and 2) this higher memory cost has a definite effect on performance (loading and editing). This is why I introduced the preview mode, a read-only mode that allows users to load a snippet of the file and then make their global edits. For my usage of MarcEdit, this worked beautifully, but I’m finding that a number of users have workflows that require them to load the entire file and perform single record edits. That is, I’ll admit, painful when files start to get close to 8-10 MB in size, as changes in the editing window are often made with a delay (i.e., you type a word, there is a pause, then the data catches up). This also affects screen resizing, etc. Tied to this problem are the various character encodings that MarcEdit supports (it goes beyond MARC8 and UTF8). These also affect memory usage depending on the encoding in use and, honestly, are one of the big reasons for the change away from the assembly component in MarcEdit 4.x: that component simply didn’t do Unicode well, and that’s the future of MARC. The current component in MarcEdit does Unicode very well, but certain scripts give Windows some fits rendering (performance wise), so it’s a problem, and one that I’d like to solve.

Solutions

Anyway, that’s the problem I’m looking to solve. I’m looking for a solution that will allow users to make individual record changes on large datasets within the MarcEditor, and do so in a way that allows the editor to gracefully handle memory management and performance. The present solution, the one that is completely untenable, is to load all the data into an edit control. On my test machines, I can load files up to ~150 MB in size into the control (your mileage will vary due to virtual memory restrictions and available RAM), but it comes at a huge cost. In Windows (and in virtual-machine-based platforms like .NET especially), rendering content virtually is expensive. Memory consumed is roughly 4x the source, so rendering 150 MB of data costs my system ~600 MB of virtual RAM. Painful, and performance shows it. This is why the preview mode is there. But let’s say you are dealing with a smaller dataset, something in the 8-10 MB range. You are still consuming close to 40 MB to render the data, and performance can suffer depending on hardware and memory available. If you need to make individual record changes on a batch in that size range, making these changes may be frustrating, as you may indeed have to deal with a delay in entering data while the system re-buffers available memory to handle the work. I’m pretty sure that everyone that’s had this happen agrees that this needs to change (I’ve heard from 3 people recently that have been experiencing this problem and are trying to figure out how to make it work within existing workflows). I’m sure there are others that have not spoken up, or that may still use MarcEdit 4.x for very specific tasks simply because the handling of larger files for individual record editing was better (which is fair, but becomes less and less of a reliable solution as more data becomes available in UTF8).

So I’ve been thinking about this a lot over the past month, writing some test code and developing some wireframes, and I want to present some options and get some feedback. Essentially, there are two ways that I think I can deal with this issue. One (not my preferred option) is to provide real-time random access to large files, so that the only data loaded into the editor is what is held in the memory buffer. This would likely be the ideal solution, but it also is the most difficult to write, simply because all data would need to be mapped to temporary buffers, tracked, etc. Also, when dealing with really large files, the random access will not be immediate, meaning that as you move further down the file, paging down may become more labored. The benefit, however, is that the memory footprint would be much, much lower, so performance for general, individual record editing should improve greatly. It also would most closely resemble the way that MarcEdit currently provides editing within the MarcEditor. All data would appear to be loaded in a Notepad-like interface, and you’d page down and scroll down just as you do now. I’m not sure how this would affect Find and Replace, but I’m sure we could make it work.

And while the above may be the more ideal, it’s not the one that I’m leaning towards (hence this message).  I’ve been thinking a lot about how MARC records are represented in MarcEdit, how they are edited, etc. and I’m beginning to believe that when working with a large set of MARC records, the best solution wouldn’t be to provide simply a complete picture of all loaded records, but would be to display groups of records, with the ability to page through a recordset.  I’ve attached some wireframes to illustrate this point in the attached PowerPoint.  In slide 1, I’ve provided a demo of how I think the editing may look (ignore the menus, icons – these are just part of my test code).  Essentially, users would define how many records they want to display per “page”.  I’m thinking that the sweet spot would likely be about 500 – but I’d make this user defined.  MarcEdit can then, very quickly, determine how many records are in the file and then break up the record set as pages.  MarcEdit then would only load one page of records at a time.  This allows users the ability to quickly do individual edits of records, reduces memory footprint and greatly improves the overall experience of using large data files.  It also takes system memory limitations completely out of the equation, as only a small block of records will be displayed at any given time.

Using this system also would let me rethink how we do finds within a Recordset. At present, when you use the find tool, MarcEdit has to enumerate over the entire record set, and this is, for all intents and purposes, a very memory intensive operation. Slow too, if you have a lot of records. In this new model, I’d add a new button to the Find dialog: Find All (see slide 2). When Find All is used, it would generate a report of all occurrences of the needle found within the record set. The report would show each match in context, with the ability to jump to the specific page where the text was found. Personally, I think that this could be a big improvement over the current find, as users would immediately be able to see all the cases in which a search term occurs without having to jump through the entire file. Additionally, this type of design would allow me to start rethinking the MarcEditor itself, so that record set editing could be done with pages (so you could, for example, spawn a new page within a new MarcEditor tab so pages could be compared [see slide 3]). I think that this type of design could eventually lead to some fairly interesting enhancements, but I also recognize that it will be different. It represents a different way to view and edit records in MarcEdit, though this change really only affects how you edit records individually (since global editing is done differently).

Finally, implementation – if I move down the above path – I can integrate the current test code into the existing MarcEdit application with little work.  I could wrap up my update and not have to really worry about introducing regression errors.  If I try to implement the first solution, all bets are off in terms of when it would be done.  It would represent a major change to how data is handled within the program and I’d have to step back, re-write a lot of code and then find some willing users to try  it because there would be a significant chance for regression errors.

Anyway, that’s my idea.  I think it addresses a known weakness in the program and makes individual record editing better, and does so without causing too much interruption to the user.  And, if successful, may allow me to slowly remove the preview mode from the MarcEditor, as it would no longer be needed.

How can you help

If you stayed with me this long and looked at the wireframes, you are probably wondering how you can help. Well, I’m looking for comments and ideas on this. MarcEdit is a very community oriented project. I’d say that over 90% of the work that goes into the program is done at the community’s request. This is an issue that I know has been raised by members of the user community, and I really want the community to be involved in the decision. I’m definitely open to other suggestions, including suggestions for how to tweak the wireframes (since I recognize that there are many places where usability could be improved), but that’s kind of where I’m at right now.

Thanks everyone who made it this far,

–TR

********************************
Terry Reese
Gray Family Chair
for Innovative Library Services
121 Valley Libraries
Corvallis, Or 97331
tel: 541.737.6384
********************************

 

Wireframes:

Slide 1

image

Slide 2

image

Slide 3

image


Oct 15 2009

MarcEdit Listserv

So, the good folks at George Mason University have offered to host a MarcEdit Listserv.  If you are interested, you can find it here: http://www.lsoft.com/scripts/wl.exe?SL1=MARCEDIT-L&H=MAIL04.GMU.EDU

This list would be a great place for folks looking to ask questions.  I’m one of many moderators to the list, so if a question is asked, I (or someone) will try to answer it.  What I’m most excited about is that this will create a searchable archive for folks looking for help.

–TR


Aug 4 2009

A forgotten MarcEdit Subfield Editing function :)

I was asked the other day at AALL (American Association of Law Libraries) if MarcEdit could be used to move specific data from one field and replace the data currently present in another. So, an example: the ability to move data from a 260$c to the 008 position 7:4. You actually can, though it’s sadly not documented (it’s one of those few hidden gems that have been created for specific projects that I or others have been working on). So how do we do it?

Open the Edit Subfield function. In the Edit Subfield function, there is an option called Move Subfield data. That’s the one we want to check. Then, we enter the following (using the 260$c-008 example).

Field: 260 [Enter the field with the data that you wish to move]
Subfield: c
Find: [leave blank – though you can enter data here if you want to find something specific to move]
Replace: 008|7||

Ok, the Replace looks funny and it is.  There are essentially a handful of options you can set here (4) – I’m going to explain two for now (and will update this post when I update the official documentation). 

Each pipe “|” represents a delimiter.  The first two pipes are the most important:

  1. Field to move to
  2. Where to move (replace) the data

In the above example, we are moving data to the 008 and placing the data in position 7.  If I was placing the data into a subfield, I would have entered a subfield (example: c) here.  So, the edit form would look like the following:
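To show what the end result of that move looks like, here is a small Python sketch of the same edit done by hand. This is not how MarcEdit implements the function. The helper names are hypothetical, and pulling only the first four-digit run out of the 260$c is my assumption for the example (the post does not say how punctuation is handled).

```python
import re


def overwrite_at(fixed_field, position, data):
    """Overwrite len(data) characters of a fixed field at a 0-based position."""
    if position + len(data) > len(fixed_field):
        raise ValueError("data does not fit in the fixed field")
    return fixed_field[:position] + data + fixed_field[position + len(data):]


def first_year(text):
    """Assumption for this sketch: keep only the first four-digit run."""
    match = re.search(r"\d{4}", text)
    return match.group(0) if match else None


# A made-up 40-character 008 and a made-up 260$c value.
field_008 = "090804s2001" + " " * 4 + "nyu" + " " * 11 + "000 0 eng d"
subfield_260c = "c1999."

year = first_year(subfield_260c)
if year is not None:
    field_008 = overwrite_at(field_008, 7, year)
print(field_008[7:11])
```

The length of the 008 does not change, since the four characters starting at position 7 are replaced in place rather than inserted.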

image

 

–TR


Aug 1 2009

MarcEdit Z39.50 on ‘Nix

I need to send this out to the 15 or so folks that have agreed to be my first guinea pigs testing out a MarcEdit build on ‘Nix and Mac systems, but I wanted to document it here as well so I don’t forget to add it to the installation instructions later. (And btw, Mac UI rendering isn’t good. That’s not surprising, because Mono’s UI changes tend to show up correctly on ‘Nix first, then Mac, so I’m hopeful the planned 2.6 update in Sept. will correct many of the errors.)

In Windows, MarcEdit includes the yaz install as part of the application installation.  This means that when people install MarcEdit, all the dependencies that they need are installed as well.  With Linux, that won’t be the case.  On Linux, you will need to make sure that you install the yaz and yaz-devel packages.  Once installed, you need to make one more change (and here’s the trick).

In MarcEdit on Windows, the yaz dll has been renamed to yaz3.dll. The reason for this is that I don’t want to accidentally overwrite a previous installation of the software (in case other programs on the system are relying on older or newer versions of the library). This works fine in Windows, but on Linux, the problem is that the yaz components are installed as yaz (not yaz2, yaz3, etc). In Mono, the way that the framework makes calls to native libraries through PInvoke is to look for the linked file by checking the following locations for the following file names (using yaz3.dll as the example):

  • Application Path/yaz3.dll
  • Application Path/yaz3.dll.so
  • Application Path/libyaz3.so
  • Application Path/lib/yaz3.dll
  • Application Path/lib/yaz3.dll.so
  • Application Path/lib/libyaz3.so
  • System/lib/yaz3.dll
  • System/lib/yaz3.dll.so
  • System/lib/libyaz3.so

The problem is a simple one: when yaz is built either from source or by a package manager, you end up with a shared object called libyaz.so. So, the simple solution is to set up a symlink named System/lib/libyaz3.so that points to System/lib/libyaz.so. On my Ubuntu install, that means creating the symlink in /usr/lib/ using the following command:

  • ln -s libyaz.so libyaz3.so

And that’s it.  Once I made that change, the Z39.50 client started working as expected, and now this information has been documented so I can make sure it makes it into the INSTALL.txt file.
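If you want to check which of those names would actually be found on your own system, the probe order listed above can be reproduced with a few lines of Python. This only mirrors the list in this post; it is not Mono's real lookup code, and the function names and the example application path are made up.

```python
import os


def candidate_paths(dll_name, app_dir, system_lib_dir="/usr/lib"):
    """Build the nine candidate locations, in the order listed in the post."""
    stem = dll_name[:-4] if dll_name.lower().endswith(".dll") else dll_name
    names = [dll_name, dll_name + ".so", "lib" + stem + ".so"]
    directories = [app_dir, os.path.join(app_dir, "lib"), system_lib_dir]
    return [os.path.join(d, n) for d in directories for n in names]


def first_existing(paths):
    """Return the first candidate that exists on disk, or None."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


# "/opt/marcedit" is only a placeholder for wherever MarcEdit was unpacked.
paths = candidate_paths("yaz3.dll", app_dir="/opt/marcedit")
for path in paths:
    print(path)
print("found:", first_existing(paths))
```

If the last line prints None, none of the names the lookup expects are present, which is exactly the situation described above.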

Cheers everyone,

–TR


Jul 31 2009

MarcEdit testing on ‘Nix (updates)

So far so good.  I’ve had a great response and am starting to get back feedback that I’ll be incorporating into work that I plan on doing this weekend to fix a few ‘Nix issues.  But I think that when the next MarcEdit release is posted – I’ll for the first time be including information on how to run MarcEdit on Linux. 

Now a Mac – that platform is a completely different animal.  Mono is being developed to be cross platform so MarcEdit does run on a Mac – but not well.  There are at this point still too many UI issues for me to recommend or suggest someone using a Mac run MarcEdit on that platform.  My hope is that Mono 2.6 due in Sept. will correct many of the Mac UI problems.  The roadmap indicates that most of the work for this build in the Windows.Forms emulation will be dedicated to Mac UI fixes – so here’s hoping that is true.

 

–tr


Jul 30 2009

Looking for testers interested in running MarcEdit on ‘Nix

I’ve started formally looking for guinea pigs that want to give this a whirl. Installation has been simplified and everything works (save the Z39.50 functionality), so if you would like to try running MarcEdit on either Linux or a Mac, let me know (though running it on a Mac may be dodgy, as I know that Mono development on the Mac usually runs a little behind ‘Nix). At this time, I’m hoping to get maybe 5 more users (I have 3 now) that are interested in giving this a go (not just installation, but actually using the software as well).

If you are interested, ping me at terry.reese at oregonstate.edu.

 

–TR


Jul 26 2009

Supporting Right to Left languages in MarcEdit (part 3)

I’ve been spending a bit of time over the past few weeks doing some travelling but also working on this problem in MarcEdit.  When I left it last, I was pretty close to having a solution – the only thing that had been still a little wonky was dealing with data elements that were primarily mixed numeric/text delimiter pairs.  Well, the good news is that it looks like I’ve got it figured out.  As noted earlier, the bidirectional algorithm provides methods for inserting control codes that can be used to distinguish embedded data to ensure that the data is handled correctly (or, handled in the fashion that the user is expecting).  I think I’ve gotten to that point with MarcEdit. 

Now, in order to get this to work, I had to implement a couple of changes. I’ve added spacing between delimiters and the field data that follows them. These spaces are cosmetic and are filtered out when data is saved. They exist only for the purpose of rendering the data correctly. Likewise, when the LDR statement is streamed to the screen in Right to Left mode, the field label is swapped from LDR to 000. In MarcEdit, field 000 has always been a synonym for the LDR field. For the purposes of correct right to left rendering, however, changing this to a numeric representation made life much easier.

This weekend, I’m wrapping up a build of this for folks interested in testing. I have two dedicated testers (both in the Middle East) who will be working with the software and providing feedback, specifically as it relates to rendering and use of the global utilities. Because this is nearly finished and there haven’t been any outstanding [blocking] issues, I’ll be holding the next update till this work has completed.

I’ll put a small write-up in the tutorial when this ships, but how will this work in MarcEdit?  It will actually be very simple to activate. 

1) First, you start with a record and open it in the MarcEditor:

image

2) Use either the key combination CTRL+SHIFT+R, or right click in the MarcEditor and select Display Right-to-Left from the context menu:

image

3) When you select that option, MarcEdit will re-load your data file, shifting the display to Right to left.

image

My guess is that as my testers get their hands on this, we will run into a few issues (there always are), but so far, to my untrained eye (using the sample data provided to me over the past two weeks), everything appears to be working as one might expect.

–TR


Jul 9 2009

Right to left rendering Part II

So, a couple of additional notes on this. I got some feedback from my latest attempt, and while I’m closer, there are still some significant issues. Also, my testers and I weren’t seeing the same things on the screen when testing, so I thought I’d follow up on two aspects of working with Right to Left languages that are not completely intuitive.

1) While Windows does support Right to Left rendering by default, it will not apply the bidi algorithm correctly unless you have explicitly turned on support for complex scripts (at least in XP; I haven’t checked my Vista or Windows 7 testing boxes yet). This wasn’t obvious. It seemed to me that if I could see the text and could switch on the Right to Left processing, then certainly I would be seeing what an Arabic user would be seeing. Well, no. If you want Windows to actually render Right to Left data correctly, you need to open the Regional and Language Control Panel applet and check the “Install files for complex script and right-to-left languages” option:

image

Obvious, right? Once I dug up my Windows XP disk, I was able to install the support files, and now I can see the same thing that an Arabic user would see. So, at least we are now working with the same screwed up display.

 

2) The bidi algorithm is an interesting one, and the more I’ve been looking at it, the more I think that options actually exist in the algorithm to solve my problem. At issue is how the algorithm treats data that shouldn’t be treated as right to left, or mixed displays. The best explanation that I’ve read comes from here: http://blogs.msdn.com/vsarabic/archive/2008/03/24/mixed-time-date-display.aspx#comments explaining why mixed Time/Date elements often display incorrectly within Arabic interfaces. It explains how numbers and neutral characters are processed and how the rendering shifts when non-neutral, non-Arabic characters are encountered. For MarcEdit’s purposes, I have a pretty good idea which non-neutral characters are causing my problems: the delimiters (since these can be a-z0-9). But the bidi algorithm (which you can read more about here: http://unicode.org/reports/tr9/) includes a set of character overrides that can be embedded to force certain data to be interpreted as strong, weak or neutral characters for processing. Playing around with this a little bit, I found that I could embed a few key elements and change the output to look like the following:

image

Still not perfect, but getting much closer to what I’m looking for.  Of course, embedding these character codes invalidates the MARC record so they would need to be filtered out when saved or the saved data would be a mess – but I think that this could potentially be useful, specifically for people doing web display – since these embedded characters would allow you to essentially mark control/display data so that the rendering doesn’t affect the overall output of the text.  Make sense?  Not much to me either. :)
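For anyone curious what "embed for display, filter on save" might look like, here is a short Python sketch. The choice of U+200E (LEFT-TO-RIGHT MARK) around each subfield delimiter is purely an assumption for illustration, since this post does not say which control characters were actually embedded. The function names are hypothetical as well.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Directional marks, embeddings and overrides defined by the bidi
# algorithm. Newer Unicode versions add a few more; extend as needed.
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e"
STRIP_TABLE = str.maketrans("", "", BIDI_CONTROLS)

# Subfield delimiters in the mnemonic view: "$" plus one of a-z or 0-9.
DELIMITER = re.compile(r"(\$[a-z0-9])")


def for_display(field):
    """Wrap each delimiter in LRM so it keeps its left-to-right order."""
    return DELIMITER.sub(lambda match: LRM + match.group(1) + LRM, field)


def for_save(field):
    """Remove every bidi control character before the data is written."""
    return field.translate(STRIP_TABLE)


field = "=245  10$aعنوان$bمثال"
shown = for_display(field)
print(shown.count(LRM))
print(for_save(shown) == field)
```

One caveat with this approach: the save filter also removes any directional controls that were legitimately part of the source data, so a real implementation would need to tell its own markers apart from ones that came in with the record.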

 

–TR