Dec 21 2007

OCLC’s Connexion XML — why, oh why?

As I’d noted previously (http://oregonstate.edu/~reeset/blog/archives/479), some early testers found that the Connexion plug-in I’d written for MarcEdit stripped the 007.  I couldn’t originally figure out why: it’s just a control field, and their syntax for control fields is pretty straightforward.  However, after looking at a few records with 007 fields, I could see why.  In Connexion, OCLC lets folks code the 007 using delimiters like a normal variable MARC field (which it’s not), and they save it that way, delimiters and all.  For example:

<v007 i2=" " i1=" " im="0">
  <sa>
    <d>s</d>
  </sa>
  <sb>
    <d>d</d>
  </sb>
  <sd>
    <d>f</d>
  </sd>
  <se>
    <d>s</d>
  </se>
  <sf>
    <d>n</d>
  </sf>
  <sg>
    <d>g</d>
  </sg>
  <sh>
    <d>n</d>
  </sh>
  <si>
    <d>n</d>
  </si>
  <sj>
    <d>z</d>
  </sj>
  <sk>
    <d>u</d>
  </sk>
  <sl>
    <d>u</d>
  </sl>
  <sm>
    <d>u</d>
  </sm>
  <sn>
    <d>d</d>
  </sn>
</v007>

I’ll admit, I have no idea why they went with this format.  From my perspective, it’s clunky.  The 007, as a single control field, is fairly easy to parse: it can have up to 13 bytes, with the number of bytes determined by byte 0 of the data element.  In this format, you actually have to create 9 different templates for the different possibilities in order to account for different field lengths, byte combinations and delimiter settings.  Honestly, my first impression when looking at this was that it’s a perfect example of how something so simple can become much more difficult than it needs to be.  Personally, I would have been happier had they broken from their MARCXML-like syntax for this one field and created a special 007 element.  Again, this is something that could have been easily abstracted in the XSLT translation, but to be fair, I don’t think they figured anyone but OCLC’s Connexion team would ever be trying to work with this.
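To make the shape of the problem concrete, here is a minimal Ruby sketch of collapsing the subfielded 007 back into a single control-field string.  It is not the plug-in's code (that lives in XSLT and C#).  The helper name `flatten_007` is hypothetical, and the sketch assumes that subfield `sa` maps to byte 0, `sb` to byte 1, and so on, with missing subfields (like `sc` in the record above) standing for blank bytes.

```ruby
require 'rexml/document'

# Hypothetical helper: collapse a Connexion-style <v007> element into one
# control-field string.  Assumption: <sa> is byte 0, <sb> is byte 1, etc.
def flatten_007(xml)
  field = REXML::Document.new(xml).root
  bytes = []
  field.elements.each do |sub|
    position = sub.name[1].ord - 'a'.ord     # "sd" -> 3
    data = sub.elements['d']
    bytes[position] = data && data.text ? data.text : ' '
  end
  # Gaps left by missing subfields become blanks.
  bytes.map { |b| b || ' ' }.join
end

sample = '<v007 i2=" " i1=" " im="0"><sa><d>s</d></sa><sb><d>d</d></sb><sd><d>f</d></sd></v007>'
puts flatten_007(sample).inspect   # => "sd f"
```

Keying on the subfield letter rather than on document order is what keeps the blank byte in place when a subfield is absent.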

So how am I solving it?  Well, one of the cool things about working with XSLT (and .NET in general) is the ability to use extensions to fill in missing functionality in the XSLT language (in my case, the ms:script extension in the msxml library).  Since this transformation isn’t one that I’m really sharing (outside the plug-in), I’m not too worried about its portability.  So, what I’ve done is create a number of helper C# functions and embed them within the XSLT document to aid processing.  For example,

<xsl:stylesheet version="1.0"
xmlns:marc="http://www.loc.gov/MARC21/slim"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ms="urn:schemas-microsoft-com:xslt"
 xmlns:osu="urn:oregonstate-edu:xslt"
 extension-element-prefixes="osu">
  <xsl:output method="xml" indent="yes" />
  <ms:script language="C#" implements-prefix="osu">
    <![CDATA[
        public int length(string s) {
          s = s.ToLower();
          if (s=="c") {
             return 14;
          } else if (s=="d") { return 6;}
          else if (s=="a") { return 8;}
          else if (s=="h") { return 13;}
          else if (s=="m") { return 10;}
          else if (s=="k") { return 6;}
          else if (s=="g") { return 9;}
          else if (s=="r") { return 11;}
          else if (s=="s") { return 14;}
          else if (s=="f") { return 10;}
          else if (s=="v") { return 9;}
          else { return 8;}
        }
      ]]>
  </ms:script>
 

This is a simple function that I’m using to track the number of elements needed for the processing template.  I don’t want to create 9 different XSLT templates, one for each processing type, so I’m using some embedded C# to simplify the process.  On the plus side, using these embedded scripts makes the translation process much faster on the .NET side (since .NET compiles XSLT to byte code anyway before running any translation process), and this is a technique that I’ve never really had to use before, so I was able to get a little practical experience.  Still don’t like it though.
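For comparison, the same lookup can be written as a table rather than an if/else chain.  This is only a sketch in Ruby; the constant and method names are hypothetical, and the lengths are copied straight from the C# function above rather than checked against the MARC documentation.

```ruby
# Lengths copied from the embedded C# helper above, keyed by the
# category-of-material code found in byte 0 of the 007.
FIELD_007_LENGTHS = {
  'c' => 14, 'd' => 6,  'a' => 8,  'h' => 13, 'm' => 10, 'k' => 6,
  'g' => 9,  'r' => 11, 's' => 14, 'f' => 10, 'v' => 9
}.freeze

# Same fallback as the C# helper's final else branch.
DEFAULT_007_LENGTH = 8

def length_007(category)
  FIELD_007_LENGTHS.fetch(category.to_s.downcase, DEFAULT_007_LENGTH)
end

puts length_007('S')   # => 14
puts length_007('x')   # => 8
```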

–TR


Dec 3 2007

XSLT and Ruby/Rails

While adding REST support for LibraryFind, I found that I wanted to provide output in XML that could also be rendered as HTML if an XSLT was attached.  In Rails, generating XML is actually pretty easy.  Output is specified in views: HTML views are created using a .rhtml extension, while XML views are created using a .rxml extension (at least until Rails 2.0, when the extensions are to change).

Anyway, we use libxml for processing large XML documents (I have just never found REXML up to the task), and I was happy to find a gem based on libxml that provides XSLT support as well.  The libxslt-ruby gem provides a very simple method for doing XSLT processing in Ruby.  In the .rxml file, I add the following code:

if params[:xslt] != nil
  @headers["Content-Type"] = "text/html"

  require 'xml/libxml'
  require 'xml/libxslt'

  xslt = XML::XSLT.file(params[:xslt])
  xslt.doc = XML::Parser.string(_lxml).parse

  # Parse to create a stylesheet, then apply.
  s = xslt.parse
  s.apply
  return s.to_s
else
  return _lxml
end

Simple and no fuss.  The XML::XSLT.file function can take a physical or remote file location.  To parse the file, you must assign the doc property a reference to an XML::Document object.  Since libxml doesn’t provide a function to deal with XML data directly (it assumes that you are loading XML via a file), you simply need to use the included XML Parser object to create an XML::Document object dynamically.  Anyway, I’m finding that this works really well.

–TR


Dec 3 2007

LibraryFind installation: dealing with problems relating to openssl and rubygems

I was installing LibraryFind on a server at Willamette University the other day for testing purposes, and ran into something that I had never seen before.  While setting up the dependencies on the test server, I found that the current version of Ruby in the distro’s YUM repository was old (1.8.5), so I decided to download and compile Ruby from source.  So, here are the steps that I followed:

  1. Downloaded Ruby 1.8.6 (current patchset)
  2. Compiled Ruby (no errors)
  3. Downloaded Rubygems
  4. Ran setup…which was successful
  5. Using Rubygem, I tried to install Rails and this is where I ran into problems.  The download starts and then throws the following error:
    ERROR:  While executing gem … (Gem::Exception)
        SSL is not installed on this system

So, I made sure the openssl and openssl-devel packages were installed on the machine.  Well, they were.  After digging around on the web, I couldn’t find anything that helped.  However, I did find an email message from a long time back, when we were compiling Ruby to work with MySQL and had to compile from source.  To make it work, we had to go into the ext folder in the Ruby source and compile some files directly.  So, I figured I’d give it a try for openssl, and it worked.  Here are the steps I followed:

  1. Navigate to ext/openssl in the ruby source folder.
  2. Once there, run the following:
    ruby extconf.rb
    make
    make install

  3. After running the install, you can ensure that ruby can “see” the openssl information by running the following from the command prompt:
    >>irb
    >>require 'openssl'
    If everything is set up right, you will see the following: => true.
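
The same check can be scripted rather than typed into irb.  A minimal sketch follows; the method name `openssl_status` is made up for illustration, and it relies only on Ruby's standard openssl extension being loadable (or not).

```ruby
# Returns a short status string saying whether this Ruby interpreter
# can load its openssl extension.
def openssl_status
  require 'openssl'
  "openssl loaded: #{OpenSSL::OPENSSL_VERSION}"
rescue LoadError => e
  "openssl missing: #{e.message}"
end

puts openssl_status
```

If the extension was not built, the second message is the scripted equivalent of the "SSL is not installed on this system" error that rubygems reported.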

 

–TR


Dec 3 2007

LibraryFind 0.8.5: threading and XSLT and REST, Oh my

At present, I’m wrapping up the back-end changes to what will be LibraryFind 0.8.5.  Yup, we’ll be skipping 0.8.4 in part because I’d like the release point to represent the broadness of the changes being made.  In fact, had the UI portions of the code been modified to completely support the new back-end searching, we’d likely jump the release point to 0.9. 

So what changes are coming in the back-end side of LibraryFind:

Minor Changes:

  1. SRU support: I’ve added a class to support SRU searching
  2. Connector class refactoring — I’ve made these a little more abstract and created them as a model.  Should continue to simplify the creation of new connector types when necessary.
  3. OpenURL resolution becoming optional:  While LibraryFind has been designed to integrate directly with some OpenURL services to provide inline resolution of items prior to showing them to the users — this does add a barrier for folks wanting to test.  So, I’ve made this optional.  If you don’t enable OpenURL resolution, it will just generate links to the resource if possible.
  4. Finally got rid of a warning message that was being thrown each time I processed a collection of Record objects.  When the value of the record object’s property was nil, it would throw a warning.  It wasn’t causing any problems with rendering or searching, but it was filling the error logs with unnecessary information.
  5. Other optimizations (see trac for a full list)

Major Changes:

  1. Expanding the current API to include an asynchronous set of searching tools.  Currently, LibraryFind utilizes Ruby’s Thread classes to initialize the various searches that it needs to run to retrieve a set of results.  Unfortunately, while this process does thread the various searches, it still requires a primary thread that holds an HTTP connection open until all the threads have finished running.  So, in an effort to make searching more user friendly (and faster), I’ve made use of the spawn plug-in.  This was nice because it supports both ruby’s threads as well as forking of processes.  Add a job queue, some additional models and there you have it.  The 0.8.5 UI has been refactored to make use of the new API — but you will not see any changes to how the results are presented.  In the future, this will change.
  2. REST API.  LibraryFind has supported the full SOAP stack since its inception.  This is in part because I prefer working with SOAP and it worked.  Well, in order to support more light-weight search clients against LibraryFind, I’ve started adding a REST-based protocol.  At present, the only API method currently emulated is the search function.  However, I’m planning on including support for our Asynchronous search and Collection related API.  Hopefully, this will provide one more point for integration.
  3. XSLT support:  In addition to providing a REST based API, these queries will also accept an XSLT file which can be used to transform data on the fly.  This will allow for the development of UIs through the simple creation of an XSLT file. 

Anyway, that’s what’s on the menu for 0.8.5 at the moment.  My guess is that we’ll be doing a feature freeze shortly and start the quality control process.  So things are starting to move quickly.

–TR


Nov 29 2007

LibraryFind and Mobile Services

One of the things I was really impressed with while attending DLF was the presentation on the lightweight web platform being built at NCSU.  Leveraging their Endeca catalog, the folks at NCSU have been able to produce a set of REST-based APIs for querying the catalog.  With those services, they’ve designed a mobile interface and a Google widgets interface to their catalog.  It’s a good idea, and one that I’m working on moving into LF.  LibraryFind already supports a SOAP-based API (which is actually my preference), but I’m running into more and more cases where a lightweight REST-based API would be nice.  Fortunately, Ruby/Rails makes this possible.

Anyway, in the spirit of building client services on top of LibraryFind, I’ve created a quick Facebook app using LibraryFind.  Whether we’ll actually use it (or be able to contribute it to the Facebook app list), who knows, but it was actually pretty simple to put together.  Here are a couple of quick screenshots.  You’ll notice I’m using the iframe option, mostly because I was getting annoyed while testing that the app kept getting dropped out of Facebook.  This way, the user stays in Facebook but can query the service.

Facebook LibraryFind app Front:

image

Facebook LibraryFind Search/Results:

image

 

At this point, the app is just rendering LF in the iframe, though I imagine that an interface developed for mobile users would also work best within this environment.  I guess time (and usability testing) will tell (if there is even any interest in having something like this at all).

–TR


Nov 19 2007

Dynamically loading and Unloading Assemblies in C#

While working on a plugin manager for a program written in C#, I found myself needing to load and unload assemblies dynamically from within an application.  In C#, loading assemblies is a fairly easy prospect: one just needs to make use of the System.Reflection namespace.  Something like the following:

System.Reflection.Assembly assembly = System.Reflection.Assembly.LoadFile(@"c:\yourassembly.dll");

However, if you need to unload the assembly, good luck.  The .NET Assembly class doesn’t include an unload method.  If you need to be able to dynamically load and unload assemblies, you need to work with the AppDomain class.  The .NET framework works on an application domain model, so for items like plugins (where you may need to load, unload or modify an assembly), you need to create an application domain to load the assemblies into.  This way, when you need to unload an assembly, you use the Unload method found within the AppDomain class.

Of course, when dealing with plugins, you will likely need to create a new application domain for each plugin to be loaded.  This is because you unload the AppDomain, not the assemblies attached to it.  So for my project, I decided to create something much like the TempFileCollection.  In a global class, I created a hash that stores a domain name and the domain object.  Using this method, I can do something like the following:

    string path = cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar;
    string[] files = System.IO.Directory.GetFiles(path);

    lstInstalled.Items.Clear();
    foreach (string f in files)
    {
        try
        {
            string fileName = System.IO.Path.GetFileName(f);

            // One AppDomain per plugin, named after the plugin file.
            System.AppDomain domain = System.AppDomain.CreateDomain(fileName);

            // Read the whole file and load the assembly from the byte array,
            // which avoids holding the plugin file open.
            byte[] b = System.IO.File.ReadAllBytes(f);
            domain.Load(b);

            // Find the plugin among the assemblies loaded in the new domain.
            System.Reflection.Assembly[] a = domain.GetAssemblies();
            int index = 0;
            for (int x = 0; x < a.Length; x++)
            {
                if (a[x].GetName().Name + ".dll" == fileName)
                {
                    index = x;
                    break;
                }
            }

            // List the plugin's name, version and size in the UI.
            System.Windows.Forms.ListViewItem item = new ListViewItem();
            item.Text = a[index].GetName().Name + ".dll";
            item.SubItems.Add(a[index].GetName().Version.ToString());
            item.SubItems.Add(b.Length.ToString());
            lstInstalled.Items.Add(item);

            // Remember the domain so it can be unloaded later.
            cglobal.mglobal.domains.Add(fileName, domain);
        }
        catch { }  // Skip files that cannot be loaded as assemblies.
    }
 

Then, if we need to unload the assembly, we can unload the domain that it’s attached to.  Something like:

    string pluginDir = cglobal.mglobal.AppPath() + "plugins" + System.IO.Path.DirectorySeparatorChar;
    for (int x = 0; x < lstInstalled.Items.Count; x++)
    {
        if (!lstInstalled.Items[x].Selected) continue;

        string name = lstInstalled.Items[x].Text;
        try
        {
            if (System.IO.File.Exists(pluginDir + name))
            {
                // Unloading the domain releases every assembly loaded into it.
                System.AppDomain.Unload((System.AppDomain)cglobal.mglobal.domains[name]);
                cglobal.mglobal.domains.Remove(name);
                System.IO.File.Delete(pluginDir + name);
            }
        }
        catch { }
    }

Seems a little more involved than it has to be, but once you know how it works, it’s not that big of a deal.

–TR


Nov 17 2007

LibraryFind 0.8.4 upcoming changes

At some point, I’ll likely move this to the LibraryFind blog.  I just realized that I couldn’t remember my login information to post to the blog — so, I’ll post here.

I’m not exactly sure if the UI changes will be made in 0.8.4 to incorporate the new spawning/pinging (I think that they will), but the next version of LibraryFind will include an extended set of APIs that will allow users to develop interfaces in which searches run independently of the UI.  This was done using a spawn plugin (you can find it on rubyforge or linked in LF via svn:externals), the creation of a jobs queue, and changes to a lot of the back-end server components to make this all work.  I’ve just checked into the 0.8.4-ping branch (and soon into trunk, once I’ve finished testing and created some unit tests) all the code necessary to make this work.  This requires the following changes to the LF codebase:

  1. meta_search.rb refactor:  In order for this to work, I’ve had to move the cache checking code out of this file and move it into a more abstracted “connector” class.  The connector classes needed to be modified to support the new pinging functionality

  2. New models: job_query.rb, job_item.rb, cache_search_session.rb, cached_search.rb, oai_search_class.rb, z3950_search_class.rb, opensearch_search_class.rb

  3. API file changes: query_api.rb

  4. Controller changes: meta_search.rb, query_conntroller.rb, dispatch.rb, record_set.rb

  5. Migration changes: New migration has been added — 025_create_job_queues.rb

  6. Changes to the environment.rb file (these will be in the environment.rb.example file)

New SOAP API elements:

  • GetJobRecord (Queries the result set for a single job, returns the array of records)
  • GetJobsRecords (Queries the record sets for an array of jobs, returns the array of records)
  • SearchAsync (Returns an array of job ids, works like search)
  • SimpleSearchAsync (Returns an array of job ids, works like simple search)
  • CheckJobStatus (Returns a JobItem object for a single job)
  • CheckJobsStatus (Returns an array of JobItem objects for an array of jobs)

The presumed workflow with the api is as follows:

  1. Client will utilize the SearchAsync (or SimpleSearchAsync) API to initiate the search.  This will return an array of job ids.
  2. Client will periodically ping the server, using CheckJobStatus (or CheckJobsStatus) for a list of jobs and their status.  Status is an enumeration.  -1 == error, 0 == finished/idle, 1 == searching
  3. Client will retrieve records once items have been processed by calling GetJobRecord (or GetJobsRecords) for the array of records.
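
The three steps above can be sketched as a small polling loop.  This is an illustration only: `FakeLibraryFindClient` and its snake_case methods are hypothetical stand-ins for a SOAP client exposing SearchAsync, CheckJobsStatus and GetJobsRecords, and the job ids and records are made up.  Only the status codes come from the description above.

```ruby
# Status codes as described above.
STATUS_ERROR     = -1
STATUS_FINISHED  = 0
STATUS_SEARCHING = 1

# Hypothetical in-memory stand-in for a real SOAP client.
class FakeLibraryFindClient
  def initialize(polls_until_done)
    @remaining = polls_until_done
  end

  def search_async(_query)
    [101, 102]                          # made-up job ids
  end

  def check_jobs_status(job_ids)
    @remaining -= 1 if @remaining > 0
    status = @remaining.zero? ? STATUS_FINISHED : STATUS_SEARCHING
    job_ids.map { |id| { id: id, status: status } }
  end

  def get_jobs_records(job_ids)
    job_ids.map { |id| "record for job #{id}" }
  end
end

# Start the search, ping until no job is still searching, then fetch
# the records for the jobs that finished without error.
def run_async_search(client, query, max_polls: 10, interval: 0)
  job_ids = client.search_async(query)
  max_polls.times do
    statuses = client.check_jobs_status(job_ids)
    if statuses.none? { |s| s[:status] == STATUS_SEARCHING }
      finished = statuses.select { |s| s[:status] == STATUS_FINISHED }
      return client.get_jobs_records(finished.map { |s| s[:id] })
    end
    sleep(interval)
  end
  []                                    # gave up waiting
end

puts run_async_search(FakeLibraryFindClient.new(3), 'oregon')
```

A real client would likely fetch records for each job as soon as it finishes rather than waiting for the whole set; the all-or-nothing loop here just keeps the sketch short.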

With this finished, I’ve completed pretty much all the current tags waiting for 0.8.4 and will be focusing on one additional enhancement that I’d like to see in 0.8.4 (if we can finish and test) or 0.8.5 — that being the addition of a REST-based XML api as well.  While I actually prefer the SOAP stack for some development, I would like to provide a REST-based api for more lightweight clients.

Anyway — that’s the news for now.

–TR


Nov 16 2007

LibraryFind 0.8.3 tagged

Jeremy I’m sure will get the tgz version of the file up and ready on the LF site soon, but the 0.8.3 instance of libraryfind has been tagged.  If you are interested in finding out more about libraryfind, see: http://www.libraryfind.org.

 

–TR


Aug 1 2007

.NET 64-bit processor memory issues when using sendmessage to access a winform element

I’m posting this in hopes that it will save someone else a lot of time, or that someone who knows .NET a bit better than I do can provide a better solution.

Problem:

Last week, someone pinged me regarding MarcEdit and a problem they were running into when running the Editor on a 64-bit version of Windows 2003 Server.  MarcEdit is compiled for any processor, so in theory, the framework should adjust the variable types to the current CPU type and go on its merry way.  And were it not that I have to work with some unmanaged code within my application, I’m sure that this would be the case.  However, when opening the MarcEditor, the user was getting the following error message:

This is odd because I test MarcEdit on every version of Windows from 98 to Vista.  The problem, however, is that I’ve never run the program on a 64-bit version of Windows.

Background:

I did a little bit of research and found what I thought to be the problem.  The 64-bit version of Windows shares many of the same signatures as its 32-bit counterpart, but one place where the signatures differ is in the message queue.  SendMessage, for example, which uses integers to pass values between processes, has been updated to use 64-bit integers and will crash if the wrong data type is sent into the function.  No problem: I fixed the signature issue, but the error message remained.  What I didn’t realize is that this wasn’t the actual problem (though it was a problem).  The real problem seemed to be related to simply accessing the RichTextBox Handle and passing it the callback.  Any time the Handle was touched and passed, this error would be generated.

Solution:

So, Microsoft does make the Enterprise version of Windows 2003 Server available on a trial basis for developers wanting to test their software.  So, I dug up a box with a 64-bit AMD processor and set to installing the software.  Next, I installed SharpDevelop, an open source IDE for .NET, and created a small sample program to isolate the code that was causing me problems.  In my case, the code that was causing the problem is necessary because MARC is a UTF-8 encoded data format.  Microsoft’s RichText library supports the loading of plain text (ASCII), Unicode text, text with OLE objects and text in just about any character format, including UTF-8.  Unfortunately, the .NET framework only exposes plain text and Unicode text as supported formats.  This means that in order to load UTF-8 data and utilize the component’s streaming nature to minimize the memory footprint during loading, we need to essentially write our own EditStreamCallback function, create the delegates, the EDITSTREAM struct, etc.  And therein lies the rub.  When compiling the code in SharpDevelop, I specified that the code should be targeted specifically for a 64-bit processor.  During compile, I got two warning messages saying that two core .NET components are compiled specifically for 32-bit processors.  Since the signatures on the 64-bit and 32-bit machines are identical, one can generally ignore these compilation warnings, as the framework does its magic.  However, the fact that I’m utilizing functionality from one of these two components within an unmanaged code block causes the problem.  Within .NET (and the 64-bit environment in general), a 64-bit process cannot load a library compiled for a 32-bit process.  A 32-bit process can run within a 64-bit environment; the two just cannot share code within a single process.  My best guess is that this is what was happening.
Since these two .NET components were compiled specifically for 32-bit processors, my attempts to load them into a 64-bit process and utilize them within an unmanaged code block caused issues.  The solution is a simple one: for the GUI application of MarcEdit (which doesn’t do much anyway), the program simply needs to be compiled to target 32-bit processors.  Now it runs just fine within a 64-bit environment, and will remain so until Microsoft cleans up these two core libraries.  With that said, if anyone has a better way of dealing with this problem, I’d love to hear about it (code is attached, so if you can make it work, let me know).

RichText Code:

Finally, it’s pretty difficult to find example code dealing with the RichText components in C#.  I think this is primarily because most folks who use high-level languages like C# either don’t have a need for it or don’t have the background in C++ to understand what is actually happening at the Proc level.  Anyway, to that end, I’m posting the source to my small sample program (get it here) that I used to diagnose this problem.  The trick to doing this type of interaction is to avoid the use of integer class variables.  In .NET, you have to remember that you are dealing with managed code, so when you make a call to an API like SendMessage, you should be marshalling all your data and passing it into the function via the IntPtr structure.  The only exception to that with the SendMessage API is the message argument, which Microsoft defines as an unsigned 32-bit integer on all platforms, though for practical purposes, the message argument can be declared as a 32-bit integer.

API/Delegate Declarations

    private const int SF_USECODEPAGE = 0x020;
    private const int SF_TEXT = 0x001;
    private const int SF_RTF = 0x002;
    private const int CP_UTF8 = 65001;

    private const int WM_SETREDRAW = 0x000B;

    private const int WM_USER = 0x400;
    private const int EM_STREAMIN = WM_USER + 73;
    private const int EM_GETEVENTMASK = (WM_USER + 59);
    private const int EM_SETEVENTMASK = (WM_USER + 69);
    private const int EM_STREAMOUT = WM_USER + 74;
    private const int ENM_NONE = 0;
    private const int EM_SETTEXTMODE = WM_USER + 89;

    private const int TM_PLAINTEXT = 1;

    private const int ECO_AUTOWORDSELECTION = 0x00000001;
    private const int ECO_AUTOVSCROLL = 0x00000040;
    private const int ECO_AUTOHSCROLL = 0x00000080;
    private const int ECO_NOHIDESEL = 0x00000100;
    private const int ECO_READONLY = 0x00000800;
    private const int ECO_WANTRETURN = 0x00001000;
    private const int ECO_SAVESEL = 0x00008000;
    private const int ECO_SELECTIONBAR = 0x01000000;
    private const int ECO_VERTICAL = 0x00400000;
    private const int ECOOP_SET = 0x0001;
    private const int ECOOP_OR = 0x0002;
    private const int ECOOP_AND = 0x0003;
    private const int ECOOP_XOR = 0x0004;

    private const int EM_SETOPTIONS = (WM_USER + 77);
    private const int EM_GETOPTIONS = (WM_USER + 78);

    // Native prototype, for reference:
    // DWORD EditStreamCallback(DWORD_PTR dwCookie, LPBYTE pbBuff, LONG cb, LONG *pcb)
    delegate IntPtr EditStreamCallback(IntPtr dwCookie, IntPtr pbBuff, IntPtr cb,
        out IntPtr pcb);

    struct EDITSTREAM
    {
        public IntPtr dwCookie;
        public IntPtr dwError;
        public EditStreamCallback pfnCallback;
    }

    // Used for regular messages between controls.
    [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
    static extern IntPtr SendMessage(HandleRef hWnd, Int32 Msg,
        IntPtr wParam, IntPtr lParam);

    // Used when streaming data into the RichText window.
    [DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = false)]
    static extern IntPtr SendMessage(HandleRef hwnd, Int32 msg,
        IntPtr wParam, ref EDITSTREAM lParam);

In the declarations, you will see that two forms of SendMessage have been defined: one where the lParam references the EDITSTREAM structure and one where it references an IntPtr structure.  The former is used when streaming data into the RichText window; the latter is used when sending regular messages between controls.  It should be noted that the latter could be removed in .NET 2.0 by making use of the System.Windows.Forms.Message class, which essentially allows you to send messages to controls so long as all arguments can be sent as IntPtrs.

After the declarations, the remainder of the code sets up the actual streaming and creates the function that the delegate prototypes.  In this example, I’ve called the setup function ReadRichTextStream and the actual streaming function StreamIn.  These functions look like the following:

ReadRichTextStream: Accepts a RichTextBox Object and the filename of the file to load.

    private void ReadRichTextStream(System.Windows.Forms.RichTextBox objRich,
        string sfilename)
    {
        string filename = sfilename.ToLower();
        objRich.Text = "";

        // Default (including .mrk, .mrk8, .tmp and .xml): UTF-8 plain text.
        int eType = (((CP_UTF8) << 16) | SF_USECODEPAGE | SF_TEXT);
        if (filename.EndsWith(".bmrk") || filename.EndsWith(".txt"))
        {
            eType = SF_TEXT;
        }
        else if (filename.EndsWith(".rtf"))
        {
            eType = SF_RTF;
        }

        System.IO.FileStream fs = new System.IO.FileStream(sfilename, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);
        Application.DoEvents();

        // Wrap the stream in a GCHandle so it can be passed through dwCookie.
        System.Runtime.InteropServices.GCHandle gch = System.Runtime.InteropServices.GCHandle.Alloc(fs, System.Runtime.InteropServices.GCHandleType.Normal);
        try
        {
            EDITSTREAM es = new EDITSTREAM();
            es.dwCookie = (IntPtr)gch;
            EditStreamCallback callback = new EditStreamCallback(StreamIn);
            es.pfnCallback = callback;

            SendMessage(new HandleRef(objRich, objRich.Handle), (Int32)EM_STREAMIN, (IntPtr)eType, ref es);
        }
        finally
        {
            // Remember to free allocated memory to avoid leaks.
            gch.Free();
            fs.Close();
        }
    }
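
The eType value built above packs the code page into the high 16 bits and the stream-format flags into the low 16 bits.  The arithmetic is easy to check outside C#; the following Ruby sketch reuses the constant values from the declarations, and everything else in it is illustrative.

```ruby
# Constant values copied from the C# declarations above.
SF_TEXT        = 0x0001
SF_USECODEPAGE = 0x0020
CP_UTF8        = 65001

# Code page in the high word, format flags in the low word.
flags = (CP_UTF8 << 16) | SF_USECODEPAGE | SF_TEXT

puts format('0x%08X', flags)     # => 0xFDE90021
puts flags >> 16                 # => 65001 (the code page)
puts flags & 0xFFFF              # => 33 (SF_USECODEPAGE | SF_TEXT)

# Ruby integers do not wrap, but a C# int is a signed 32-bit value,
# so the same bit pattern reads as a negative number there.
puts flags - (1 << 32)           # => -35061727
```

Because the high bit is set, the value is negative as a 32-bit int, which is worth keeping in mind when it is cast to a pointer-sized wParam.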

StreamIn: StreamIn is the function that actually reads the data from the file and pushes the data into the RichTextBox callback buffer to print into the control.

    // The stream stays open between calls; ReadRichTextStream closes it
    // once the control has finished streaming.
    public IntPtr StreamIn(IntPtr dwCookie, IntPtr pbBuff, IntPtr cb, out IntPtr pcb)
    {
        // Recover the FileStream that ReadRichTextStream stored in dwCookie.
        System.IO.FileStream fs = (System.IO.FileStream)((GCHandle)dwCookie).Target;
        try
        {
            byte[] buffer = new byte[cb.ToInt32()];
            int read = fs.Read(buffer, 0, buffer.Length);
            if (read <= 0)
            {
                // End of file: report zero bytes so the control stops calling back.
                pcb = IntPtr.Zero;
                return IntPtr.Zero;
            }

            // Copy the bytes into the control's buffer and report how many were copied.
            System.Runtime.InteropServices.Marshal.Copy(buffer, 0, pbBuff, read);
            pcb = (IntPtr)read;
            return IntPtr.Zero;
        }
        catch
        {
            // A nonzero return value tells the control that the read failed.
            pcb = IntPtr.Zero;
            return (IntPtr)1;
        }
    }

Anyway, the gist of all this is that by setting the compile option to target 32-bit processors in the MarcEdit GUI, I’ve been able to solve this issue.  I’m having the user that found the problem verify that I’ve indeed hunted this bug down and squashed it, so as soon as that’s confirmed, I’ll be pushing this fix out with MarcEdit.

–TR


Mar 25 2007

LibraryFind code refactoring

I’ve been spending time this weekend refactoring a major piece of the LibraryFind code, partly in an effort to make it easier to add protocol classes.  This change affects a lot of the current API code-base, but the biggest change comes in the meta_search.rb file, where nearly all the business logic relating to searching, etc. will be removed in favor of a loose plug-in architecture that I’m hoping will make it easier to add additional search classes to the program.  However, as with all refactoring, there’s a bit of debugging that happens, and I tell you, this morning at 4 am, it just wasn’t happening.  The big change deals with about 200 lines of code in the meta_search.rb file (which in turn affects the current files that actually make up the protocols and searching).  These 200 lines of code have been replaced by the following block:


if is_in_cache == false
    _tmparray = Array.new()
    objSearch = nil
    eval("objSearch = " + _collect.conn_type.capitalize + "SearchClass.new")
    _tmparray = objSearch.SearchCollection(_collect, _qtype, _qstring, _start.to_i, _max.to_i, _last_id, _session_id, _action_type, _data, _bool_obj)
    record.concat(_tmparray) unless _tmparray.nil?
end 

Basically, this code snippet is called if the query isn’t located in the cache.  Originally, the code that this snippet replaced was a large case statement that performed different actions depending on what protocol was being used.  This snippet moves all this logic into models, where search classes are then pluggable.  The protocol functions will follow a naming convention and take the same values (though they’ll do different things with them), which in theory will make it easier to add support for new search types.  At least, this is what I’m seeing at this moment as I add the ability to query OpenSearch targets to LF.
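
To make the naming convention concrete, here is a minimal sketch of that kind of dispatch.  It is not the LibraryFind code: the two plug-in classes, their return values, and the helper names search_class_for and run_search are made up for illustration, and it looks the class up with Object.const_get instead of eval, which is just one way to do it.

```ruby
# Hypothetical stand-ins for two protocol plug-ins. The class names follow the
# "<ConnType>SearchClass" convention used in the snippet above.
class Z3950SearchClass
  def SearchCollection(collection, query)
    ["z3950 hit for #{query} in #{collection}"]
  end
end

class OpensearchSearchClass
  def SearchCollection(collection, query)
    ["opensearch hit for #{query} in #{collection}"]
  end
end

# Resolve the plug-in class from the connection type by name.
# Raises NameError if no class matches the convention.
def search_class_for(conn_type)
  Object.const_get("#{conn_type.capitalize}SearchClass")
end

# Every plug-in exposes the same method, so the caller never needs a
# case statement over protocols.
def run_search(conn_type, collection, query)
  search_class_for(conn_type).new.SearchCollection(collection, query)
end

puts run_search("opensearch", "catalog", "ruby")
```

Adding a new search type under this scheme means adding one class whose name matches the convention; the calling code stays the same.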

Anyway, I worked on this for about 2 hours this morning.  The new plug-ins were working fine — items were going into the cache and results were being returned.  However, they were being lost during the transition from the API to the UI.  Odd.  Couldn’t figure out what was going on and debugging is difficult because the application is threaded (another change — a dedicated global thread pump) so sometimes errors occur while other parts of the application are executing.  Anyway, at 4 am, I decided to knock off and come back to it in the morning with new eyes to see if I could see what I was doing.

Well, I’m glad I decided to sleep on it.  I was in church this morning when I had an epiphany.  It’s in lines 7 and 8.  The way Ruby’s threading works, variables created within a thread are isolated and protected from the rest of the application.  To deal with that, Ruby has a syntax that allows you to create thread variables that can be accessed from outside the thread.  So for example, if I have a variable that I want to access outside of the thread, I would use something like:

Thread.current["myrecord"] = Array.new()
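
As a small, self-contained sketch of how a value stored this way can be picked up from outside the thread (the worker thread and the sample records are made up for illustration; only the "myrecord" key comes from the line above):

```ruby
worker = Thread.new do
  # Store the results under a named key on the current thread.
  Thread.current["myrecord"] = ["record 1", "record 2"]
end

worker.join                    # wait for the thread to finish
records = worker["myrecord"]   # read the thread variable through the Thread object

puts records.inspect
# The main thread has its own, separate storage for the same key.
puts Thread.current["myrecord"].inspect
```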

This syntax is how plug-ins utilizing the global thread pump will return data to the application.  And there was the rub.  I’d forgotten that Ruby always returns a value from a function: the value of the last expression evaluated, whether you ask for it or not.  Simply for clarity, I always explicitly note what is being returned at the end of each function using the older return syntax:

return xxxxx

I’d conveniently forgotten that feature in the language, and this is what was gumming up the process.  The thread pump would finish evaluating the threads and capture the thread data, and then a string or plain array would also be returned directly from the function (outside of the thread pump), which, since it was not nil, would overwrite the current record variable.  Once I had these plug-ins return nil and let the thread pump handle the data, all was right in the world again.  Unfortunately, I lost 2 hours of sleep last night on this problem, and I’d like to have them back. :)
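
Here is a stripped-down sketch of the general problem.  The method names (leaky_search, quiet_search, gather) are hypothetical, everything runs in one thread to keep it short, and the symptom here is a duplicated result where LibraryFind saw an overwritten one; the point is only that a method’s last expression reaches the caller unless it is nil.

```ruby
# Hypothetical plug-in: stores its results in a thread variable, but the
# assignment is the last expression, so its value is also returned.
def leaky_search
  Thread.current["myrecord"] = ["hit"]
end

# Same plug-in, but it explicitly ends with nil so nothing leaks to the caller.
def quiet_search
  Thread.current["myrecord"] = ["hit"]
  nil
end

# Stand-in for the calling code: it uses any non-nil return value and also
# collects whatever was left in the thread variable.
def gather(plugin)
  record = []
  Thread.current["myrecord"] = nil
  tmp = send(plugin)
  record.concat(tmp) unless tmp.nil?
  record.concat(Thread.current["myrecord"] || [])
  record
end

puts gather(:leaky_search).inspect   # the stray return value shows up as well
puts gather(:quiet_search).inspect   # only the thread variable is used
```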

–TR