Google Sitemaps

I ran across these a few weeks ago and thought they were pretty nifty. Essentially, I was looking for a way to let Google harvest Oregon State University’s CONTENTdm collections. Since CONTENTdm has an OAI interface, and Google Scholar supports OAI harvesting, I thought there must be an easy way to set this up. Fortunately, Google’s Sitemaps facility provides one. Using the OAI server as the sitemap, I was able to get Google to quickly harvest and index our CONTENTdm collections.

Information on Google Sitemaps can be found on the Google Sitemaps Help documentation site.

[update]
Some folks have asked (like the comment below) how this works. Well, I created a small script that stands in for the oai.exe process in CONTENTdm, at least for Google’s purposes. The script basically just passes the OAI request through to oai.exe and echoes back the response. Here’s the simple code:

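<?php
// Pass the incoming OAI request through to CONTENTdm's oai.exe and relay its XML response.
$url = "http://digitalcollections.library.oregonstate.edu/cgi-bin/oai.exe?" . $_SERVER['QUERY_STRING'];

// Simpler alternative that reads the whole response into memory at once:
//print file_get_contents($url);
$handle = @fopen($url, "r");
if ($handle) {
   header("Content-type: text/xml");
   // Stream the response to the client a chunk at a time.
   while (!feof($handle)) {
       $buffer = fgets($handle, 4096);
       echo $buffer;
   }
   fclose($handle);
} else {
   // The OAI server could not be reached, so report an error instead of an empty document.
   header("HTTP/1.1 502 Bad Gateway");
}
?>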
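
To make the pass-through concrete, here is roughly how a request would map, assuming the script is saved as oai.php at the root of the site. That file name and location are hypothetical (they aren't stated above), and the query string is just a standard OAI-PMH ListRecords request used for illustration:

```
Harvester requests:  http://digitalcollections.library.oregonstate.edu/oai.php?verb=ListRecords&metadataPrefix=oai_dc
Script fetches:      http://digitalcollections.library.oregonstate.edu/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc
```

Whatever query string the harvester sends is appended unchanged to the oai.exe URL, so the other OAI verbs and any resumption tokens should pass through the same way.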

–Terry


One Response to “Google Sitemaps”

  • donald Says:

    I was just trying to submit our OAI feed to Google for indexing, but it’s unhappy that the OAI URL, /cgi-bin/oai.exe, is a directory lower than the root of the site. Do you know of a way around this?
