Implementing ASDD nodes using Isite

Nodes of the Australian Spatial Data Directory (ASDD) are Z39.50 servers running on computers at various distributed locations. A uniform WWW interface (currently only at ERIN) is used to compose search queries. The search will then be conducted simultaneously at the selected nodes.

These notes provide some guidelines and tips for establishing a Z39.50 server using Isite on a UNIX platform. This document should be read in conjunction with the documentation at ERIN (the technical coordinating node) and with the Isite documentation.

Introduction

Z39.50 servers can connect to any back-end document management system. Documents can be stored in any brand of database, or as a collection of SGML/XML text files. Each particular server can interrogate many back-end document collections. Importantly, this enables the custodian to leave their existing document management systems in place and simply install a front-end interface to index and serve them.

Search and retrieve overview

The ASDD user-interface is an HTML form and associated broker software which could reside at any node (currently only at ERIN). The broker initiates queries to all of the selected Z39.50 servers, then monitors progress and delivers results back to the user as a dynamic web page. The user can select a particular document from the result set. The broker retrieves an HTML copy of the relevant metadata document by way of the distributed custodian node's Z39.50 server and presents it through the Web server of the initiating node.
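The broker's fan-out step can be illustrated with a minimal sketch. This is not the ASDD broker code; it only shows one way to send the same query to several nodes at once and collect a per-node result. The function name fan_out and the search_node callable (which would perform the actual Z39.50 search against one server) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(query, nodes, search_node):
    """Send one query to every selected node at once and collect the results.

    search_node(node, query) is a hypothetical callable that performs the
    search against a single server and returns a list of document titles.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # Start all searches before waiting on any of them.
        futures = {pool.submit(search_node, node, query): node for node in nodes}
        for future in as_completed(futures):
            node = futures[future]
            try:
                results[node] = {"status": "ok", "hits": future.result()}
            except Exception as exc:
                # One failing node must not prevent the others from reporting.
                results[node] = {"status": "error", "detail": str(exc)}
    return results
```

Recording a status for each node, rather than raising on the first failure, lets the results page show which nodes answered and which did not.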

A Z39.50 server that uses Isite has two ways of interacting with the data. One way is for Isite to pass the search or presentation request off to a relational database. The other way is for Isite to use its inbuilt "Isearch" application. Isearch uses programs called "doctypes" which know how to read and interpret metadata as structured text files. Specifically, there are two Isearch doctypes for parsing geospatial metadata in SGML (Standard Generalized Markup Language) format. The SGML metadata documents must conform to the ANZMETA SGML/XML Document Type Definition (DTD) which defines their structure and required elements. The Isearch doctype "anzmeta" is for the current version (v1.1) of the DTD. The Isearch doctype "anzlic" is for old, basic metadata.

Z39.50 uses "element sets" to specify how the search results will be returned to the client. If the client specifies a "brief" element set ("B") then the server will return a list of hits (the title of each document that matches the query). If the client specifies a "full" element set ("F") then the server will present the relevant full document. Other element sets can be used to retrieve specific elements from the metadata documents (e.g. title + abstract).

The Isearch doctypes will present a particular full document simply by substituting the filename extension. If the client requested a "Record Syntax" (presentation format) of HTML then the doctype will add the ".html" or ".htm" extension and present that document. If the client requested a "Record Syntax" of SUTRS (Simple Unstructured Text Record Syntax) then the doctype will add the ".txt" extension and present the plain text version of the document. XML does not yet have a fully registered Z39.50 Record Syntax Object Identifier (OID) and WWW browsers do not yet support XML. However, it can be delivered if the client so requests.
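The extension substitution described above can be sketched as follows. This is an illustration only, not the Isearch doctype code (which is written in C++). The function name, the mapping table, and the example file names are assumptions, and ".html" is chosen over ".htm" arbitrarily.

```python
from pathlib import PurePosixPath

# Record Syntax to filename extension, as described in the text above.
EXTENSION_FOR_SYNTAX = {"HTML": ".html", "SUTRS": ".txt"}


def presentation_path(metadata_path, record_syntax):
    """Return the path of the presentation document for a metadata file."""
    try:
        extension = EXTENSION_FOR_SYNTAX[record_syntax.upper()]
    except KeyError:
        raise ValueError(f"no presentation format for record syntax {record_syntax!r}")
    # Swap the existing extension (e.g. ".sgml") for the presentation one.
    return str(PurePosixPath(metadata_path).with_suffix(extension))
```

For a hypothetical file /data/meta/dataset01.sgml, a request for HTML would resolve to /data/meta/dataset01.html and a request for SUTRS to /data/meta/dataset01.txt. This is why the presentation documents need to sit alongside the metadata files under the same base name.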

Available Z39.50 server software

The different nodes of the distributed directory can each use a different Z39.50 server. For example: one node could use Isite, another node could use Zebra, while another node could use Blue Angel. Because they are all based on the Z39.50 protocol, they can all interoperate to form a virtual catalogue.

This part of our documentation now forms part of the comprehensive ASDD technical documentation.

Testing your node

See the ASDD technical documentation - Testing Isite ASDD nodes.

If you are serving collections of SGML/XML files with the Isearch ANZMETA doctype then the indexing phase will have reported most problems with your data, such as invalid date fields or incorrect keywords. Before that, you should have ensured that your XML documents pass a validating parser's tests for structure and content. If you are using a well-designed relational database then you will know that your data is good.

Now you will need to ensure that the search system is configured properly and that searches can be conducted against the actual fields within the metadata documents. It is not enough to just conduct a full-text search and get some results.

The Isite package provides a number of ways to conduct queries directly against the Zserver without going through a WWW interface. Isite includes the command-line Z39.50 clients "zclient" and "izclient" (a text-based interactive client, not a GUI). There is also the "zbatch" program (discussed below) to run a number of pre-prepared queries. The "zping" program is useful for regularly testing that the Zserver is alive and responding. There are also many third-party Z39.50 clients: see the Client and Web Gateway Surveys at DSTC and the Software list at the Z39.50 Maintenance Agency, or try Znavigator for Windows.
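Where the Isite tools are not at hand, a much cruder check is possible from any machine with Python. The sketch below is not a substitute for "zping": it only confirms that something is accepting TCP connections on the server's port, and does not speak the Z39.50 protocol. The function name is an assumption; port 210 is the registered Z39.50 port, but your Zserver may be configured to listen elsewhere.

```python
import socket


def port_is_open(host, port=210, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # The connection is closed again as soon as it succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

A True result means only that the port is reachable. A real query through one of the clients above is still needed to show that the server is responding properly.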

The Isite program called "zbatch" is the best way to test your Z39.50 server. "zbatch" will read an input text file which lists all of your queries, then initiate a session with the specified server, conduct the queries, and report the hit lists from each query.

You will need to understand an Isite query language called "KWAQS" and Z39.50 Use Attributes (ASDD uses the GEO Profile and Attribute Set, but these notes about the BIB-1 Profile will also apply). These resources will be useful ...

Hosted collections

Any one node can also transparently host a collection of geospatial metadata for another organisation. In that way, an organisation that does not have an actual Z39.50 server can appear to be a fully-fledged node of the ASDD. The collection of SGML documents and the corresponding presentation documents need to be on the same machine on which the Zserver is running.

Geospatial metadata management

All of that installing and configuring is the easy part. Management of your metadata is crucial.

Your SGML/XML documents will need to have correct structure so that computer programs can readily parse and interpret the metadata.

The content of certain metadata elements must be valid (especially the title, dates, search words, and spatial fields) otherwise those documents will not be available for searching.
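A few of these checks can be sketched in Python for XML documents. This is a rough illustration, not the ANZMETA doctype or a validating parser: it tests only well-formedness, a non-empty title, and plausible bounding coordinates. The element names used here (title, northbc, southbc, eastbc, westbc) are assumptions and should be checked against the ANZMETA DTD. Note that the standard library parser does not validate against a DTD and does not read SGML.

```python
import xml.etree.ElementTree as ET


def check_metadata(xml_text):
    """Return a list of problems found in one metadata document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []

    # Assumed element name: "title".
    title = root.findtext(".//title")
    if title is None or not title.strip():
        problems.append("missing or empty title")

    # Assumed element names for the bounding coordinates.
    coords = {}
    for name in ("northbc", "southbc", "eastbc", "westbc"):
        text = root.findtext(f".//{name}")
        try:
            coords[name] = float(text)
        except (TypeError, ValueError):
            problems.append(f"missing or non-numeric {name}")

    if len(coords) == 4:
        if not -90 <= coords["southbc"] <= coords["northbc"] <= 90:
            problems.append("latitude bounds out of range or inverted")
        if not (-180 <= coords["westbc"] <= 180 and -180 <= coords["eastbc"] <= 180):
            problems.append("longitude bounds out of range")
    return problems
```

Running a check like this over the whole collection before indexing gives an early list of documents that would otherwise drop out of searches.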

Presentation documents need to be available, be easily re-generated directly from the metadata, and contain links to more information about the data.

IndexGeo support services

We have much experience in the field of geospatial metadata management and resource discovery. We know that establishing a sophisticated information server and search system is far more than installing and configuring software. The real issue is integrating reliable data with that clever information server software. IndexGeo assists data managers to implement a robust ASDD presence for their organisation.

IndexGeo has already implemented a number of Isite installations and metadata preparation facilities, including Eco Companion and the Australian Surveying and Land Information Group (AUSLIG). Some other installations are still in progress. Contact us to discuss your proposed ASDD node and how we can be of assistance. We can also customise the Isite software to develop other search interfaces. We can travel to your site or conduct the work by remote access.

Eco Companion Australasia is IndexGeo's environmental resources catalogue and online document management service. Entry level membership is free and lets you try all of the document management facilities to confirm that they will be of benefit. The core of the service is the ANZMP geospatial metadata parser, which retrieves and processes SGML/XML metadata documents to validate their structure and content and to produce presentation documents. Members have a private and secure space to prepare their documents, with an option to publish the dataset descriptions through the Eco Companion catalogue.

"Batch processing" allows grade 3 Eco Companion members to specify a list of geospatial metadata documents. The ANZMP geospatial metadata parser will then retrieve each SGML/XML file from across the network, interpret the metadata, validate the structure, check some content, and produce presentation documents with locality maps. You will be advised by email when the process is complete. You can then view an overall report (with links to the individual documents and reports) and download the package of all documents to your own computer.

IndexGeo can also transparently host your metadata collection on our server to provide an ASDD node for you.

Additional uses of Isite and Isearch

Now that you have a sophisticated search system installed, you should make good use of it.

The Isite Information Server can provide searchable access to any collection of structured data. Straight out of its box, Isearch knows how to index many document types, such as HTML, colon-tagged text, comma-delimited files, generic SGML, and email archives. Additional doctypes can be developed with the C++ programming language. Isearch also has a web interface which could be immediately utilised to provide sophisticated searching of HTML pages and other document types.


URL:http://www.indexgeo.com.au/tech/asdd/support.html
Last Modified: 6 June 1999
IndexGeo Pty Ltd