ANZMETA doctype for Isearch


Warning: Some of this information is out of date. The anzmeta doctype itself is current in the most recent versions of the Isite suite, but this document has not been kept up to date.

Contents:

  • version 1.14 on 12 September 2000

What is the ANZMETA doctype

The Isearch component of Isite can perform indexing, searching, and presentation of many structured document types. Each particular type of document is processed using the relevant "doctype", which is a C++ program that knows how to parse and interpret the particular document instances.

The ANZMETA doctype is the Isearch program that indexes and presents SGML/XML documents that conform to the ANZMETA DTD version 1.1+.

Do not confuse the doctype with the ANZMETA Document Type Definition (DTD), which formally defines the structure and required elements of Australasian geospatial dataset descriptions in SGML and XML. See the documentation at http://www.auslig.gov.au/anzmeta/


Installation notes

This part of our documentation now forms part of the comprehensive Australian Spatial Data Directory (ASDD) technical documentation.

The ANZMETA doctype for Isearch now forms part of the Isite package.

See the document "Implementing ASDD nodes using Isite" for some guidelines and tips for establishing a Z39.50 server using Isite on a UNIX platform.


Indexing and searching ANZMETA documents

A collection of ANZMETA documents has three files for each document ...

To index your document collection, issue the following commands ...

# define some pathnames
#
# where you installed Isite after compilation
INST_DIR=/path/to/isite/binaries

# your collection of metadata documents (which has *.xml, *.html, *.txt files)
XML_DATA=/path/to/data

# where Iindex will build its index databases
DB_DIR=/path/to/isearch/databases

# the name for your index database of dataset descriptions
DB_NAME=my-database-name

# index the collection to build the searchable database
"$INST_DIR/Iindex" -d "$DB_DIR/$DB_NAME" -t anzmeta -m 4 \
    -o fieldtype="$INST_DIR/anzlic.fields" "$XML_DATA"/*.xml

The doctype presents a particular document simply by substituting the filename extension. If the client requested a "Record Syntax" (presentation format) of HTML, then the doctype will add the ".html" or ".htm" extension and present that document. If the client requested a "Record Syntax" of SUTRS (Simple Unstructured Text Record Syntax), then the doctype will add the ".txt" extension and present the plain text version of the document. In the same manner, a request for XML delivers the XML document that was indexed. However, WWW browsers do not yet support XML, so the document will be poorly rendered.
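
The extension substitution described above can be sketched in shell. This is only an illustration of the behaviour, not the doctype's actual C++ code: the function name presentation_file and the syntax labels HTML, SUTRS and XML passed as its second argument are assumptions made for this example. The fallback to the indexed XML file follows the behaviour described under "Issues".

```shell
# Hypothetical helper, not part of Isite: print the file that would be
# presented for an indexed XML file and a requested record syntax.
presentation_file() {
    xml_file=$1
    syntax=$2
    base=${xml_file%.xml}      # strip the indexed file's extension

    case $syntax in
        HTML)
            # prefer ".html", then ".htm"
            for ext in html htm; do
                if [ -f "$base.$ext" ]; then
                    printf '%s\n' "$base.$ext"
                    return 0
                fi
            done
            ;;
        SUTRS)
            if [ -f "$base.txt" ]; then
                printf '%s\n' "$base.txt"
                return 0
            fi
            ;;
    esac

    # XML requested, or no presentation document found:
    # fall back to the file that was indexed
    printf '%s\n' "$xml_file"
}
```

For example, presentation_file "$XML_DATA/dataset1.xml" SUTRS would print the path of dataset1.txt if that file exists (dataset1 is a made-up name).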


Issues

XML documents must conform to ANZMETA DTD

You will need to ensure that your SGML/XML documents conform to the current version of the DTD and that the content of certain fields is valid. The doctype will also parse the date structures and interpret the dates. ANZMETA dates are either a keyword from a controlled vocabulary (an authority list) or a date string which complies with ISO 8601 (e.g. 1998-08-04 or 1998-08 or 1998). If the doctype cannot interpret the dates then it will issue a warning at indexing time and that particular document will not be searchable.
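
Because a document with an uninterpretable date will not be searchable, it can be worth checking date strings before indexing. The following is a minimal sketch, assuming you have already extracted the date values from your documents by some other means; the function name is_iso_date is hypothetical and is not part of Isite. It only checks the three ISO 8601 forms shown above and does not know the controlled-vocabulary keywords, which would need to be checked against the authority list separately.

```shell
# Hypothetical pre-indexing check, not part of Isite.
# Succeeds if the argument looks like YYYY, YYYY-MM or YYYY-MM-DD.
# It checks the shape and the month/day ranges only; it does not
# reject impossible days such as 1998-02-31.
is_iso_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?$'
}
```

A typical use would be: is_iso_date "$value" || echo "check this date: $value"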

IndexGeo has an online XML validating parser which will assist you to ensure that your document collection complies with the ANZMETA DTD. Join the Eco Companion catalogue and document management service (entry level is free) and use the ANZMP geospatial metadata parser to retrieve and process your XML documents (batch processing is available).

You need to have the set of presentation documents

A collection of metadata documents has three files for each dataset ... the SGML/XML file which is used for indexing and the two presentation documents (.html and .txt). If any of those three files is missing, then indexing or presentation of that dataset will not work properly.
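
A quick way to find datasets with missing presentation documents is to loop over the XML files before indexing. This is a sketch only: the function name check_presentation_docs is hypothetical, and it assumes that all three files for a dataset sit in the same directory and share the same base name. It accepts either ".html" or ".htm" for the HTML version.

```shell
# Hypothetical helper, not part of Isite: report presentation documents
# that are missing for the *.xml files in a directory.
# Returns non-zero if anything is missing.
check_presentation_docs() {
    data_dir=$1
    missing=0
    for xml_file in "$data_dir"/*.xml; do
        [ -f "$xml_file" ] || continue      # directory has no *.xml files
        base=${xml_file%.xml}
        # the HTML version may end in ".html" or ".htm"
        if [ ! -f "$base.html" ] && [ ! -f "$base.htm" ]; then
            echo "missing HTML: $base.html"
            missing=1
        fi
        if [ ! -f "$base.txt" ]; then
            echo "missing text: $base.txt"
            missing=1
        fi
    done
    return $missing
}
```

With the pathnames defined in the indexing commands, this could be run as: check_presentation_docs "$XML_DATA"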

When a client requests the full document, then Isearch will look for a document with the appropriate filename extension. If the necessary document is not present, then the only recourse is to present the file that was indexed (i.e. the raw XML file). Your Netscape browser will either call another application to present the XML document or attempt to display it (with varying results).

So, how will you develop presentation documents for all of your metadata? Use the Eco Companion document management service, of course. Batch facilities are available for data managers to validate XML and produce presentation documents.

"Batch processing" allows grade 3 Eco Companion members to specify a list of geospatial metadata documents. The ANZMP geospatial metadata parser will then retrieve each SGML/XML file from across the network, interpret the metadata, validate the structure, check some content, and produce presentation documents with locality maps. You will be advised by email when the process is complete, then you can view an overall report (with links to the individual documents and reports), and download the package of all documents to your own computer.

Send doctype to CNIDR when stable

When the doctype is suitably stable we will send it to CNIDR for inclusion in the Isite package.
Done: included in Isite-2.07+

Indexing old SGML files with ANZMETA 1.0

The original Isearch doctype called "anzlic" is still available for indexing v1.0 SGML files. IndexGeo has not carried out any testing or investigation of this doctype. It is recommended to phase out this old metadata format as soon as possible in favour of XML metadata with ANZMETA DTD v1.1+.


Feedback

Please send any comments, bug reports, suggestions for improvement to crossley@indexgeo.com.au (David Crossley).


URL:http://www.indexgeo.com.au/tech/asdd/doctype/
Last Modified: 12 September 2000
IndexGeo Pty Ltd