Eco Companion Australasia
About

About geospatial dataset descriptions

Introduction

A dataset is a particular organised collection of data with a common theme. A dataset description is a concise document that describes a dataset.

The dataset description document is well-defined and structured, using elements of metadata (data about data). It is analogous to a card in your local library's catalogue, which describes a book and gives enough information to know what the book is called, its unique number, and how and where to find it. These details are metadata: bibliographic elements such as Author, Title, and Abstract.

This document outlines the management of geospatial dataset descriptions using Eco Companion.


What is a dataset

In the context of the Eco Companion catalogue, a dataset is any organised collection of data or information that has a common theme. It also has, among its attributes, a reference to a location in earth-centred space.

A dataset might be a list of objects, a digital map, records of geological borehole samples, a collection of photographs at a certain location or of a certain subject, a database comprising records of pollution sites, a scientific report, or a listing of results from a school project.

A dataset need not be in digital form. A paper publication, a report, or a collection of maps is also a dataset.

Dataset descriptions concisely describe collections of data that are available free-of-charge or for sale.

Some hypothetical dataset names to help explain what a dataset is

Examples of what would constitute a dataset ...


Some useful actual dataset descriptions

Management of dataset descriptions using Eco Companion

As a member of the Eco Companion service you can maintain descriptions of datasets for which you are the custodian.

If you choose to publish those dataset descriptions, they will be available for searching through the Eco Companion catalogue. This enables Internet searchers to discover datasets that may meet their needs.

There are two methods to prepare your dataset description documents:

The help document "Managing your dataset descriptions with ANZMP" explains the methods for preparing and processing your dataset descriptions.

The next sections give a brief grounding in document management and explain its crucial components.

About metadata

Metadata is data about data. A dataset description is metadata, describing the data, but is not the actual data itself.

The dataset description document is well-defined and structured using elements of metadata. It is analogous to a card in your local library's catalogue, which describes a book and gives enough information to know what the book is called, its unique number, and how and where to find it. These details are metadata: bibliographic elements such as Author, Title, and Abstract. [see glossary]

Geospatial dataset descriptions are composed of well-defined metadata elements which have a specific structure and order. The dataset descriptions are maintained as documents in the Standard Generalized Markup Language (SGML). The Document Type Definition (DTD) defines the allowed structure and the elements that must comprise a dataset description. The DTD thereby enables automatic and consistent processing of dataset descriptions.

About SGML and XML

In its most basic form, the Standard Generalized Markup Language (SGML) marks up particular elements of text with easily recognisable named tags in opening and closing pairs. These elements can be parents and children, thus describing a hierarchical, structured relationship between the elements of a document.

<document>
   <title>My Life</title>
   <author>Fred Nerk</author>
   <abstract>Not much</abstract>
</document>

SGML is, of course, much more sophisticated than this simple example. See further explanation and the glossary.
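As a rough illustration of that parent and child hierarchy, the sketch below reads the small example above with Python's standard-library XML parser. It assumes the fragment is also well-formed XML, which holds here because every element has an explicit closing tag; general SGML allows features, such as omitted end tags, that this parser does not handle. The names SOURCE and child_elements are hypothetical and are not part of Eco Companion.

```python
import xml.etree.ElementTree as ET

# The example document from above, held as a string.
SOURCE = """<document>
   <title>My Life</title>
   <author>Fred Nerk</author>
   <abstract>Not much</abstract>
</document>"""


def child_elements(text):
    """Return the parent tag and a list of (child tag, child text) pairs."""
    root = ET.fromstring(text)
    return root.tag, [(child.tag, child.text) for child in root]


parent, children = child_elements(SOURCE)
print(parent)
for tag, value in children:
    # Indent the children to show that they sit inside the parent element.
    print(f"  {tag}: {value}")
```

The parent element is reported first, followed by each child element in document order.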

The first application of SGML on the World Wide Web was HTML (HyperText Markup Language). Now XML (eXtensible Markup Language), a simplified subset of SGML, brings the power of structured document management to the Web.

About Document Type Definition (DTD)

To ensure that all dataset descriptions are of a consistent type, the Document Type Definition (DTD) defines the metadata elements and their order and structure. The DTD is "used to automatically process a document ... and check that all of the elements required for that document are indeed present and correctly ordered" [from A Gentle Introduction to SGML].

This insistence on a uniform document type means that the SGML dataset description documents can be readily transferred between information systems (as SGML is independent of vendors, platforms, and software). The documents can be consistently processed to ensure that they are valid and that search facilities can rely on certain elements (such as the geographic extents and the keywords). The DTD is a vital component of the SGML document management solution. See further explanation and the glossary.

The Australia New Zealand Land Information Council (ANZLIC) has released the ANZMETA XML DTD to define geographic dataset descriptions in Australia and New Zealand. The ANZMETA DTD and supporting documentation can be found at http://www.auslig.gov.au/anzmeta/
During 2000-2001 the ANZMETA DTD is expected to evolve into the Australian profile of the ISO metadata standard (ISO 19115).

An example dataset description

With that background, we can now look at an example geospatial dataset description and elaborate on some of its features.

Select this link to show the example in a new browser window. You can then continue to read this page while referring to the example in the other window.

This example dataset description has been produced by the Eco Companion facility ANZMP (Australia New Zealand Metadata Parser) from this SGML metadata dataset description.

The following fragment of the SGML document produces the Citation section of the dataset description.

<citeinfo>
   <uniqueid>IG1</uniqueid>
   <title>Test dataset description #1</title>
   <origin>
      <custod>IndexGeo Pty Ltd</custod>
      <jurisdic>
            <keyword>Australia</keyword>
      </jurisdic>
   </origin>
</citeinfo>

To ensure consistent document production, the DTD specifies that there must be one and only one TITLE element (the title tag in the fragment above), and that the CUSTODIAN (custod) and JURISDICTION (jurisdic) elements must immediately follow the TITLE and be contained in the ORIGINATOR (origin) element.
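The rules just described can be sketched as a small hand-written check in Python. This is not a DTD validator and it is not how ANZMP works; it only tests the constraints stated above against the Citation fragment. It assumes that TITLE, ORIGINATOR, CUSTODIAN and JURISDICTION correspond to the title, origin, custod and jurisdic tags shown in the fragment, and the function name check_citeinfo is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Citation fragment from above, held as a string.
FRAGMENT = """<citeinfo>
   <uniqueid>IG1</uniqueid>
   <title>Test dataset description #1</title>
   <origin>
      <custod>IndexGeo Pty Ltd</custod>
      <jurisdic>
            <keyword>Australia</keyword>
      </jurisdic>
   </origin>
</citeinfo>"""


def check_citeinfo(text):
    """Return a list of problems found; an empty list means the checks passed."""
    root = ET.fromstring(text)
    tags = [child.tag for child in root]

    # Rule 1: one and only one title element.
    if tags.count("title") != 1:
        return ["expected exactly one <title>"]

    # Rule 2: the origin element must come directly after the title.
    position = tags.index("title")
    if position + 1 >= len(tags) or tags[position + 1] != "origin":
        return ["<origin> must immediately follow <title>"]

    # Rule 3: custod and jurisdic must be contained in origin.
    origin = root[position + 1]
    problems = []
    for required in ("custod", "jurisdic"):
        if origin.find(required) is None:
            problems.append(f"<origin> must contain <{required}>")
    return problems


print(check_citeinfo(FRAGMENT))
```

A real DTD expresses these constraints declaratively, so a validating parser can apply them to every dataset description without custom code of this kind.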


URL:http://www.indexgeo.com.au/ec/about/dataset.html
Last Modified: 9 September 2000