Eco Companion: About dataset descriptions

About geospatial dataset descriptions

Introduction

A dataset is a particular organised collection of data with a common theme. A dataset description is a concise document that describes a dataset.

The dataset description document is well-defined and structured, using elements of metadata (data about data). It is analogous to a card in the catalogue of your local library, which describes a book and gives enough information to know what the book is called, its unique number, and how and where to find it. These details are metadata: bibliographic elements such as Author, Title, and Abstract.

This document outlines the management of geospatial dataset descriptions using Eco Companion.

What is a dataset

In the context of the Eco Companion catalogue, a dataset is any organised collection of data or information that has a common theme. It also has, among its attributes, a reference to a location in earth-centred space.

A dataset might be a list of objects, a digital map, records of geological borehole samples, a collection of photographs at a certain location or of a certain subject, a database comprising records of pollution sites, a scientific report, or a listing of results from a school project.

A dataset need not be in digital form. A paper publication, a report, or a collection of maps is also a dataset.

Dataset descriptions concisely describe collections of data that are available free-of-charge or for sale.

Some hypothetical dataset names to help explain what a dataset is

Canberra taxi-driver collection of fox and other feral pest sightings
Ten guidelines for the care of native wildlife in Australia
Native species of tree and shrub suitable for cultivation in the Southern Highlands of NSW

Examples of what would constitute a dataset ...

relevant environmental monitoring data collected by school projects
data and reports generated by community environmental management groups from their monitoring and revegetation projects
observations of the occurrence and habits of native wildlife by skilled ecologists or lay-persons
collections of references to resources about a certain topic
a database that records local environmental issues and actions, which is managed by a local government council or by a community action group
a scientific study with its results and data records
a collection of photographs at a certain location or of a certain ecological subject
records of sightings of feral pests
a database comprising many records of water quality measurements
environmental impact assessments
environmental data sold by a commercial business for use with farm management software for sustainable agriculture
a report (including photographs, tables, and appendices) of some aspect of environmental management

Some useful actual dataset descriptions

Australian Water Resources Management Committee: Australian Drainage Basins
- http://www.auslig.gov.au/meta/meta5.htm
- Custodian: Australian Surveying and Land Information Group (AUSLIG)
Topographic Map Sheet Names
- http://www.auslig.gov.au/meta/meta42.htm
- Custodian: Australian Surveying and Land Information Group (AUSLIG)
References for defining Australian locations, geospatial regions, and geographic place names
- http://www.indexgeo.com.au/ec/pub/crossley/dataset/region.html
- Custodian: IndexGeo Pty Ltd

Management of dataset descriptions using Eco Companion

As a member of the Eco Companion service you can maintain descriptions of datasets for which you are the custodian.

If you choose to publish those dataset descriptions, they become searchable through the Eco Companion catalogue. Internet searchers can then discover datasets that may meet their needs.

There are two methods to prepare your dataset description documents:

use the Eco Companion "Edit dataset description" facility
- Online forms guide you through the process.
or, place your own metadata document on another Internet server
- Prepare your own metadata document and ask our ANZMP geospatial metadata parser to retrieve it across the network. Metadata collections can be handled by batch processing.

The help document "Managing your dataset descriptions with ANZMP" explains the methods for preparing and processing your dataset descriptions.

The next sections give a brief introduction to document management and explain its crucial components.

About metadata

Metadata is data about data. A dataset description is metadata, describing the data, but is not the actual data itself.

The dataset description document is well-defined and structured using elements of metadata. It is analogous to a card in the catalogue of your local library, which describes a book and gives enough information to know what the book is called, its unique number, and how and where to find it. These details are metadata: bibliographic elements such as Author, Title, and Abstract. [see glossary]

Geospatial dataset descriptions are composed of well-defined metadata elements which have a specific structure and order. The dataset descriptions are maintained as documents in the Standard Generalised Markup Language (SGML). The Document Type Definition (DTD) defines the allowed structure and elements that must comprise a dataset description. This DTD then facilitates automatic and consistent processing of dataset descriptions.

About SGML and XML

In its most basic form, the Standard Generalized Markup Language (SGML) marks particular elements of text with easily recognisable nametags in opening and closing pairs. These elements can be parents and children, thus describing a hierarchical, structured relationship between the elements of a document.

<document>
   <title>My Life</title>
   <author>Fred Nerk</author>
   <abstract>Not much</abstract>
</document>

SGML is, of course, much more sophisticated than this simple example. See further explanation and the glossary.
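
As a rough illustration of the parent and child relationship, the simple example above can be read with a general-purpose XML parser. The sketch below assumes Python and its standard xml.etree.ElementTree module, neither of which is part of Eco Companion. It works only because this small example also happens to be well-formed XML; a full SGML document would need an SGML-aware parser. The function name child_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# The simple example from the text, held as a string.
SAMPLE = """
<document>
   <title>My Life</title>
   <author>Fred Nerk</author>
   <abstract>Not much</abstract>
</document>
"""


def child_elements(markup):
    """Return (parent tag, [(child tag, child text), ...]) for one document."""
    root = ET.fromstring(markup.strip())
    children = [(child.tag, (child.text or "").strip()) for child in root]
    return root.tag, children


parent, children = child_elements(SAMPLE)
print(parent)
for tag, text in children:
    # Indent the children to show that they sit inside the parent element.
    print("   " + tag + ": " + text)
```

The parent element (document) contains three child elements, and each child carries its own text. That containment is the hierarchy the nametags describe.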

The first application of SGML on the World Wide Web was HTML (HyperText Markup Language). Now XML (eXtensible Markup Language), a simplified subset of SGML, brings the power of structured document management to the Web.

About Document Type Definition (DTD)

To ensure that all dataset descriptions are of a consistent type, the Document Type Definition (DTD) defines the metadata elements and their order and structure. The DTD is "used to automatically process a document ... and check that all of the elements required for that document are indeed present and correctly ordered" [from A Gentle Introduction to SGML].

This insistence on a uniform document type means that the SGML dataset description documents can be readily transferred between information systems (as SGML is independent of vendors, platforms, and software). The documents can be consistently processed to ensure that they are valid and that search facilities can rely on certain elements (such as the geographic extents and the keywords). The DTD is a vital component of the SGML document management solution. See further explanation and the glossary.

The Australia New Zealand Land Information Council (ANZLIC) has released the ANZMETA XML DTD to define geographic dataset descriptions in Australia and New Zealand. The ANZMETA DTD and supporting documentation can be found at http://www.auslig.gov.au/anzmeta/
During 2000-2001 the ANZMETA DTD is expected to evolve into the Australian profile of the ISO metadata standard (ISO 19115).

An example dataset description

With that background, we can now look at an example geospatial dataset description and elaborate on some of its features.

Select this link to show the example in a new browser window - you can now continue to view this current page while we explain, and you can refer to the example in the other window to see the effects.

This example dataset description has been produced by the Eco Companion facility ANZMP (Australia New Zealand Metadata Parser) from this SGML metadata dataset description.

The following fragment of the SGML document produces the Citation section of the dataset description.

<citeinfo>
   <uniqueid>IG1</uniqueid>
   <title>Test dataset description #1</title>
   <origin>
      <custod>IndexGeo Pty Ltd</custod>
      <jurisdic>
            <keyword>Australia</keyword>
      </jurisdic>
   </origin>
</citeinfo>

To ensure consistent document production, the DTD specifies that there must be one and only one TITLE element (title in the fragment), and that the ORIGINATOR element (origin) must immediately follow the TITLE and contain the CUSTODIAN (custod) and JURISDICTION (jurisdic) elements.
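
The rules above can be approximated with a hand-written check. This is only a sketch in Python using the standard xml.etree.ElementTree module. It is not how ANZMP or a real DTD-validating parser works, the function name check_citeinfo is hypothetical, and it tests only the rules named in the text against the element names that appear in the fragment (title, origin, custod, jurisdic).

```python
import xml.etree.ElementTree as ET

# The Citation fragment from the text, held as a string.
FRAGMENT = """
<citeinfo>
   <uniqueid>IG1</uniqueid>
   <title>Test dataset description #1</title>
   <origin>
      <custod>IndexGeo Pty Ltd</custod>
      <jurisdic>
            <keyword>Australia</keyword>
      </jurisdic>
   </origin>
</citeinfo>
"""


def check_citeinfo(markup):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(markup.strip())
    tags = [child.tag for child in root]

    # Rule 1: one and only one title element.
    if tags.count("title") != 1:
        problems.append("expected exactly one <title>, found %d" % tags.count("title"))
        return problems

    # Rule 2: the origin element immediately follows the title.
    position = tags.index("title")
    if position + 1 >= len(tags) or tags[position + 1] != "origin":
        problems.append("<origin> must immediately follow <title>")
        return problems

    # Rule 3: custod and jurisdic are both contained in origin.
    origin = root[position + 1]
    for required in ("custod", "jurisdic"):
        if origin.find(required) is None:
            problems.append("<origin> is missing <%s>" % required)
    return problems


print(check_citeinfo(FRAGMENT))
```

A real DTD expresses these constraints declaratively, so a validating parser can enforce them for every element without code written for each rule.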

URL:http://www.indexgeo.com.au/ec/about/dataset.html
Last Modified: 9 September 2000
