The Calimera Project is funded under the European Commission, IST Programme
Calimera Guidelines
Cultural Applications: Local Institutions Mediating Electronic Resources
Resource description
Issues dealt with in this guideline include:
· Domain-specific metadata standards
Note: It is anticipated that institutions will be approaching resource description from
different starting positions and that many, if not most, may need to adopt a
staged approach to implementation. It is also anticipated that these guidelines
may be used to support procurement, whether of integrated management systems or
of the services of contractors or consultants. Some of the technologies
involved in resource description are described in the guideline on Underlying technologies and infrastructure.
POLICY ISSUES
Libraries have long operated in a networked information environment, and
increasingly archives and museums do too. The advent of the Internet means that
even the smallest museum, branch library or branch of a record office can now
have access to an ever-increasing amount of distributed digital information
available over the web. The knowledge society, lifelong learning and the
growing impetus towards interaction with central government by electronic means
make easy access to information increasingly important for all citizens.
Many institutions are also creating new digital content themselves, be it their
own web pages or new multimedia content, sometimes funded by specific
digitisation programmes (see the guideline on Digitisation).
They need to understand how to describe this new content so that it is both
easily retrieved by users and interoperable with other digital content.
The technologies and standards in this area are still emerging and will
continue to change and develop over time. Institutions need to be aware of the
current state of the art so as to avoid the adoption of inappropriate or obsolescent technology and standards.
Institutions need to understand these issues in order to plan and
prioritise their work, particularly when they are procuring new systems or
commissioning development work from outside consultants or contractors.
GOOD PRACTICE GUIDELINES
Many of the technologies used in resource description are described in
the guideline on Underlying
technologies and infrastructure.
Interoperability
In relation to digital content, interoperability means that the content should be
as widely reusable, as portable across different networks, systems and
organisations, and as long-lasting as possible. The key to achieving this is
the use of standards: codified rules and guidelines for the creation, description
and management of digital resources (see Re-inventing the
Wheel? in D-Lib Magazine, Jan 2002, for more information [1]).
It is important to use standards for description purposes so that users
can more easily search and retrieve information from different sources - across
different catalogues, across different domains (museums, libraries and
archives), across different resource types (books, documents, museum artefacts,
audio-visual media), via different delivery channels (PCs, interactive TV,
mobile phones, handheld devices), and in different languages. (See the
guideline on Discovery
and retrieval.)
Metadata
Metadata has sometimes been defined as
“structured data about data”, but the term is now often used to refer to
machine-processable data that describes resources of many types and that is
used to support a range of different operations.
Cultural heritage and information professionals have been creating
metadata for as long as they have been managing collections. A library
catalogue record, for example, is metadata describing a particular book: the
metadata elements associated with a book might include author, title,
publisher, date, ISBN, classification number, etc. Those associated with an
object in a museum might include the object name, a brief physical description,
acquisition method, date, location, etc. Those associated with an archival
document might include reference code, title, creator, date, etc., and those
associated with a radio programme might include title, description of content,
creator, broadcaster, language, date, and so on. Libraries, museums and archives
all have catalogues, but these can be structurally different, because the nature
and significance of the relationships between the resources described are
different. For example, in a library catalogue the title-level record is usually
the main one, with linked records for copies of the title in the library’s
branches; an archive catalogue rarely describes multiple copies of items, and is
concerned with different types of relationships between resources (e.g. a box
may be recorded at one level, and the contents of the box individually described
and linked back to the box record in a hierarchical structure); a museum
catalogue may describe individual or multipart objects, but the data recorded is
usually very different from the data in library and archive catalogues.
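The hierarchical structure described above for archive catalogues, in which a box is recorded at one level and its contents are described individually and linked back to it, can be sketched as a simple tree. This is a minimal Python illustration only, not an implementation of ISAD(G) or of any catalogue system; the class, its fields and the reference codes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False: units compare by identity, so the parent/child links cannot
# trigger infinite recursion when two units are compared.
@dataclass(eq=False)
class ArchivalUnit:
    """One level of description in a hierarchical archive catalogue.

    The fields are a hypothetical minimum chosen for illustration.
    """
    reference_code: str
    title: str
    level: str  # e.g. "box", "file" or "item"
    parent: Optional[ArchivalUnit] = field(default=None, repr=False)
    children: list[ArchivalUnit] = field(default_factory=list, repr=False)

    def add_child(self, child: ArchivalUnit) -> ArchivalUnit:
        """Describe a lower-level unit and link it back to this one."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Reference codes from the top of the hierarchy down to this unit."""
        codes = []
        node: Optional[ArchivalUnit] = self
        while node is not None:
            codes.append(node.reference_code)
            node = node.parent
        return list(reversed(codes))


box = ArchivalUnit("ABC/1", "Correspondence, 1890-1895", "box")
letter = box.add_child(ArchivalUnit("ABC/1/1", "Letter to the town clerk", "item"))
print(letter.path())  # ['ABC/1', 'ABC/1/1']
```

Walking up the parent links is what lets a user who finds the letter see the box, and any higher levels, that give it context.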
Such metadata is increasingly being incorporated into digital systems; the
metadata associated with a web page, for example, might include title, creator,
subject, description, etc. Metadata has come to the fore as a means of improving the
efficiency and effectiveness of finding digital resources on the Web by
adopting a consistent structure for describing websites and other digital
resources. There is a helpful introduction, with examples, on the TASI
(Technical Advisory Service for Images) website [2].
Metadata is sometimes classified according to
the functions it is intended to support. In practice, individual metadata
schemas often support multiple functions and overlap the categories below:
· Descriptive metadata – to describe resources and facilitate retrieval. It may be necessary for
cultural institutions to create metadata describing several classes of
resource, including:
° the physical objects that have been digitised;
° the digital objects created during the digitisation process and stored as “digital masters”;
° the digital objects derived from these “digital masters” for networked delivery to users;
° new resources created using these digital objects;
° collections of any of the above.
There are a number of descriptive metadata schemas, of which MARC and Dublin Core are perhaps the best
known:
° MARC (Machine Readable Cataloguing) [3] is a bibliographic metadata schema
managed by the Library of Congress. The current version is MARC 21;
° The Dublin Core Metadata Element Set [4] is a simple metadata standard designed
to support resource discovery.
Historically, it was described as applicable to the description of
“document-like objects”, but its use has been extended to other classes
of resource. For guidance on Dublin Core see Using Dublin Core by Diane Hillmann [5].
See also Online Archive of California Best Practice Guidelines for Digital Objects [6].
· Preservation Metadata (see also the guideline on Digital preservation) – to support preservation and archiving activities. In June
2003, OCLC (
° the RLG set of 16 basic metadata elements to support preservation [8];
° the Reference Model for an Open Archival Information System (OAIS), a high-level framework which describes the functions involved in the preservation process and the information required to support those functions [9].
° For guidance on preservation metadata see Implementing Metadata in Digital Preservation Systems by Brian F. Lavoie [10], and Preservation Metadata by Michael Day [11].
· Administrative Metadata – to manage the digital resource and provide information about its creation and any constraints governing its use. This might include:
° technical metadata describing technical characteristics including hardware/software used in its creation, formats, standards etc. (See for example the NISO (National Information Standards Organization
[of the
° source metadata describing the original object from which the digital object was produced;
° digital provenance metadata describing the history of the operations performed on a digital object since it was created or digitised;
° rights metadata describing intellectual property rights in a resource and any use restrictions or licensing agreements. (See the Indecs Project for instance [13].)
· Structural Metadata – to describe the logical or physical relationships between the parts of a compound object. For example, a physical book is one object consisting of a sequence of pages. A digitised book may consist of one digital image per page, making the digitised book a compound object, and information about the sequence of pages is clearly essential for its use:
° the Metadata Encoding and Transmission Standard (METS) [14] provides an encoding format for descriptive, administrative and structural metadata, and is designed to support both the management of digital objects and the delivery and exchange of digital objects across systems;
° the IMS Content Packaging Specification [15] provides a means of describing and organising the structure of composite learning resources.
· Other categories of metadata include:
° Education Metadata – to help with the resource retrieval tasks of educational institutions and managed or virtual learning environments, e.g. student records and descriptions of courses. The primary standard for describing learning resources is the IEEE Learning Object Metadata standard [16]. See the IMS website [17] and the CETIS website [18] for more information;
° Geospatial Metadata for use with digital maps and Geographical Information Systems. The ISO 19115:2003 [19] standard for geographic information metadata was released in January 2003.
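The page-sequence example given above for structural metadata can be sketched as a structure map in the style of METS: one division for the book, containing one division per page in reading order, each pointing to the file that holds that page image. This is a simplified, hypothetical fragment built with Python's standard xml.etree.ElementTree module; it has not been validated against the METS schema, a complete METS document would also need a file section declaring the file IDs used here, and the IDs themselves are invented.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def page_sequence(title: str, page_files: list[str]) -> ET.Element:
    """Build a structMap-style fragment recording the order of a book's pages.

    page_files holds one (hypothetical) file ID per page image, already in
    reading order; ORDER records each page's position in the sequence.
    """
    struct_map = ET.Element(mets("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, mets("div"), {"TYPE": "book", "LABEL": title})
    for position, file_id in enumerate(page_files, start=1):
        page = ET.SubElement(
            book, mets("div"),
            {"TYPE": "page", "ORDER": str(position), "LABEL": f"Page {position}"},
        )
        # fptr points from the logical page to the file holding its image.
        ET.SubElement(page, mets("fptr"), {"FILEID": file_id})
    return struct_map


fragment = page_sequence("Parish register", ["IMG0001", "IMG0002", "IMG0003"])
print(ET.tostring(fragment, encoding="unicode"))
```

Keeping the order explicit, rather than relying on file names sorting correctly, is what allows a delivery system to reassemble the book reliably.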
Domain-specific metadata standards
These have been developed to cater for the specific requirements of particular
areas, for example:
· Archives commonly use ISAD(G) (General International Standard Archival Description) [20] for the metadata describing archive materials, and ISAAR (CPF) (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), 2nd ed. 2004 [21], for metadata describing the context of the creation of those materials. To render such descriptions electronically there are Document Type Definitions (DTDs) such as EAD (Encoded Archival Description) [22], which is now being used widely all
over
· Museums – the International Committee for Documentation of the International Council of Museums (ICOM-CIDOC) [25] produces information on standards and metadata for museums, including links to, for example, the CIDOC-CRM (Conceptual Reference Model), the model of choice for many museums. The museum community has created the SPECTRUM [26] and CDWA (Categories for the Description of Works of Art) [27] standards. SPECTRUM is not available free of charge on the Internet, but the UK mda (Museums Documentation Association) website contains some useful factsheets summarising chapters from SPECTRUM [28].
· Libraries – IFLA has produced a comprehensive index of metadata resources for digital libraries [29].
· Government – GILS (Government Information Locator Service/Global Information Locator Service) [30] is used for government information, although many governments seem recently to have been moving to Dublin Core in preference. The Dublin Core Metadata Initiative (DCMI) has established a Government Working Group [31].
Many governments, e.g.
Cultural institutions should be aware of the requirements of
community-/domain-specific metadata standards. The metadata schema(s) that are
adopted should be fully documented for all projects. This documentation should
include detailed cataloguing guidelines listing the metadata elements and
describing how those elements are to be used to describe the types of resource
created and managed by the project. Such guidelines are necessary even when a
standard metadata schema is used, in order to explain how that schema is to be
applied in the specific context of the project.
To support the discovery of their resources by a wide range of other
applications and services, cultural institutions should be able to generate a
metadata description for each item using the Dublin Core Metadata Element Set (DCMES) [35] in its simple/unqualified form. The DCMES defines fifteen elements to
support simple cross-domain resource discovery: Title, Creator, Subject,
Description, Publisher, Contributor, Date, Type, Format, Identifier, Source,
Language, Relation, Coverage and Rights.
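Generating a simple Dublin Core description for an item can be sketched as follows. The example writes the fifteen DCMES elements in the Dublin Core element namespace and wraps them in the oai_dc container commonly used when records are exposed via OAI-PMH. It is a minimal sketch using only Python's standard library, and the sample record values are invented.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set (simple DC).
DCMES = ("title", "creator", "subject", "description", "publisher",
         "contributor", "date", "type", "format", "identifier",
         "source", "language", "relation", "coverage", "rights")


def simple_dc(record: dict[str, list[str]]) -> ET.Element:
    """Serialise a record as unqualified Dublin Core inside an oai_dc wrapper.

    Every DC element is optional and repeatable, so each key maps to a list.
    """
    unknown = set(record) - set(DCMES)
    if unknown:
        raise ValueError(f"not simple DC elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DCMES:  # emit elements in a consistent order
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = {
    "title": ["View of the harbour"],
    "creator": ["Unknown photographer"],
    "date": ["1905"],
}
print(ET.tostring(simple_dc(record), encoding="unicode"))
```

Rejecting element names outside the fifteen keeps the output to simple DC; richer local fields have to be mapped onto these elements first.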
This is the minimum requirement; in practice, simple DC metadata will
probably be a subset of a richer set of metadata.
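Deriving simple DC from a richer local record is usually done with a crosswalk, a mapping from local fields to DC elements. The sketch below assumes a hypothetical museum record whose field names are invented for illustration. Fields with no DC equivalent in the mapping, such as acquisition method, are simply dropped, which is why the simple DC record ends up as a subset of the richer description.

```python
# Hypothetical crosswalk: local museum catalogue field -> simple DC element.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "physical_description": "description",
    "accession_number": "identifier",
}


def to_simple_dc(local_record: dict[str, str]) -> dict[str, list[str]]:
    """Derive a simple DC record (element -> list of values) from a local record.

    Local fields with no entry in CROSSWALK, and empty values, are left out,
    so the result holds only part of the information in the local record.
    """
    dc: dict[str, list[str]] = {}
    for field_name, value in local_record.items():
        element = CROSSWALK.get(field_name)
        if element is not None and value:
            dc.setdefault(element, []).append(value)
    return dc


local = {
    "object_name": "Carved oak chair",
    "maker": "Unknown",
    "production_date": "c. 1760",
    "acquisition_method": "bequest",  # no DC equivalent in this crosswalk
    "accession_number": "1923.45",
}
print(to_simple_dc(local))
```

The richer local record remains the master; the simple DC version is generated from it whenever it is needed.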
To support discovery within the cultural heritage sector, projects
should also consider providing a metadata description for each item conforming
to the DC.Culture
schema [36]. Projects should show
awareness of any additional requirements for descriptive metadata, and may need
to capture and store additional descriptive metadata to meet those
requirements.
Collection Level Descriptions (CLDs)
In collaborative projects it may also be appropriate to consider the use
of collection level description metadata to describe the holdings of
participating organisations (scope, level, depth, language etc.). For guidance
see, for example, the UKOLN Collection Description Focus [37] for information on the use of CLDs in
the
Collection-level description need not, however, be limited to collaborative
projects. The description of aggregations of items may be useful in
many different contexts. Even within the resources of a single institution or
project, it may turn out to be useful. (See the approach to resource discovery
described in the JISC Information Environment Architecture Functional Model [39].) A digital resource is created not in
isolation but as part of a digital collection, and should be considered within
the context of that collection and the development of the collection. Indeed,
collections themselves are seen as components around which many different types
of digital services might be constructed.
Collections should be described so that a user can discover the
important characteristics of the collection and so that collections can be
integrated into the wider body of existing digital collections and into digital
services operating across these collections.
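Some characteristics of a collection can be derived automatically from the item-level records it contains. The sketch below counts items, resource types and languages and works out a date range. The record keys are hypothetical, and the output is not an RSLP or Dublin Core collection description: attributes such as scope, ownership or access conditions still have to be written by people who know the collection.

```python
from collections import Counter


def summarise_collection(items: list[dict]) -> dict:
    """Derive a few collection-level attributes from item-level records.

    Each item is a dict with the hypothetical keys "type", "language" and
    "year"; any of them may be missing.
    """
    years = [item["year"] for item in items if item.get("year") is not None]
    return {
        "extent": len(items),
        "types": dict(Counter(item["type"] for item in items if item.get("type"))),
        "languages": sorted({item["language"] for item in items if item.get("language")}),
        "date_range": (min(years), max(years)) if years else None,
    }


items = [
    {"type": "photograph", "language": "en", "year": 1905},
    {"type": "photograph", "language": "cy", "year": 1912},
    {"type": "map", "year": 1899},
]
print(summarise_collection(items))
```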
Museums, libraries and archives should be aware of initiatives to
enhance the disclosure and discovery of collections, such as programme-,
community-, sector- or domain-wide, national, or international inventories of
digitisation activities and of digital cultural content, and should be prepared
to contribute metadata to such services where appropriate.
In describing collections it is usually necessary to map to an
appropriate metadata schema. Good examples are:
· the Research Support Libraries Programme (RSLP) Collection Description schema [40];
· the collection-level description schema defined by Minerva D3.2 [41];
· the emerging Dublin Core Collection Description Application Profile [42].
If users are to be able to carry out useful searches across distributed data
sets then the producers of those data sets need to be entering values into the
metadata elements in a consistent way.
Recognised multilingual terminological sources should be used to provide
values for metadata elements where possible. Local terminologies should be
considered only if no standard terminology is available. Where local terminologies
are deployed, information about the terminology, its constituent terms and
their meanings must be made publicly available.
The use of a terminology in metadata records, either standard or
project-specific, must be indicated unambiguously in the metadata records.
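Recording which terminology each value comes from, and checking values against it, might look like the following sketch. The terminology names and their tiny term lists are hypothetical placeholders; in practice the scheme would be identified by the name or URI of a published vocabulary, or of the documented local terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A metadata value together with the terminology it was taken from."""
    value: str
    scheme: str  # names the terminology unambiguously


# Hypothetical terminologies: scheme name -> permitted terms.
TERMINOLOGIES = {
    "example-subject-thesaurus": {"Harbours", "Railways", "Lighthouses"},
    "example-local-object-names": {"chair", "table", "coin"},
}


def check_terms(terms: list[Term]) -> list[str]:
    """Describe each problem found; an empty list means every value checks out."""
    problems = []
    for term in terms:
        allowed = TERMINOLOGIES.get(term.scheme)
        if allowed is None:
            problems.append(f"{term.value!r}: unknown terminology {term.scheme!r}")
        elif term.value not in allowed:
            problems.append(f"{term.value!r} is not a term in {term.scheme!r}")
    return problems


print(check_terms([Term("chair", "example-local-object-names"),
                   Term("Chairs", "example-local-object-names")]))
```

Because each value carries its scheme, a record can mix standard and local terminologies without ambiguity about where a term came from.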
Collection-level metadata records could make use of the terminologies
recommended for use with the Minerva collection-level description schema [41].
Controlled vocabularies, thesauri and authority files
To ensure consistency it is best to adopt and use identifiable encoding
schemes or controlled vocabularies for indexing. A good example is the Library of Congress
Subject Headings [43].
A thesaurus is a controlled vocabulary where the terms are arranged in
hierarchies which show relationships such as broader or narrower terms,
equivalence or part equivalence, and where terms are designated preferred terms
or non-preferred terms (for synonym control). Thesauri also typically include scope
notes and other useful information.
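The thesaurus structure just described (preferred and non-preferred terms, broader and narrower terms, scope notes) can be sketched as a small data structure. The USE/UF and BT/NT labels follow common thesaurus convention. For simplicity this sketch allows each term only one broader term, whereas real thesauri may be polyhierarchical, and the sample terms are invented.

```python
from typing import Optional


class Thesaurus:
    """Minimal monolingual thesaurus.

    Preferred terms are linked by broader (BT) and narrower (NT) relationships;
    non-preferred terms point to the preferred term to use instead (USE / UF),
    which provides synonym control.
    """

    def __init__(self) -> None:
        self.broader: dict[str, str] = {}         # term -> its broader term
        self.narrower: dict[str, list[str]] = {}  # term -> its narrower terms
        self.use: dict[str, str] = {}             # non-preferred -> preferred
        self.scope_notes: dict[str, str] = {}

    def add_term(self, term: str, broader: Optional[str] = None, note: str = "") -> None:
        """Add a preferred term, optionally under a single broader term."""
        self.narrower.setdefault(term, [])
        if broader is not None:
            self.broader[term] = broader
            self.narrower.setdefault(broader, []).append(term)
        if note:
            self.scope_notes[term] = note

    def add_non_preferred(self, entry_term: str, preferred: str) -> None:
        """Record that entry_term is a synonym: USE the preferred term instead."""
        self.use[entry_term] = preferred

    def preferred(self, term: str) -> str:
        """Map any entry term to the preferred term used for indexing."""
        return self.use.get(term, term)

    def expand(self, term: str) -> list[str]:
        """Return the preferred term and every term below it, depth-first."""
        result: list[str] = []
        stack = [self.preferred(term)]
        while stack:
            current = stack.pop()
            result.append(current)
            # Push in reverse so narrower terms come out in insertion order.
            stack.extend(reversed(self.narrower.get(current, [])))
        return result


t = Thesaurus()
t.add_term("Furniture")
t.add_term("Seating", broader="Furniture", note="Furniture designed for sitting on.")
t.add_term("Chairs", broader="Seating")
t.add_term("Tables", broader="Furniture")
t.add_non_preferred("Seats", "Seating")
print(t.expand("Seats"))  # ['Seating', 'Chairs']
```

A search for the non-preferred "Seats" is redirected to "Seating" and can then be widened to include narrower terms such as "Chairs".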
There are two ISO standards for thesauri: ISO
2788:1986 Guidelines for the establishment and development of monolingual thesauri [44], and ISO
5964:1985 Guidelines for the establishment and development of multilingual
thesauri [45]. Work is underway to
revise both these standards. A new standard, BS 8723: Structured vocabularies for information retrieval – guide, is
planned (see the guideline on Multilingualism).
The
An example of a standard for authority records is ISAAR (CPF) (International
Standard Archival Authority Record for Corporate Bodies, Persons and Families),
2nd edition, 2004 [21], published by the International Council
on Archives.
Traugott Koch has compiled a good list of controlled
vocabularies, thesauri and classification schemes [48]. The TASI (Technical Advisory Service for
Images) page Controlling your language - links
to metadata vocabularies [49]
provides links to more than 60 vocabulary sources.
An ontology may be described as a formal description of objects and
their inter-relationships. (See the guideline on Underlying
technologies and infrastructure, where the Semantic
Web and RDF
are also covered.)
Object identification
The primary reason why people assign unique identifiers to resources is
so that they, and others, can refer to the resources unambiguously. They therefore need to be able to rely on their
identifiers being unique (i.e. the same identifier is not assigned to another
resource) and persistent (i.e. it continues to identify this resource; how
long it should continue to do so may depend on the context and the nature of the
resource). It may also be a requirement that the identifier can be used to
access the resource, i.e. that the identifier can be “resolved” by means of a service
to a current location of the resource.
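The three properties described here (uniqueness, persistence and resolution) can be illustrated with a toy registry in which an identifier is never reassigned and only the location it resolves to may change. In broad terms this is the idea behind services such as PURLs, but the class below is a hypothetical sketch, not the interface of any real resolver, and the identifier and example.org URLs are invented.

```python
class IdentifierRegistry:
    """Toy resolver: a stable identifier mapped to a changeable location.

    Identifiers are never reassigned (uniqueness) or removed (persistence);
    only the location they resolve to may be updated.
    """

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def assign(self, identifier: str, location: str) -> None:
        """Register a new identifier; refuse to reuse one already assigned."""
        if identifier in self._locations:
            raise ValueError(f"{identifier!r} is already assigned")
        self._locations[identifier] = location

    def relocate(self, identifier: str, new_location: str) -> None:
        """Record that the resource has moved; the identifier is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        """Return the current location of the resource."""
        return self._locations[identifier]


registry = IdentifierRegistry()
registry.assign("example:obj/1923.45", "http://www.example.org/collections/1923-45.html")
registry.relocate("example:obj/1923.45", "http://collections.example.org/objects/1923.45")
print(registry.resolve("example:obj/1923.45"))
```

Anyone who cited the identifier before the move still reaches the resource, because only the registry's record of its location changed.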
Examples of unique persistent identifiers include:
· DOIs (Digital Object Identifiers) [50];
· URNs (Uniform Resource Names) [51];
· PURLs (Persistent Uniform Resource Locators) [52];
· ARKs (Archival Resource Keys) [54].
Paul Miller has written an easy-to-understand article
explaining unique identifiers (Miller, Paul: I am a name and a number. Ariadne, issue 24, 21st June 2000) [55]. The International Council of Museums
(ICOM) published International
Guidelines for Museum Object Information: the CIDOC Information Categories [56] in 1995. (See also the guideline on Digital
preservation.)
FUTURE AGENDA
In the future, people will be surrounded by intelligent, responsive and reliable
machines capable of reacting to them as individuals. The range of technologies
described in this guideline exists now, but their mature interaction can still
only be imagined. This interaction will affect homes, schools, hotels, cars and
aircraft; in short, every aspect of life. Its effects on cultural institutions
will be very far-reaching.
As the power of computing systems grows, more
information can be stored, e.g. the metadata describing several million
objects. This will enable huge collaborative catalogues to be made available.
It will also be possible to store more complex descriptions, accommodating,
for example, the second edition of the Museum Documentation Association's
SPECTRUM standard for museum data, which lists hundreds of fields. Metadata in
the cultural sector is likely to develop from static structured catalogue data
into complex free-text descriptions, including interpretations of, or responses
to, objects.
REFERENCES
[1] Gill, Tony and Miller, Paul: Re-inventing the Wheel? Standards, Interoperability and Digital Cultural Content. D-Lib Magazine, Volume 8 Number 1, January 2002. http://www.dlib.org/dlib/january02/gill/01gill.html
[2] TASI (Technical Advisory Service
for Images): Metadata and Digital Images. 2002-2004.
http://www.tasi.ac.uk/advice/delivering/metadata.html
[3] Machine Readable Cataloguing
(MARC): MARC 21
[4] Dublin Core Metadata Element Set. http://dublincore.org/documents/dces/
[5] Hillmann, Diane: Using Dublin Core. http://dublincore.org/documents/usageguide/
[6] Online Archive of California
Best Practice Guidelines for Digital Objects (OAC BPG DO), Version 1.0,
prepared and maintained by the Online Archive of California Working Group,
January 2004.
http://www.cdlib.org/inside/projects/oac/bpgdo/
[7] PREMIS (PREservation Metadata:
Implementation Strategies)
http://www.oclc.org/research/projects/pmwg/
[8] RLG set of 16 basic metadata elements
to support preservation;
http://www.rlg.org/preserv/presmeta.html
[9] Reference Model for an Open Archival Information System (OAIS). http://ssdoo.gsfc.nasa.gov/nost/isoas/
[10] Lavoie, Brian F.: Implementing
Metadata in Digital Preservation Systems: the PREMIS Activity in D-Lib
Magazine, April 2004, Volume 10 Number 4. ISSN 1082-9873. http://www.dlib.org/dlib/april04/lavoie/04lavoie.html
[11] Day, Michael: Preservation
Metadata. Prepublication draft of chapter published in: G. E. Gorman and
Daniel G. Dorner (eds.), Metadata applications and management, International
Yearbook of Library and Information Management, 2003-2004, London: Facet
Publishing, 2004, pp. 253-273. ISBN 1-85604-474-2 (Facet Publishing); ISBN
0-8108-4980-1 (Scarecrow Press). http://www.ukoln.ac.uk/metadata/publications/iylim-2003/
[12] NISO Z39.87-2002 AIIM 20-2002
Data Dictionary -- Technical Metadata for Digital Still Images
http://www.niso.org/standards/resources/Z39_87_trial_use.pdf
[13] Indecs Project http://www.indecs.org/
[14] Metadata Encoding and
Transmission Standard (METS)
http://www.loc.gov/standards/mets/
[15] IMS Content Packaging
Specification
http://www.imsproject.org/content/packaging/
[16] IEEE Learning Object Metadata
standard
http://ltsc.ieee.org/wg12/par1484-12-1.html
[17] IMS Global Learning Consortium
http://www.imsproject.org/
[18] CETIS (Centre for Educational Technology Interoperability Standards). http://www.cetis.ac.uk/
[19] ISO 19115:2003 Geographic
information – Metadata
http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=26020&ICS1=35&ICS2=240&ICS3=70
[20] ISAD(G): General International Standard Archival Description. Second Edition. http://www.ica.org/biblio/isad_g_2e.pdf
[21] ISAAR (CPF) (International
Standard Archival Authority Record for Corporate Bodies, Persons and Families),
2nd ed. 2004 http://www.ica.org/biblio.php?pdocid=144
[22] EAD (Encoded Archival Description) http://www.loc.gov/ead/
[23] EAC (Encoded Archival Context) http://www.library.yale.edu/eac/