The Calimera Project is funded under the European Commission IST Programme

 

 
Calimera Guidelines

 

 

Cultural Applications:

Local Institutions Mediating Electronic Resources

 

 

 

Resource description


SCOPE

 

Issues dealt with in this guideline include:

Interoperability

Metadata

Domain-specific metadata standards

Collection level descriptions

Terminology

Ontologies

Object identification

 

Note: Institutions are expected to approach resource description from different starting positions, and many, if not most, may need to adopt a staged approach to implementation. These guidelines may also be used to support procurement, whether of integrated management systems or of the services of contractors or consultants. Some of the technologies involved in resource description are described in the guideline on Underlying technologies and infrastructure.

 

POLICY ISSUES

 

Libraries have long operated in a networked information environment, and increasingly archives and museums do too. The advent of the Internet means that even the smallest museum, branch library or branch of a record office can now have access to an ever increasing amount of distributed digital information available over the web. The knowledge society, lifelong learning and the growing impetus towards interaction with central government by electronic means make easy access to information of increasing importance for all citizens.


Many institutions are also creating new digital content themselves, be it their own web pages or new multimedia content, sometimes funded by specific digitisation programmes (see the guideline on Digitisation). They need to understand how to describe this new content such that it is both easily retrieved by users and interoperable with other digital content.


The technologies and standards in this area are still emerging and will continue to change and develop over time. Institutions need to be aware of the current state of the art so that they avoid adopting inappropriate or obsolescent technology and standards.


Institutions need an understanding of these issues in order to plan and prioritise their work and particularly when they are procuring new systems or commissioning development work from outside consultants/contractors.

 

GOOD PRACTICE GUIDELINES

 

Many of the technologies used in resource description are described in the guideline on Underlying technologies and infrastructure.

 

Interoperability

In relation to digital content, interoperability means that the content should be as widely reusable, as portable across different networks, systems and organisations, and as long lasting as possible. The key to achieving this is the use of standards: codified rules and guidelines for the creation, description and management of digital resources (see Re-inventing the Wheel? in D-Lib Magazine, January 2002, for more information [1]). It is important to use standards for description purposes so that users can more easily search and retrieve information from different sources: across different catalogues, across different domains (museums, libraries and archives), across different resource types (books, documents, museum artefacts, audio-visual media), via different delivery channels (PCs, interactive TV, mobile phones, handheld devices), and in different languages. (See the guideline on Discovery and retrieval.)


Metadata
Metadata has sometimes been defined as “structured data about data”, but the term is now often used to refer to machine-processable data that describes resources of many types and that is used to support a range of different operations.

 

Cultural heritage and information professionals have been creating metadata for as long as they have been managing collections. A library catalogue record, for example, is metadata which describes a particular book, so the metadata elements associated with a book might include author, title, publisher, date, ISBN, classification number etc. Those associated with an object in a museum might include the object name, brief physical description, acquisition method, date, location etc. Those associated with an archival document might include reference code, title, creator, date etc., and those associated with a radio programme might include title, description of content, creator, broadcaster, language, date and so on.

Libraries, museums and archives all have catalogues, but these can be structurally different, because the nature and significance of the relationships between the resources described are different. In a library catalogue, the title level record is usually the main one, with linked records for copies of the title in the library’s branches. An archive catalogue rarely describes multiple copies of items, and is concerned with different types of relationships between resources (e.g. a box may be recorded at one level, and the contents of the box individually described and linked back to the box record in a hierarchical structure). A museum catalogue may describe individual or multipart objects, but the data recorded is usually very different from the data in library and archive catalogues.
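
The hierarchical structure of an archive catalogue, in which items are linked back to the box that holds them, can be illustrated with a minimal Python sketch. The class name, field names and reference codes are hypothetical and are not taken from any archival standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveUnit:
    """One level of archival description, for example a box or an item."""
    reference_code: str
    title: str
    # The link back to the containing unit; excluded from repr and comparison.
    parent: Optional["ArchiveUnit"] = field(default=None, repr=False, compare=False)
    children: List["ArchiveUnit"] = field(default_factory=list)

    def add_child(self, child: "ArchiveUnit") -> "ArchiveUnit":
        # Link the child back to this unit, as an item is linked to its box.
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        # Walk up the hierarchy to show the context of this unit.
        unit, codes = self, []
        while unit is not None:
            codes.append(unit.reference_code)
            unit = unit.parent
        return list(reversed(codes))


# Hypothetical reference codes, for illustration only.
box = ArchiveUnit("EX/1", "Box of correspondence")
letter = box.add_child(ArchiveUnit("EX/1/1", "Letter"))
print(letter.path())
```

A library catalogue would more naturally be modelled as a title record with a flat list of copies, which is one reason why records from the different domains cannot simply be merged.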

 

Increasingly such metadata are being incorporated into digital systems, so metadata associated with a webpage might include title, creator, subject, description etc. Metadata has come to the fore as a means of improving the efficiency and effectiveness of finding digital resources on the Web by adopting a consistent structure for describing websites and other digital resources. There is a helpful introduction, with examples, on the TASI (Technical Advisory Service for Images) website [2].

 

Metadata is sometimes classified according to the functions it is intended to support. In practice, individual metadata schemas often support multiple functions and overlap the categories below:

·        Descriptive metadata – to describe resources and facilitate retrieval. It may be necessary for cultural institutions to create metadata describing several classes of resource, including:

°        the physical objects that have been digitised;

°        the digital objects created during the digitisation process and stored as “digital masters”;

°        the digital objects derived from these “digital masters” for networked delivery to users;

°        new resources created using these digital objects;

°        collections of any of the above.

There are a number of different types of descriptive metadata, of which MARC and Dublin Core are perhaps the best known:

°        MARC, or Machine Readable Cataloguing [3], is a bibliographic metadata schema, managed by the Library of Congress. The current version is MARC 21;

°        The Dublin Core Metadata Element Set [4] is a simple metadata standard designed to support resource discovery.  Historically, it was described as applicable to the description of “document like objects”, but its use has been extended to include other classes of resource. For guidance on Dublin Core see Using Dublin Core by Diane Hillmann [5].  See also Online Archive of California Best Practice Guidelines for Digital Objects (OAC BPG DO), Version 1.1. [6].

 

·        Preservation Metadata (see also the guideline on Digital preservation) – to support preservation and archiving activities. In June 2003, OCLC (Online Computer Library Center) and RLG (Research Libraries Group) convened a Working Group, Preservation Metadata: Implementation Strategies (PREMIS), to focus on the practical aspects of implementing preservation metadata in digital preservation systems [7]. Most projects use:

°        the RLG set of 16 basic metadata elements to support preservation [8];

°        the Reference Model for an Open Archival Information System (OAIS), a high-level framework which describes the functions involved in the preservation process and the information required to support those functions [9].

For guidance on preservation metadata see Implementing Metadata in Digital Preservation Systems by Brian F. Lavoie [10] and Preservation Metadata by Michael Day [11].

 

·        Administrative Metadata – to manage the digital resource and provide information about its creation and any constraints governing its use. This might include:

°        technical metadata describing technical characteristics including hardware/software used in its creation, formats, standards etc. (See, for example, the NISO (National Information Standards Organization, USA) Data Dictionary - Technical Metadata for Digital Still Images [12].);

°        source metadata describing the original object from which the digital object was produced;

°        digital provenance metadata describing the history of the operations performed on a digital object since it was created or digitised;

°        rights metadata describing intellectual property rights in a resource and any use restrictions or licensing agreements. (See the Indecs Project for instance [13].)

 

·        Structural Metadata – to describe the logical or physical relationships between the parts of a compound object. For example a physical book is one object consisting of a sequence of pages. A digitised book may consist of one digital image per page making the digitised book a compound object, and clearly information about the sequence of pages is essential for use:

°        the Metadata Encoding and Transmission Standard (METS) [14] provides an encoding format for descriptive, administrative and structural metadata, and is designed to support both the management of digital objects and the delivery and exchange of digital objects across systems; 

°        the IMS Content Packaging Specification [15] defines a way of describing the structure and organisation of composite learning resources.

 

·        Other categories of metadata include:

°        Education Metadata - to help with the resource retrieval tasks of educational institutions and managed or virtual learning environments, e.g. student records and descriptions of courses. The primary standard for describing learning resources is the IEEE Learning Object Metadata standard [16]. See the IMS website [17] and the CETIS website [18] for more information;

°        Geospatial Metadata - for use with digital maps and Geographical Information Systems. The ISO 19115:2003 standard [19] for geographic information metadata was released in January 2003.
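
As an illustration of how the functional categories above can sit side by side in one record, the following Python sketch groups hypothetical metadata for a digitised book and uses the structural part to recover the page order. The dictionary keys, file names and values are assumptions made for this example; the layout is not METS or any other standard encoding.

```python
# A hypothetical record for one digitised book, grouped by metadata function.
record = {
    "descriptive": {"title": "Example parish register", "date": "1850"},
    "administrative": {
        "technical": {"format": "image/tiff"},
        "rights": {"statement": "Example rights statement"},
    },
    "structural": {
        # Each entry is (sequence number, image file); stored out of order
        # on purpose to show why sequence information is needed.
        "pages": [(2, "page-002.tif"), (1, "page-001.tif"), (3, "page-003.tif")],
    },
}


def reading_order(rec):
    """Return the page image files in the order given by the structural metadata."""
    pages = rec["structural"]["pages"]
    return [name for _, name in sorted(pages)]


print(reading_order(record))
```

Without the sequence numbers the three image files would still exist, but a delivery system would have no reliable way to present them as a book.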

 

Domain-specific metadata standards

These have been developed to cater for the specific requirements of particular areas, for example:

·        Archives commonly use ISAD(G) (the General International Standard Archival Description) [20] for the metadata describing archive materials, and ISAAR (CPF) (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), 2nd ed. 2004 [21], for metadata describing the context of the creation of those materials. To render such descriptions electronically there are two Document Type Definitions (DTDs): EAD (Encoded Archival Description) [22], which is now widely used across Europe, and EAC (Encoded Archival Context) [23], which is still in development. The EAD help pages [24] include links to further information.

·        Museums: the International Committee for Documentation of the International Council of Museums (ICOM-CIDOC) [25] produces information on standards and metadata for museums, including links to, for example, the CIDOC-CRM (Conceptual Reference Model), the model of choice for many museums. The museum community has created the SPECTRUM [26] and CDWA (Categories for the Description of Works of Art) [27] standards. SPECTRUM is not available free of charge on the Internet, but the UK mda (Museums Documentation Association) website contains some useful factsheets summarising chapters from SPECTRUM [28].

·        Libraries: IFLA has produced a comprehensive index on metadata resources for digital libraries [29].

·        Government: GILS (Government Information Locator Service/Global Information Locator) [30] is used for government information, although recently many governments seem to be moving to Dublin Core in preference. The Dublin Core Metadata Initiative (DCMI) has established a Government Working Group [31]. Many governments, e.g. Australia, Canada, Denmark, Finland, Ireland, New Zealand and the UK, have produced guidelines which may be mandatory for public sector organisations. (See the e-gif (Electronic Government Interoperability Framework) [32] and e-gms (Electronic Government Metadata Standard) [33] produced by the e-Government Unit in the UK [34].) The first version of the e-gms was based on simple Dublin Core, whilst the second and third moved to qualified Dublin Core with some additional document management elements.

 

Cultural institutions should be aware of the requirements of community-/domain-specific metadata standards. The metadata schema(s) that are adopted should be fully documented for all projects. This documentation should include detailed cataloguing guidelines listing the metadata elements and describing how those elements are to be used to describe the types of resource created and managed by the project. Such guidelines are necessary even when a standard metadata schema is used in order to explain how that schema is to be applied in the specific context of the project.

 

To support the discovery of their resources by a wide range of other applications and services, cultural institutions should be able to generate a metadata description for each item using the Dublin Core Metadata Element Set (DCMES) [35] in its simple/unqualified form. The DCMES defines fifteen elements to support simple cross-domain resource discovery: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights. This is the minimum requirement; in practice, simple DC metadata will probably be a subset of a richer set of metadata.
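
A simple Dublin Core description of this kind could be generated as in the following Python sketch, which uses only the standard library. The `record` wrapper element, the function name and the sample values are assumptions made for illustration; the element names and the `http://purl.org/dc/elements/1.1/` namespace are those of the fifteen-element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def simple_dc_record(values):
    """Build an XML element holding a simple Dublin Core description.

    `values` maps an element name to a list of strings, so an element
    may be omitted or repeated.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, items in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        for item in items:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return root


# Sample values are invented for the example.
record = simple_dc_record({
    "title": ["Town hall, about 1900"],
    "type": ["Image"],
    "language": ["en"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

An institution holding richer metadata would typically map its own fields onto these fifteen elements at export time rather than catalogue directly in simple DC.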

 

To support discovery within the cultural heritage sector, projects should also consider providing a metadata description for each item conforming to the DC.Culture schema [36]. Projects should show awareness of any additional requirements for descriptive metadata, and may need to capture and store additional descriptive metadata to meet those requirements.

 

Collection Level Descriptions (CLDs)

In collaborative projects it may also be appropriate to consider the use of collection level description metadata to describe the holdings of participating organisations (scope, level, depth, language etc.). For guidance see, for example, the UKOLN Collection Description Focus [37] for information on the use of CLDs in the UK, and Minerva: Deliverable D3.1: Inventories, discovery of digitised content & multilingual issues: Report analysing existing content [38].

 

Collection level description need not be limited to collaborative projects, however. The description of aggregations of items may be useful in many different contexts, even within the resources of a single institution or project. (See the approach to resource discovery described in the JISC Information Environment Architecture Functional Model [39].) A digital resource is created not in isolation but as part of a digital collection, and should be considered within the context of that collection and the development of the collection. Indeed, collections themselves are seen as components around which many different types of digital services might be constructed.

 

Collections should be described so that a user can discover the important characteristics of the collection and so that collections can be integrated into the wider body of existing digital collections and into digital services operating across these collections.
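
As a rough illustration, some characteristics of a collection can be derived from the item-level records it contains. The Python sketch below assumes hypothetical item records with `type` and `language` fields; the output keys are invented for the example and do not follow any particular collection description schema.

```python
def describe_collection(name, items):
    """Summarise a collection from its item-level records."""
    return {
        "name": name,
        "size": len(items),
        # Distinct values found across the items, sorted for stable output.
        "languages": sorted({item["language"] for item in items}),
        "types": sorted({item["type"] for item in items}),
    }


# Invented item records, for illustration only.
items = [
    {"title": "Map of the harbour", "type": "Image", "language": "en"},
    {"title": "Carte du port", "type": "Image", "language": "fr"},
    {"title": "Harbour board minutes", "type": "Text", "language": "en"},
]
summary = describe_collection("Harbour collection", items)
print(summary)
```

Characteristics such as scope, depth or collecting policy cannot be computed in this way and would still need to be written by a curator.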

 

Museums, libraries and archives should be aware of initiatives to enhance the disclosure and discovery of collections, such as programme-, community-, sector- or domain-wide, national, or international inventories of digitisation activities and of digital cultural content, and should be prepared to contribute metadata to such services where appropriate.

 

In describing collections it is usually necessary to map to an appropriate metadata schema. Good examples are:

·        the Research Support Libraries Programme (RSLP) Collection Description schema [40];

·        the collection-level description schema defined by Minerva D3.2 [41]; 

·        the emerging Dublin Core Collection Description Application Profile [42].

 

Terminology

If users are to be able to carry out useful searches across distributed data sets then the producers of those data sets need to be entering values into the metadata elements in a consistent way.

 

Recognised multilingual terminological sources should be used to provide values for metadata elements where possible. Local terminologies should be considered only if no standard terminology is available. Where local terminologies are deployed, information about the terminology and its constituent terms and their meaning must be made publicly available.

 

The use of a terminology in metadata records, either standard or project-specific, must be indicated unambiguously in the metadata records.
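
One way of meeting this requirement is to store the name of the terminology alongside every value taken from it, and to check records for values that lack one. The following Python sketch assumes a hypothetical record layout in which each subject value is a dictionary with `term` and `scheme` keys; the scheme label is invented.

```python
def unlabelled_values(records):
    """Return (record id, term) pairs whose subject value names no scheme."""
    missing = []
    for rec in records:
        for subject in rec["subjects"]:
            if not subject.get("scheme"):
                missing.append((rec["id"], subject["term"]))
    return missing


# Invented records; "EXAMPLE-THESAURUS" is a hypothetical scheme label.
records = [
    {"id": "r1", "subjects": [{"term": "Lighthouses", "scheme": "EXAMPLE-THESAURUS"}]},
    {"id": "r2", "subjects": [{"term": "harbour views", "scheme": ""}]},
]
print(unlabelled_values(records))
```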

 

Collection-level metadata records could make use of the terminologies recommended for use with the Minerva collection-level description schema [41].

 

Controlled vocabularies, thesauri and authority files

To ensure consistency it is best to adopt and use identifiable encoding schemes or controlled vocabularies for indexing. A good example is the Library of Congress Subject Headings [43].


A thesaurus is a controlled vocabulary where the terms are arranged in hierarchies which show relationships such as broader or narrower terms, equivalence or part equivalence, and where terms are designated preferred terms or non-preferred terms (for synonym control). They also typically include scope notes and other useful information.
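
The relationships described above can be sketched as a small data structure. This Python example is illustrative only: the class, its method names and the sample terms are hypothetical and do not follow the data model of any published thesaurus standard.

```python
class Thesaurus:
    """A minimal thesaurus with one broader term per preferred term."""

    def __init__(self):
        self.broader = {}   # preferred term -> its broader term (or None)
        self.use_for = {}   # non-preferred term -> preferred term

    def add_term(self, term, broader=None):
        self.broader[term] = broader

    def add_synonym(self, non_preferred, preferred):
        self.use_for[non_preferred] = preferred

    def preferred(self, term):
        # Synonym control: map a non-preferred term to its preferred term.
        return self.use_for.get(term, term)

    def narrower(self, term):
        return sorted(t for t, b in self.broader.items() if b == term)

    def ancestors(self, term):
        # Follow broader-term links up to the top of the hierarchy.
        chain = []
        parent = self.broader.get(self.preferred(term))
        while parent is not None:
            chain.append(parent)
            parent = self.broader.get(parent)
        return chain


# Invented terms, for illustration only.
thesaurus = Thesaurus()
thesaurus.add_term("vessels")
thesaurus.add_term("boats", broader="vessels")
thesaurus.add_term("rowing boats", broader="boats")
thesaurus.add_synonym("rowboats", "rowing boats")
print(thesaurus.preferred("rowboats"), thesaurus.ancestors("rowboats"))
```

A search system can use the same links in reverse, expanding a query for a broad term to include all of its narrower terms.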

 

There are two ISO standards for thesauri: ISO 2788, 1986 Guide to establishment and development of monolingual thesauri [44], and ISO 5964, 1985 Guide to the establishment and development of multi-lingual thesauri [45]. Work is underway to revise both these standards. A new standard, BS 8723: Structured vocabularies for information retrieval - guide, is planned (see the guideline on Multilingualism).


The Getty Museum site makes available a number of thesauri including: The Getty Art and Architecture Thesaurus [46] and the Getty Thesaurus of Geographic Names [47].

 

An example of a standard for authority records is ISAAR (CPF), the International Standard Archival Authority Record for Corporate Bodies, Persons and Families, 2nd edition, 2004 [21], published by the International Council on Archives.

 

Traugott Koch has compiled a good list of controlled vocabularies, thesauri and classification schemes [48]. The TASI (Technical Advisory Service for Images) page Controlling your language - links to metadata vocabularies [49] provides links to more than 60 vocabulary sources.

 

Ontologies

An ontology may be described as a formal description of objects and their inter-relationships. (See the guideline on Underlying technologies and infrastructure, where the Semantic Web and RDF are also covered.)


Object identification

The primary reason why people assign unique identifiers to resources is so that they, and others, can refer to the resources unambiguously.  So they need to be able to rely on their identifiers being unique (i.e. the same identifier is not assigned to another resource) and persistent (i.e. it continues to identify this resource – how long it should continue may be dependent on the context and the nature of the resource). It may also be a requirement that the identifier can be used to access the resource i.e. the identifier can be “resolved” by means of a service to a current location of the resource.
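
The three properties described above (uniqueness, persistence and resolution) can be illustrated with a toy registry. This Python sketch is an assumption-laden illustration, not a description of how any real identifier system works: the class, the `example:` prefix and the URLs are all hypothetical.

```python
import uuid


class IdentifierRegistry:
    """A toy resolver that maps persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def assign(self, location):
        # Uniqueness: a freshly generated UUID is never reused for another resource.
        identifier = f"example:{uuid.uuid4()}"  # hypothetical identifier scheme
        self._locations[identifier] = location
        return identifier

    def move(self, identifier, new_location):
        # Persistence: the identifier stays the same when the resource moves.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        # Resolution: look up the current location of the resource.
        return self._locations[identifier]


registry = IdentifierRegistry()
first = registry.assign("http://old.example.org/item/1")
second = registry.assign("http://old.example.org/item/2")
registry.move(first, "http://new.example.org/objects/1")
print(registry.resolve(first))
```

Citations and links that quote the identifier rather than the location keep working after the move, provided that the resolution service itself is maintained.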

 

Examples of unique persistent identifiers include:

·        DOIs (Digital Object Identifiers) [50];

·        URNs (Uniform Resource Names) [51];

·        PURLs (Persistent Uniform Resource Locators) [52];

·        Handles [53];

·        ARKs (Archival Resource Keys) [54].

 

Paul Miller has written an easy-to-understand article explaining unique identifiers (Miller, Paul: I am a name and a number. Ariadne, issue 24, 21st June 2000 [55]). The International Council of Museums (ICOM) published International Guidelines for Museum Object Information: the CIDOC Information Categories [56] in 1995. (See also the guideline on Digital preservation.)

 

FUTURE AGENDA

 

In the future people will be surrounded by intelligent, responsive and reliable machines capable of reacting to them as individuals. The technologies described in this guideline exist now, but their mature interaction can still only be imagined. It will affect homes, schools, hotels, cars and aircraft; in short, every aspect of life. Its effects on cultural institutions will be very far-reaching.

 

As the power of computing systems grows, more information can be stored, e.g. the metadata to describe several million objects. This will enable huge collaborative catalogues to be made available.

 

It will also be possible to store more complex descriptions to accommodate for example the 2nd edition of the Museum Documentation Association Spectrum standard for museum data which lists hundreds of fields. Metadata in the cultural sector is likely to develop from static structured catalogue data to complex free-text descriptions including interpretations or responses to objects.

 

REFERENCES

 

[1] Gill, Tony and Miller, Paul: Re-inventing the Wheel? Standards, Interoperability and Digital Cultural Content. D-Lib Magazine, Volume 8 Number 1, January 2002. http://www.dlib.org/dlib/january02/gill/01gill.html

 

[2] TASI (Technical Advisory Service for Images): Metadata and Digital Images. 2002-2004.

http://www.tasi.ac.uk/advice/delivering/metadata.html

 

[3] Machine Readable Cataloguing (MARC): MARC 21

http://www.loc.gov/marc/

 

[4] Dublin Core Metadata Element Set, Version 1.1

http://dublincore.org/documents/dces/

 

[5] Hillmann, Diane: Using Dublin Core. 2003.

http://dublincore.org/documents/usageguide/

 

[6] Online Archive of California Best Practice Guidelines for Digital Objects (OAC BPG DO), Version 1.0, prepared and maintained by the Online Archive of California Working Group, January 2004.

http://www.cdlib.org/inside/projects/oac/bpgdo/

 

 

[7] PREMIS (PREservation Metadata: Implementation Strategies)

http://www.oclc.org/research/projects/pmwg/

 

[8] RLG set of 16 basic metadata elements to support preservation http://www.rlg.org/preserv/presmeta.html

 

[9] Reference Model for an Open Archival Information System (OAIS)

http://ssdoo.gsfc.nasa.gov/nost/isoas/

 

[10] Lavoie, Brian F.: Implementing Metadata in Digital Preservation Systems: the PREMIS Activity in D-Lib Magazine, April 2004, Volume 10 Number 4. ISSN 1082-9873. http://www.dlib.org/dlib/april04/lavoie/04lavoie.html

 

[11] Day, Michael: Preservation Metadata. Prepublication draft of chapter published in: G. E. Gorman and Daniel G. Dorner (eds.), Metadata applications and management, International Yearbook of Library and Information Management, 2003-2004, London: Facet Publishing, 2004, pp. 253-273. ISBN 1-85604-474-2 (Facet Publishing); ISBN 0-8108-4980-1 (Scarecrow Press). http://www.ukoln.ac.uk/metadata/publications/iylim-2003/

 

[12] NISO Z39.87-2002 AIIM 20-2002 Data Dictionary -- Technical Metadata for Digital Still Images

http://www.niso.org/standards/resources/Z39_87_trial_use.pdf

 

[13] Indecs Project http://www.indecs.org/

 

[14] Metadata Encoding and Transmission Standard (METS)

http://www.loc.gov/standards/mets/

 

[15] IMS Content Packaging Specification

http://www.imsproject.org/content/packaging/

 

[16] IEEE Learning Object Metadata standard

http://ltsc.ieee.org/wg12/par1484-12-1.html

 

[17] IMS Global Learning Consortium http://www.imsproject.org/

 

[18] CETIS (Centre for Educational Technology Interoperability Standards) http://www.cetis.ac.uk/

 

[19] ISO 19115:2003 Geographic information – Metadata

http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=26020&ICS1=35&ICS2=240&ICS3=70

 

[20] ISAD(G): General International Standard Archival Description. Second Edition. http://www.ica.org/biblio/isad_g_2e.pdf

 

[21] ISAAR (CPF) (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), 2nd ed. 2004  http://www.ica.org/biblio.php?pdocid=144

 

[22] EAD  (Encoded Archival Description) http://www.loc.gov/ead/

 

[23] EAC (Encoded Archival Context) http://www.library.yale.edu/eac/

 

[