The Calimera Project is funded
under the European Commission,
IST Programme
Calimera Guidelines
Cultural Applications:
Local Institutions Mediating Electronic Resources
Multilingualism
Issues dealt with in this guideline include:
Transliteration, transcription and authority files
POLICY ISSUES
“Language is the
foundation of communication between people and is also part of their cultural
heritage. For many, language has far-reaching emotive and cultural associations
and values rooted in their literary, historical,
philosophical and
educational heritage. For this reason the users’ language should not be an
obstacle to accessing the multicultural heritage available in cyberspace. The
harmonious development of the information society is therefore only possible if
the availability of multilingual and multicultural information is encouraged.” [1]
Article 12 of the European Charter for Regional or Minority Languages [2] deals specifically with cultural activities
and facilities – “especially libraries,
video libraries, cultural centres, museums, archives, academies, theatres and
cinemas, as well as literary work and film production, vernacular forms of
cultural expression, festivals and the culture industries, including inter alia
the use of new technologies”. The signatories to this Charter (i.e. the member
states of the Council of Europe) “undertake
to make appropriate provision… for regional or minority languages and the
cultures they reflect”.
Cultural institutions should aim to reach as wide an audience as possible.
Websites can reach a global audience, and there are estimated to be over 6,000
languages in the world. The EU is
committed to integration among its member states but also promotes the
linguistic and cultural diversity of its peoples by promoting the teaching and
learning of languages, including minority and regional languages. The Action
Plan on Language Learning and Linguistic Diversity for 2004 – 2006 [3] states that “language learning is for
all citizens, throughout their lives. Being aware of other languages, hearing
other languages, teaching and learning other languages: these things need to
happen in every home and every street, every library and cultural centre, as
well as in every education or training institution and every business”.
2001 was designated the European Year of Languages [4] and its activities continue annually
through the celebration of the European Day of Languages on 26 September [5].
Museums, libraries and archives will need to consider providing services
in
·
the official EU languages;
·
minority indigenous
languages;
·
the languages of
immigrants;
·
non-European languages – to
some extent this will depend on the nature of their collections and whether
there is likely to be interest outside
·
sign languages.
European languages
There are 20 official languages
in the EU – Czech, Danish, Dutch, English, Estonian, Finnish, French, German,
Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese,
Slovak, Slovenian, Spanish and Swedish. The term “official language” is defined
as a language that can be used in dealings with public authorities and in
official documents, including commercial documents. A citizen may write to an
EU institution in any of these languages and must receive a reply in the same
language.
Information about the regional and/or minority languages of the European Union can be found on the
website of Mercator [6], a research network and information service
set up with the support of the European Commission. It is estimated that there
are over 150 minority indigenous or autochthonous languages within the EU, not
including dialects of any of the official languages, or any of the languages
spoken by immigrant communities. The European Bureau
for Lesser Used Languages (EBLUL)
[7] estimates that over 40 million people in
the EU speak a language which is not the official language of their country of
origin.
Some minority languages are afforded some sort of recognition in
·
languages specific to a
region which may be wholly or partially in one or more member states. This
would cover languages like Basque, Breton, Catalan, Frisian, Sardinian, Welsh
and so on;
·
languages spoken by a
minority in one state but which are official languages in another EU country.
This definition covers, for example, German in southern
·
non-territorial languages
such as those of Roma or Jewish communities (Romany and Yiddish), or Armenian.
Some minority languages are fully developed languages of culture taught
in schools, with established orthographies, extensive literatures and a
considerable amount of publishing. Others may lack some or even all of such
attributes and it may be difficult to make provision for them. Indigenous
linguistic minorities however tend not to present the same challenges as do
immigrants. For example:
·
they are often fully
bilingual and do not require instruction in the majority language or culture;
·
there is no doubt about
their numbers or permanence or socio-economic circumstances.
There are also many non-indigenous
languages spoken in
·
Turkish (mainly in
·
Maghreb Arabic (mainly in
·
Urdu, Bengali and Hindi (mainly in the
·
Balkan languages (spoken in many parts of the EU by
migrants and refugees who have left the region as a result of recent wars and
unrest).
Social inclusion
No official EU protection is afforded these languages, but heritage
institutions will need to be socially
inclusive (see the guidelines on Social
inclusion and on Cultural
identity and cohesion) and so will need to consider the language issue.
Established ethnic minorities may well be bilingual, or even, in the case of
second and third generations, monolingual in the majority language. Recent
immigrants present a greater challenge. Museums, libraries and archives must be
aware of the languages used in their communities. In some large cities with a
rapidly changing population this might involve regular monitoring of the
linguistic profile. As well as providing a service in any relevant minority
languages, they must also recognise their responsibility to document and
preserve the cultural identity of
all members of their communities, which could involve collecting materials and
creating content in several languages. Services to immigrants could involve:
·
recruiting staff who speak
the language(s), preferably as native speakers;
·
ensuring all leaflets,
signs and publicity are available in all relevant languages;
·
providing reading materials
and audio-visual materials in all relevant languages;
·
providing word processing
facilities in all relevant languages;
·
providing a translation
service;
·
designing websites in more
than one language.
Sign languages must not be overlooked. There
are many different sign languages and, although they are not recognised as
official EU languages, the Council
of Europe's Recommendation 1598 (2003), Protection
of sign languages in the member states of the Council of Europe, encourages member states “to give the sign languages used in their territory formal
recognition” [8]. Some countries,
including
GOOD PRACTICE GUIDELINES
These guidelines focus on multilingualism in the digital arena. In
practice organisations have faced a number of difficulties in creating and
maintaining multilingual digital content and pan-European products and
services for the global networks. Some of these difficulties are technical and
some relate to the costs and difficulties of translation. In recognition of
this the EC has created an action
line to address multilingual issues under the strategically important
e-Content programme [9].
Information retrieval
As more and more cultural resources are
digitised, access is extended to a global audience. The challenge for museums,
libraries and archives is to ensure access to these resources while at the same
time respecting cultural and linguistic diversity. Various projects have been
set up to work in this area, including MACS
(Multilingual Access to Subjects) [10].
Multilingual thesauri
A thesaurus is a set of controlled terms for the detailed subject
indexing of (originally) printed documents. A thesaurus will show relationships
such as hierarchy and equivalence between the terms it uses. A major problem in
the construction of thesauri in more than one language is that terms in one
language may not cover the same semantic fields as terms in another. For
example, the English term “teenager” covers a narrower semantic field than the
French “adolescent”.
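The equivalence problem can be made concrete with a small sketch. The Python below is illustrative only: the `Concept` class, the "exact" and "inexact" labels and the sample entry are assumptions made for this example and are not taken from any standard or published thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept with preferred terms in several languages."""
    concept_id: str
    # language code -> (preferred term, degree of equivalence to the English term)
    terms: dict = field(default_factory=dict)
    # identifiers of broader concepts (the hierarchical relationship)
    broader: list = field(default_factory=list)

    def term_in(self, lang):
        """Return (term, equivalence) for a language, or None if no term exists."""
        return self.terms.get(lang)


# Hypothetical entry: the French term covers a wider semantic field than the
# English one, so the match is recorded as inexact rather than exact.
teenagers = Concept(
    concept_id="C001",
    terms={"en": ("teenagers", "exact"), "fr": ("adolescents", "inexact")},
    broader=["C000"],
)

print(teenagers.term_in("fr"))  # ('adolescents', 'inexact')
print(teenagers.term_in("cy"))  # None: no Welsh term has been recorded
```

Recording the degree of equivalence alongside each term lets a search interface warn users when a term in the other language is only an approximate match.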
There are standards for the compilation of thesauri and equivalent terms
across languages (see ISO
5964:1985 Guidelines for the
establishment and development of multilingual thesauri [11]). This
standard is an adjunct to ISO
2788:1986 [12], which covers monolingual
thesauri, and so is not complete in itself: many of the problems in the
construction of thesauri are common to monolingual and
multilingual thesauri. A revision
of both standards is currently in progress. A new standard, BS 8723: Structured vocabularies for information
retrieval - guide, is planned, to cover both monolingual and multilingual
thesauri. It will be in five parts, as follows:
Part 1: Definitions, symbols and abbreviations (draft published Nov.
2004);
Part 2: Thesauri (draft published Nov. 2004);
Part 3: Vocabularies other than thesauri;
Part 4: Interoperation between multiple vocabularies;
Part 5: Interoperation between vocabularies and other components of
information storage and retrieval systems [13].
The Getty Information Institute has produced Guidelines
for Forming Language Equivalents: A Model based on the Art and Architecture
Thesaurus [14] (http://www.chin.gc.ca/Resources/Publications/Guidelines/English/).
The chapter on multilingual thesaurus construction in Jean Aitchison, Alan
Gilchrist [and] David Bawden: Thesaurus
construction and use: a practical manual. 4th ed. ASLIB:
Multilingual websites
“A quality website must be aware of the
importance of multilinguality by providing a minimum level of access in more
than one language.” [16] The structure of a multilingual or
bilingual website should be carefully considered from the outset so that
multilingualism is an essential part of it and not just an afterthought. The
MINERVA Project has suggested some criteria
to define a multilingual website [17], the degree
of multilinguality being dependent on the number of these which are met. They
are:
·
some content should be
available in more than one language;
·
some content should be
available in sign language;
·
some content should be
available in non-EU immigrant languages;
·
site identity and profile
should be available in more than one language;
·
core functionality of the
site (searching, navigation) should be available in more than one language;
·
static content (images,
descriptions etc.) should be available in more than one language;
·
switching between languages
should be easy;
·
site structure and user
interface language should be logically separate so that layout does not vary
with the language;
·
multilinguality should be
driven by a formal multilinguality policy;
·
the website should be
reviewed against this policy.
In some cases a bilingual, as opposed to a multilingual, website will be
appropriate. Bilingual websites may be used:
·
in countries or regions
where there is one main minority language, e.g. Wales;
·
to address a readership
which can be expected to consist of bilingual individuals;
·
to address individuals who
may speak one or other of two languages;
·
to make a social or
political point by reminding members of the majority community of the existence
of a minority.
Multilingual websites will be needed:
·
in countries where there
are a number of minority indigenous languages;
·
to address ethnic
minorities, including immigrants and asylum seekers, in their own languages;
·
if the content of the
website is likely to be of interest to a pan-European or global audience.
There are various policy decisions to be made which have far-reaching
effects on the appearance of the website:
·
frames may be difficult in
a multilingual context;
·
multilingual pages are
likely to have a lot of text on them and may have a formidable appearance;
·
some fonts are more
appropriate for one language than another, and it is preferable to use the same
font throughout rather than to appear to make one language more legible than
others;
·
the language of logos must
be sensitively chosen. The use of a majority language in a logo can alienate
minority language users;
·
there may be a role for
touchscreen technology in the design of multilingual websites.
Remember also that a multilingual website is not a cheap option: like any
other website it will require updating, and updating it is more complex than
updating a monolingual site.
There are a number of ways in which a multilingual website may be arranged:
·
users may be offered a once
and for all choice of which language they wish to use on the first page, and if
they want to change, may be forced to return to that page. This may be
appropriate in certain settings e.g. in a country in which two languages are
used but by no means everyone is bilingual, for example Belgium and
Switzerland;
·
they may be offered a
choice of language on each page of the site. This may be by means of a button
or filing tab, conventions familiar to most Internet users. Language links
should be at the top of the page rather than at the bottom, as that is the part
of the page displayed by default, and the link should take the user to exactly
the same page but in the other language - not to another part of the site. The
language should be given its native name e.g. French should be called Français;
·
all pages may offer the
same text in all languages. Be aware that the same text in different languages
may take up different amounts of space; typically an original text will be
shorter than a translation;
·
sites may be asymmetrical,
for example some information may be relevant to speakers of only one language
e.g. a social club for Welsh people may have its membership form in Welsh only
but in other respects may be bilingual.
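The per-page language choice described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical site layout in which the first segment of each URL path is the language code (for example /en/collections/maps); the `NATIVE_NAMES` table and the `language_links` function are invented for this example.

```python
# Native names for the languages offered (keys are ISO 639-1 codes).
NATIVE_NAMES = {"en": "English", "fr": "Français", "cy": "Cymraeg"}


def language_links(path, current_lang):
    """Return (native name, URL path) pairs for the same page in the other languages.

    Assumes the first path segment is the language code.
    """
    segments = path.strip("/").split("/")
    if segments[0] != current_lang:
        raise ValueError("path does not start with the current language code")
    rest = "/".join(segments[1:])
    links = []
    for code, name in NATIVE_NAMES.items():
        if code == current_lang:
            continue  # no link to the page the user is already reading
        links.append((name, f"/{code}/{rest}" if rest else f"/{code}/"))
    return links


print(language_links("/en/collections/maps", "en"))
# [('Français', '/fr/collections/maps'), ('Cymraeg', '/cy/collections/maps')]
```

Each link keeps the rest of the path unchanged, so the user arrives at exactly the same page in the other language, and each language is labelled with its native name.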
The choice of arrangement may be affected by:
·
which type of audience
is being addressed - individuals speaking more than one language or individuals
speaking only one language. Bilingual individuals may want to be able to see
two languages much of the time as a means of double-checking that they
understand the text correctly;
·
on a bilingual site, how
different from each other the two languages are. Some languages are mutually
comprehensible to a degree e.g. Spanish and Catalan, whereas some are not e.g.
English and Welsh. Also some concepts do not appear at all in some languages.
The website of the Welsh
Language Board [18] contains advice
from the School of Education, University of Wales, Bangor and Escola Superior
Politecnica, Universitat Pompeu Fabra, Barcelona on the design of bilingual
websites. It includes recommendations on the best ways of incorporating
bilingualism into the design of a website without giving undue prominence to
one language over another, and on avoiding giving offence by using emotive or
politically charged symbols such as flags to represent languages: many
languages are spoken in more than one country and many countries are bilingual.
Scripts
Computers store letters and other characters by assigning a number to each
one. The enormous diversity of languages and scripts led to hundreds of
different encoding systems for assigning these numbers. Then in the late 1980s work began on Unicode [19].
It assigns a unique number (a code point) to every
character it covers, no matter what the platform, program or language.
The Unicode Consortium is a non-profit making organisation founded to develop,
extend and promote the use of the standard. Unicode is continually being
expanded, nowadays even to include such things as archaic alphabets like Ogham
(an ancient Celtic script) and cuneiform, and can cope with numbers, symbols,
punctuation and Braille patterns etc.
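The idea of one number per character can be seen directly in most modern programming languages. The short Python sketch below uses only the standard library; the `describe` helper and the choice of sample characters are assumptions made for illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point, official Unicode name and UTF-8 bytes of one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))


# A Latin letter, an extended Latin letter, a Cyrillic letter, a Chinese
# character, an Ogham letter and a Braille pattern.
for ch in ["A", "\u00f1", "\u0427", "\u4e2d", "\u1681", "\u2801"]:
    print(ch, *describe(ch))
```

The code point is the same on every platform; only the byte encoding (here UTF-8) used to store or transmit it varies.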
Although Unicode Standard Version 4.0 [20] and ISO/IEC
10646:2003 [21] are not the same thing,
the sets of characters, names, and coded representations they contain are
identical. Unicode Version 4.0 covers over 96,000 characters from the world's
scripts. Although by no means the only standard in the field, it is favoured by
the IT industry, as the adoption of one method has obvious advantages for
worldwide communication, software availability, data interchange and
publishing. ISO/IEC 10646:2003 has been widely adopted in new Internet and W3C
protocols and markup languages such as XML and HTML, and implemented in modern
operating systems and computer programming languages.
Fonts and keyboards
Small key caps can be bought which cover the keys of a normal keyboard to
aid in the typing of languages using extended versions of the Roman script, e.g.
ð å þ ñ ç æ ć ł. This simple method can even enable the Kanji
script of Japanese to be word-processed.
Soft keyboards, or keyboards displayed on a touch screen, may be a flexible way
of dealing with some of the problems of non-Roman or exotic scripts.
Languages with thousands of characters, like Chinese, require special software
before they can benefit from electronic word-processing. For Chinese, a normal
keyboard is used to enter a phonetic spelling of a Chinese word according to
the Pinyin system of romanisation and the software displays those characters
which are pronounced in that way – there may be as many as ten or so. The
correct character or characters are chosen and entered in the document. The
wrong choice would be the Chinese equivalent of a spelling mistake. This system
is very adaptable, enabling both the traditional and the simplified Chinese
characters to be word-processed. The use of the Pinyin system does however mean
that the operator needs to know the Mandarin (Beijing-based) pronunciation of
Chinese, which it is not necessary to know in order to write Chinese by hand.
It is however possible to buy software
based on the Cantonese pronunciation [22].
The software needs more memory on the PC than word-processing
a language written in the Roman alphabet, but in cities where there are
considerable numbers of Chinese people it could be justifiable to buy this
software and make it available on a dedicated machine. Arabic scripts present
less of a problem as specially adapted keyboards are available.
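The Pinyin entry method described above amounts to a lookup followed by a choice. The Python below is a toy sketch under stated assumptions: the `CANDIDATES` table holds only two syllables and a handful of simplified characters, whereas real input software relies on large dictionaries and ranks candidates by frequency and context.

```python
# Tiny illustrative table: toneless Pinyin syllable -> characters pronounced that way.
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
    "zhong": ["中", "种", "重", "钟"],
}


def candidates(pinyin):
    """Return the characters offered for a Pinyin syllable (empty list if unknown)."""
    return CANDIDATES.get(pinyin.lower(), [])


def choose(pinyin, index):
    """Return the candidate the operator picked; index counts from 1 as on screen."""
    options = candidates(pinyin)
    if not 1 <= index <= len(options):
        raise IndexError("no such candidate")
    return options[index - 1]


print(candidates("ma"))    # the operator sees five characters to choose from
print(choose("zhong", 1))  # 中
```

Picking the wrong index produces a valid but incorrect character, which is the "spelling mistake" mentioned above.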
It is worth considering commercial fonts, software, and keyboards for
multilingual computing such as those sold by Fingertip Software [23] which are based on Unicode.
Transliteration, transcription and authority files
In many cases e.g. for the production of catalogues, indexes, toponymic
lists and other works of a bibliographic nature which are meant to be used by
people who can only be expected to be familiar with the Latin alphabet, or for
typographical reasons, it will not be possible or practical to use the
characters of a non-Latin script. In that case, transcription or
transliteration will be necessary.
Transliteration is the process by which the letters of an alphabetic writing
system are converted into the symbols of another alphabetic system e.g.
Cyrillic or Greek into the Latin alphabet. There are problems caused by
alternative systems of transliteration e.g. Чехов
can be transliterated Tchehov or Chekhov.
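The effect of competing systems can be shown with two conversion tables applied to the same name. This Python sketch is illustrative only: `SCHEME_A` and `SCHEME_B` are hypothetical tables covering just the five letters of the example, and neither is a complete published transliteration scheme.

```python
# The Cyrillic name from the example, written with escapes to avoid any
# confusion with look-alike Latin letters.
NAME = "\u0427\u0435\u0445\u043e\u0432"  # Чехов

# Two deliberately tiny, hypothetical conversion tables.
SCHEME_A = {"\u0427": "Ch", "\u0435": "e", "\u0445": "kh", "\u043e": "o", "\u0432": "v"}
SCHEME_B = {"\u0427": "Tch", "\u0435": "e", "\u0445": "h", "\u043e": "o", "\u0432": "v"}


def transliterate(text, table):
    """Convert text letter by letter; letters missing from the table are left unchanged."""
    return "".join(table.get(letter, letter) for letter in text)


print(transliterate(NAME, SCHEME_A))  # Chekhov
print(transliterate(NAME, SCHEME_B))  # Tchehov
```

A catalogue search for one form will not find records entered under the other unless the two are linked in some way.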
Transcription is the process by which the sounds of a language are represented
in the symbols of another writing system. Transcription may in principle be used
for the sounds of any language, but it is the only method which can be used to
render languages written in non-alphabetic scripts, such as Chinese, in the symbols
of the Latin or some other alphabetic system.
Clearly there are problems of standardisation as a result of transliteration
and transcription. Different systems or variation in practice would cause
difficulties in searching databases. At the moment there is no standardised
name record format relevant to the needs of European cultural institutions but
a prototype has been developed by the LEAF
project (Linking and Exploring Authority Files) [24]
funded by the EC from March 2001 to 2004. The project results
will be implemented by extending MALVINE,
an online search service for post-medieval manuscripts, into a global
multilingual information service about persons and corporate bodies [25].
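The role an authority file plays in searching can be sketched briefly. The Python below is a minimal illustration and does not reproduce the LEAF record format: the record structure, the identifier `chekhov-anton` and the variant forms listed are all assumptions made for this example.

```python
# Hypothetical authority record: one preferred heading plus its variant forms.
AUTHORITY_FILE = {
    "chekhov-anton": {
        "preferred": "Chekhov, Anton",
        "variants": ["Tchehov, Anton", "Чехов, Антон"],
    },
}


def build_index(authority_file):
    """Map every known form of a name (case-folded) to its record identifier."""
    index = {}
    for record_id, record in authority_file.items():
        for form in [record["preferred"], *record["variants"]]:
            index[form.casefold()] = record_id
    return index


INDEX = build_index(AUTHORITY_FILE)


def preferred_heading(name):
    """Return the preferred heading for any known form of a name, else None."""
    record_id = INDEX.get(name.casefold())
    return AUTHORITY_FILE[record_id]["preferred"] if record_id else None


print(preferred_heading("tchehov, anton"))  # Chekhov, Anton
```

Whichever form a user types, the search is redirected to a single heading, so variation in transliteration practice no longer splits the results.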
International standards are being developed for the transliteration of a
variety of languages. For example there is a standard for the transliteration
of Indic scripts, ISO
15919:2001, Transliteration of Devanagari
and related Indic scripts into Latin characters [26].
Machine Translation (MT)
At one time great hopes were entertained of MT but in view of the effort
expended on it over the last fifty years the results may be seen as
disappointing. The kind of problems which are encountered and which have so far
proved impossible to solve are, for example:
·
ambiguities in the meanings
of words;
·
differences in word order;
·
as yet no way has been
found to give computers any knowledge of the real world or context or
readership.
The effectiveness of MT systems depends on a number of factors: for example,
documents should be free of typographic or grammatical errors, of words not in
the dictionary of the system, and of complex sentence structures.
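Two of these difficulties, word order and words missing from the dictionary, show up even in the simplest possible approach. The Python below is a deliberately naive word-for-word lookup, not a description of how any real MT system works; the three-word French-to-English `GLOSSARY` is a hypothetical example.

```python
# Hypothetical French-to-English glossary, far too small for real use.
GLOSSARY = {"la": "the", "maison": "house", "blanche": "white"}


def word_for_word(sentence, glossary):
    """Translate each word independently; words not in the glossary are flagged."""
    output = []
    for word in sentence.lower().split():
        output.append(glossary.get(word, f"?{word}?"))
    return " ".join(output)


print(word_for_word("La maison blanche", GLOSSARY))  # the house white
print(word_for_word("La maison bleue", GLOSSARY))    # the house ?bleue?
```

The first result keeps the French word order, and the second fails on a word the glossary does not contain.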
MT is the application of computers to the task of translating texts from
one natural language to another and nowadays includes software ranging from
simple dictionary lookup programs used as word-processor add-ons to
sophisticated batch-translation systems. Viable applications for MT include:
·
content scanning, that is
using a translation system simply to obtain a rough draft so as to be able to
get the general gist of a text;
·
screening large numbers of
documents to identify those warranting human translation;
·
assisting human translators
- computer-aided translation (CAT) software uses a variety of linguistic tools
to improve the productivity of translators, particularly when translating
highly repetitive texts such as technical documentation.
There are a number of websites offering both free and charged
translation services on the World Wide Web, for example AltaVista Babelfish,
Google Language Tools, World Lingo, Free Translation, and Systran. If a URL
(Uniform Resource Locator) is entered, the MT software can translate a web page,
and documents can also be translated automatically. These sites often also offer
translation by human beings. The Yahoo
Language Translation and Interpretation Resources page is a useful source
of information about MT sites [27]. A
gateway to a number of web-based translation services, including Internet
search engines, is Babblefish
[28].
For more information about MT see the website of the European Association for Machine
Translation [29].
FUTURE AGENDA
Work is already
underway to establish a multilingual portal to the cultural heritage of
The European Library [32], developed by the TEL project, will be
launched in 2005 as a portal offering access to the combined resources of the
43 national libraries of
It would be useful to have more central resources of materials in
minority languages along the lines of
The EC Joint Interpreting and Conference Service (SCIC
- from the French acronym) [33] has as one
of its objectives to exploit the possibilities offered by new technologies. It has
set up a unit consisting of members of staff who test
new communication tools and search for ways of bringing multilingualism to
channels of communication such as multilingual chats on the Internet, multilingual
communication in the media, and multilingual virtual conferences.
The Cross-Language Evaluation Forum (CLEF) and CLEF 2004 [34] have done a lot of research into multilingual
information retrieval. It is to be hoped that such work will form the basis for
future developments.
Although machine translation is imperfect in many ways, it would be useful to
have some form of it for minority languages, especially those spoken by
minorities, not just the major languages of
Voice-to-voice translation
Voice-to-voice translation, that is a machine which translates spoken
language from one language to another, is still science fiction but might be
developed in the comparatively distant future. Such a device would involve the
perfection of a number of complex technologies, each of which at present has
many shortcomings, including:
·