[Mimas logo] "epub@mimas"


DC2007 Conference Report

Ann Apps
Mimas, The University of Manchester

DC2007: The International Conference on Dublin Core and Metadata Applications (Application Profiles: Theory and Practice) was held in Singapore on 27-31 August 2007. The conference was hosted by the Singapore National Library Board and held in the Intercontinental Hotel, which was across the road from the superb National Library building.

[Singapore Lion photo]

The first day of the conference was a tutorial day, which I did not attend. The main conference took place on the Tuesday, Wednesday and Thursday. As usual some of the parallel sessions were meetings of the Dublin Core Metadata Initiative (DCMI) Community Groups and there were also some special sessions. On the Friday there were Seminars. These reports are of the sessions I attended and noted. All papers are available via the DCMI Conference Proceedings website. The presentations will be available via the conference website. Overall it was a very interesting, well put-together conference.

[Singapore National Library photo]

Opening Session

The opening speech of the conference was made by Dr Vivian Balakrishnan, Singapore's Minister for Community Development, Youth and Sports and Second Minister for Information, Communications and the Arts. An announcement was made that DCMI is working towards incorporation as an independent legal entity in Singapore as a not-for-profit public company. This activity is in collaboration with the National Library Board Singapore, which will provide administrative support to DCMI.

Keynote: Johannes Keizer, Food and Agriculture Organisation of the United Nations

This talk was about information and knowledge management at the FAO, and a new focus on the FAO as a 'Knowledge Organisation'. The mission of the FAO is to reduce hunger and poverty in the world by 50% by 2015.

Application Profiles are the building blocks for a semantic web space in food and agriculture. The AGRIS database holds FAO documentation. It contains more than 2.5 million bibliographic records from 200 centres around the world, many of them with full text. Now that they have moved into the electronic age, they identified a need for investment in metadata and controlled vocabularies: knitting together the semantic web in their area. Resources are 'information objects', not just publications; they include learning resources, organisations, projects, news and events.

Before 2000, the resources in AGRIS used proprietary formats, which caused conversion problems. They now use DC as a common data exchange layer: the AGRIS Application Profile (AP). As well as DC, this includes AGLS (Australian Government Locator Service) and the AGMES agriculture vocabulary. Initially it was criticised as too complicated, with too many elements to supply. But they persisted, and now all AGRIS centres, as well as the central database, use AGRIS AP XML as an exchange format. It is also used for OAI harvesting alongside simple DC, and some DSpace-based software supports the AGRIS AP.
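As a rough illustration of DC acting as a common exchange layer alongside a richer AP, the Python sketch below keeps only the fifteen Dublin Core Metadata Element Set elements from a record and serialises them in the OAI-PMH `oai_dc` format used for simple DC harvesting. The input record, the hypothetical `localNote` field and the policy of simply dropping non-DC fields are assumptions for illustration; the actual AGRIS AP to DC crosswalk was not described in the talk, and the `xsi:schemaLocation` attribute of a full `oai_dc` record is omitted.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for an OAI-PMH simple Dublin Core (oai_dc) record.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DCMES = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialise the simple-DC subset of a {field: [values]} record as oai_dc XML.

    Fields that are not DCMES elements (e.g. richer AP-specific ones) are dropped.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, values in record.items():
        if field not in DCMES:
            continue  # not representable in simple DC
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; 'localNote' stands in for an AP-specific field.
example = {
    "title": ["Rice production in irrigated lowlands"],
    "creator": ["Example, A."],
    "subject": ["rice", "irrigation"],
    "language": ["en"],
    "localNote": ["not exposed via oai_dc"],
}
print(to_oai_dc(example))
```

The same record can therefore be exchanged in full within the AP community while a lossy but widely understood simple DC view is offered for OAI harvesting.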

They also have an AP and a central registry for organisations. They are developing an AP to describe projects, but are still at the stakeholder discussion stage. An AP for news and events is already in use.

The common exchange layer includes both metadata ontologies (the APs) and subject ontologies. This allows value-added services to be developed on top. AGROVOC is a vocabulary of agricultural concepts, with a concept server workbench for input. There is also an ontology for the fishery area with formal knowledge coding, and a geo-political ontology of countries.

They set up a website of Agricultural Information Management Standards, which is surprisingly well used. It is open to everyone and fosters collaborations.

The semantic web starts from existing data applications. The promise of richer semantics implies a greater possibility of smarter applications, giving a large return on investment. There is a distinction between an AP, which is a kind of ontology made of encoded lists with relations, and an ontology, which is a formalised encoding of knowledge. But there is a general problem in information management in proving return on investment and value for money.

Paper Session 1: Conceptual Modelling

Parallel Writing Tradition in East Asian Language Data and Metadata Representation: Under the Light of the DCMI Abstract Model. Akira Miyazawa

This paper described a system for writing the same thing in multiple representations. The database and metadata communities use parallel writing widely. For example, in Singapore, China and Korea, station names are repeated in multiple languages. RFC 4646 language tags can capture different scripts as well as languages and regions. There is a proposal to register more scripts as sub-languages.
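To make the RFC 4646 point concrete: a tag such as `zh-Hans-SG` combines a language (`zh`), a script (`Hans`, Simplified Han) and a region (`SG`), so parallel forms of one name can be recorded as several value strings, each with its own tag. The Python sketch below parses only the simplified `language[-script][-region]` pattern; extended language subtags, variants, extensions and private-use subtags that RFC 4646 also allows are deliberately not handled, so this is an illustration rather than a conformant parser.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LangTag:
    """Parse a simplified RFC 4646 tag of the form language[-script][-region].

    Extended language, variant, extension and private-use subtags are
    not supported by this sketch and raise ValueError.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"unsupported primary language subtag: {parts[0]!r}")
    script = region = None
    rest = parts[1:]
    # Script subtags are four letters, conventionally title-cased (e.g. Hant).
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        script = rest.pop(0).title()
    # Region subtags are two letters or three digits (e.g. SG, 419).
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        region = rest.pop(0).upper()
    if rest:
        raise ValueError(f"subtags not handled by this sketch: {rest}")
    return LangTag(language, script, region)


for t in ["en", "zh-Hans-SG", "zh-Hant-TW", "ja-Latn", "ko-Kore-KR"]:
    print(t, parse_tag(t))
```

Separating the script from the language is what lets, for example, a Japanese name written in Latin script (`ja-Latn`) sit alongside the same name in Japanese script without being mistaken for English.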

Annotation Profiles: Configuring Forms to Edit RDF. Matthias Palmér, Fredrik Enoksson, Mikael Nilsson, Ambjörn Naeve

This paper described 'luisa', which is a metadata input editor. There are 3 possibilities for implementing a metadata editor:

Luisa is a metadata editor of the third type. It is an annotation tool, and is embeddable in larger applications. (Here, 'annotation' means the human authoring of metadata.) There are special profiles for annotation roles, eg. librarian, teacher. The editor will show them different fields.

Application Profile Model Plenary Session

Mikael Nilsson, Tom Baker

This session described the new definition of a Dublin Core AP, including Description Set Profiles. The objective is to make an AP machine-readable. The CEN Guidelines for a DCAP are not based on the Dublin Core Abstract Model (DCAM), have no support for description sets, and are a flat list of properties with single values.

A Dublin Core Description Set Profile (DSP) may be used as: a formal representation of constraints of a DCAP; a configuration for databases; a configuration for metadata editing tools. It is an information model defining: the structural constraints on a description set; what descriptions may occur; what properties may be used. Out of scope are: human readable documentation; definition of vocabularies; version control (managed outside of the description set).
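The idea of a DSP as a machine-checkable set of structural constraints can be sketched in a few lines. The Python example below is loosely modelled on the notion of statement templates with minimum and maximum occurrences; the `StatementTemplate` class, the `closed` flag and the example profile are hypothetical and do not reproduce the actual DSP XML format. It checks a single description, not a whole description set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatementTemplate:
    """Hypothetical constraint on how often one property may occur in a description."""
    property_uri: str
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means unbounded


def check_description(statements: list[tuple[str, str]],
                      templates: list[StatementTemplate],
                      closed: bool = True) -> list[str]:
    """Return constraint violations for one description.

    `statements` is a list of (property URI, value) pairs. If `closed` is True,
    properties not covered by any template are reported as violations.
    """
    counts: dict[str, int] = {}
    for prop, _value in statements:
        counts[prop] = counts.get(prop, 0) + 1

    errors = []
    for t in templates:
        n = counts.get(t.property_uri, 0)
        if n < t.min_occurs:
            errors.append(f"{t.property_uri}: expected at least {t.min_occurs}, found {n}")
        if t.max_occurs is not None and n > t.max_occurs:
            errors.append(f"{t.property_uri}: expected at most {t.max_occurs}, found {n}")
    if closed:
        allowed = {t.property_uri for t in templates}
        for prop in counts:
            if prop not in allowed:
                errors.append(f"{prop}: property not allowed by this profile")
    return errors


DCT = "http://purl.org/dc/terms/"
profile = [
    StatementTemplate(DCT + "title", min_occurs=1, max_occurs=1),
    StatementTemplate(DCT + "creator"),
    StatementTemplate(DCT + "date", max_occurs=1),
]
description = [(DCT + "creator", "Example, A."), (DCT + "date", "2007")]
for problem in check_description(description, profile):
    print(problem)
```

The same constraint data could equally drive a form in a metadata editor, which is the configuration use mentioned above.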

A DCAM-conformant AP is now a packet of documentation:

A data input editor (SHAME) has been developed using a DSP as its underlying template. A wiki syntax has been developed for generating a DSP.

Parallel Session 2

Collection Description Community Meeting. Ann Apps

Report of meeting

Paper Session 3: Application Profiles: Issues and Practice

Theory and Practice of Application Profile Development. Jon Mason, Helen Galatis

Jon gave an overview of metadata and AP development in education. He finished with a thought-provoking slide about the future and community content context. At the centre were: learning, thinking, knowing. Around the centre were: who, what, when, where, how, if. 'If' refers to knowledge management decision support systems, ie. rule-based scenario development.

Application Profiles: Exposing and Enforcing Metadata Quality. Diane Hillmann, Jon Phipps

Are machine-readable APs a 'good thing'? They are good for data creation because they enforce rules and decision making. What is the potential when the world is full of machine-readable APs? Metadata quality is important but not well explored. These two ideas should be brought together, so that we can measure the product against the attempt. We need to make a statement of intent about how to apply quality criteria with an AP.

Data profiles are a data footprint, and a precursor to the evaluation of metadata by machines. An AP is a high-level view of data expectations.

Using an Application Profile Based Service Registry. Ann Apps

My paper described the JISC Information Environment Service Registry (IESR), its metadata schema and how the registry might be used. The issues and benefits of using an AP were discussed.

Paper Session 4: Identification, Registries and Reuse

Identifying the Identifiers. Douglas Campbell

This presentation deconstructed identifiers. An identifier is used for communication, to refer to a thing. There is a need to be able to differentiate, and to compare by defining and describing. Semiotics is about how to communicate using signs and symbols. An identifier is a thought - the semiotic triangle - and needs a remembrance. Identifiers are the manifestation of the act of identifying. On granularity: if you have a need to identify something, then identify it.

Using Metadata Schema Registry as a Core Function to Enhance Usability and Reusability of Metadata Schemas. Mitsuharu Nagamori, Shigeo Sugimoto

The example given was designing a new metadata schema, eg. for Manga (graphical cartoon novels). Designers want to reuse and customise existing metadata schemas, so a metadata schema registry is used to find and reuse them.

Special Session

Ontology Modelling Using Topic Maps and RDF/OWL. Sam Oh

This popular session rather ambitiously covered ISO Topic Maps and W3C RDF/OWL in an hour and a half.

Topic maps are an international standard for knowledge integration. They are used for organising large bodies of organisational information. The aim is for seamless knowledge. They consist of 'Topics', 'Associations' and 'Occurrences'. There is an information layer, plus a knowledge layer that consists of topics and associations.

An Ontology is any form of classification scheme. A schema is an ontology plus constraints. A 'design ontology' captures rich relationships between data.

Ontopoly, from Ontopia, is a free editor that can be used for schema design. It enables database design with semantic meaning.

Parallel Session 5

Tools Community Meeting. Jane Greenberg

Discussion was about a proposed Application Profile for describing software tools. This would be used to provide a consistent format for the DC Tools and Software page.

Work required to complete the Application Profile was discussed. It will now need to fit in with the Singapore Framework. One possibility is to import the AP into the wiki syntax, including constraints, which will probably involve retyping. The advantage of doing this is that the machine-readable XML form can then be generated. The wiki is aligned tightly with the DCAM, so problems in importing the AP into the wiki will raise a red flag about the modelling. A Task Group is to be set up to complete the AP.

After revision the AP will be submitted to the Usage Board for review. They check that: usage of terms is consistent with the DCAM; all terms are defined somewhere; it fits with the functional requirements and the domain model.

A policy document is needed for the Tools and Software page to define the scope of what should be described there and the functional requirements for this page. Suggested scope was: tools related to the Dublin Core Abstract Model. This would include RDF tools. The scope needs to fit in with the general model and include metadata vocabularies and instance metadata. The scope is implicit in the domain model. Should it be extended to tools for creating Learning Object Metadata (LOM)? Maybe there are two scopes: what can be described with the Tools AP; what tools we want to capture on the DC pages.

Another significant piece of information about a tool is what type of content it works on. There was some inconclusive discussion on the relationship between services and tools.

Keynote 2

Keynote: Zhang Xiaoxing, Deputy Director, National Cultural Information Resource Centre of China

The speaker described the organisation being set up by this national centre, provincial sub-centres, and grass roots centres, eg. local libraries and local communities. Resources are described using DC metadata and DCCAP (Dublin Core Collections Application Profile) collections metadata.

Paper Session 6

Integrating Dublin Core Metadata for Cultural Heritage Collections Using Ontologies. Constantia Kakali, Irene Lourdi, Thomais Stasinopoulou, Lina Bountouri, Christos Papatheodorou, Martin Doerr and Manolis Gergatsoulis

They are doing semantic integration using an ontology: the CIDOC Conceptual Reference Model, which is a cultural heritage domain ontology. They have mapped this to DC and DC Type, and also to DCCAP.

Can a system make novice users experts? Important Factors for Automatic Metadata Generation Systems. Sueyeon Syn and Michael B. Spring

They first established a benchmark using experts. Then they compared metadata creation performance between experts and novices, measuring time, precision and recall during an observed session. This was followed up by a post-session questionnaire.
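The report does not say exactly how the paper computed precision and recall, so the Python sketch below uses the standard set-based definitions as an assumption: precision is the fraction of the values a novice created that also appear in the expert benchmark, and recall is the fraction of benchmark values the novice reproduced. Representing metadata as `element:value` strings is a hypothetical simplification.

```python
def precision_recall(created: set[str], benchmark: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of created metadata values against a benchmark.

    precision = correct / number of values created
    recall    = correct / number of benchmark values
    """
    if not created or not benchmark:
        raise ValueError("both value sets must be non-empty")
    correct = len(created & benchmark)
    return correct / len(created), correct / len(benchmark)


# Hypothetical example: values written as "element:value" strings.
expert = {"title:Rice farming", "subject:rice", "subject:irrigation", "language:en"}
novice = {"title:Rice farming", "subject:rice", "subject:agriculture"}
p, r = precision_recall(novice, expert)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Exact string matching is a strong simplification; a real study would need some judgement about near-equivalent values such as variant subject terms.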

Special Session

Identifiers. Douglas Campbell

ISO Identifiers. Juha Hakala

These are some of the identifiers on the ISO agenda, there being many others.

ISSN-L ('Linking ISSN')
One ISSN for the serial and another for a manifestation. For a serial the ISSN is the same as the existing one, but it has different roles. It will be a technical challenge to implement this.
DOI
New work item in ISO. Scope problem: a DOI can be assigned to anything, including things that already have another ISO identifier, eg. a book. They are in the process of drafting text to minimise this misuse problem.
ISNI (International Standard Name Identifier)
This was previously called the Party identifier and is similar to the libraries' ISADN. It is an identifier for public identities. There are conflicting interests making scope definition difficult. Libraries are still seeking an alternative via VIAF.

URN is an identifier system, ie. it is actionable. It is recommended for libraries. There will be a European Resolver discovery service. URNs could be used in MARC records because they are stable. DOI is not so good for use in libraries because there is not 100% coverage, DOIs may not remain free, and there are technical 'Handle' problems.

Report of NISO Round Table on Identifiers. John Kunze

There is a white paper giving the final results of this workshop. The issues are all about service infrastructure.

Identifiers should be usable with current WWW standards. There is a new simple name resolver, n2t (name to thing), which could replace Handle with a simple web server. There is a problem with Handle if the thing identified moves to more than one place. n2t addresses this, whereas DOI/Handle assumes it all moves to one place.

Another Identifier Workshop is planned in Göttingen in November.

DOIs in the National Library of Singapore. Ganesh

Ganesh talked about the use of DOIs by the National Library in Singapore.

Identifiers in the National Library of New Zealand. Douglas Campbell

Identifiers are used as access identifiers. They use many different ones, eg. ISBN for a book in the catalogue and a separate unpublished identifier in the database. A lot are not persistent because they are assigned by computer systems and may be reused. They are working on mapping all these identifiers together. There is a pivot identifier that all others are mapped to. They have added prefixes to all numbers to make them unique. The resolver will work on all these identifiers. Collection items are also available by contexts. This indicates which identifier is to be used in which context.

A lot of identifiers are not resolvable. A service is needed that provides resolution of any identifier. The identifiers are in a registry. The generic resolver can deduce a prefix from the format (eg. ISBN).
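Deducing a scheme prefix from an identifier's format can be sketched with pattern tests plus check digits. The Python example below recognises URNs, DOIs (by their `10.` prefix), ISBN-13s (mod-10 check digit with weights 1 and 3) and ISSNs (mod-11 check digit with weights 8 down to 2, where 10 is written `X`). The function names, the order of the tests and the scheme labels are hypothetical; the report does not describe how the New Zealand resolver actually does this.

```python
import re
from typing import Optional


def issn_is_valid(issn: str) -> bool:
    """Check an ISSN (NNNN-NNNC, hyphen optional, C may be X) via its mod-11 check digit."""
    m = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected


def isbn13_is_valid(isbn: str) -> bool:
    """Check a 13-digit ISBN (hyphens allowed) via its mod-10 check digit."""
    digits = isbn.replace("-", "")
    if not re.fullmatch(r"97[89]\d{10}", digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def guess_scheme(identifier: str) -> Optional[str]:
    """Deduce a (hypothetical) scheme prefix from the identifier's format alone."""
    if identifier.lower().startswith("urn:"):
        return "urn"
    if identifier.startswith("10.") and "/" in identifier:
        return "doi"
    if isbn13_is_valid(identifier):
        return "isbn"
    if issn_is_valid(identifier):
        return "issn"
    return None  # unrecognised: would need a registry lookup instead


for ident in ["978-0-306-40615-7", "0378-5955", "10.1000/182", "urn:nbn:example"]:
    print(ident, "->", guess_scheme(ident))
```

Format detection can only narrow things down; local database numbers with no check digit are indistinguishable from each other, which is one reason for adding explicit prefixes as described above.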

Identifier Principles and Compromises. Stu Weibel

WorldCat are looking at functionality and identifier pattern design. There is no perfect answer. The canonical identifier of a resource is the one that is best for all use. Identifiers are needed for aggregating in the social context. Multiple canonical identifiers dilute the social bibliography. But multiple identifiers make it easier for people to get to a given resource. So these objectives need to be brought together.

Branding within an identifier is important. Embedding branding compromises long-term persistence, but it is critical to value recognition, provenance and trust. We should place our trust in libraries rather than, eg. Amazon, which has a different place in the world of asset management. A library's billboard is its URL, and every identifier is a micro-billboard. Hackable, extensible patterns within identifiers reinforce user expectations, ie. they make it easier to find things.

To design an identifier it is necessary to judge the business model, functional requirements and system design. Compromises are inevitable.

Discussion

Suggested discussion topics:

Maybe the branding part of a URI should be excluded from a resolution system. An identifier should be unique and persistent starting at the path name. This would allow the identifier to still work if the branding changed. California Digital Library are looking at abstract work identifiers that are opaque. The service access point includes semantics and the leading branding.
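The suggestion of excluding branding from resolution can be illustrated with a resolver that ignores the scheme and host of an HTTP URI and keys only on the path onwards. In the Python sketch below, the resolution table, host names and the `/id/work/12345` path are hypothetical, and treating everything from the path as the persistent part is exactly the assumption proposed in the discussion.

```python
from typing import Optional
from urllib.parse import urlsplit


def persistent_key(uri: str) -> str:
    """Return the part of an HTTP URI treated as unique and persistent.

    Assumption: the scheme and host carry the (changeable) branding, and
    everything from the path onwards is the persistent identifier.
    """
    parts = urlsplit(uri)
    key = parts.path
    if parts.query:
        key += "?" + parts.query
    return key


# Hypothetical resolution table, keyed only on the persistent part.
RESOLUTION_TABLE = {
    "/id/work/12345": "https://repository.example.org/objects/12345",
}


def resolve(uri: str) -> Optional[str]:
    """Look up the current location, ignoring the host the URI was issued under."""
    return RESOLUTION_TABLE.get(persistent_key(uri))


# The same work identifier survives a change of branding in the host name.
print(resolve("http://old-brand.example.org/id/work/12345"))
print(resolve("https://new-brand.example.net/id/work/12345"))
```

The trade-off Stu Weibel described is visible here: the host name still does the branding work for users, but no longer has to be right for the identifier to keep working.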

Parallel Session

Social Tagging Community Meeting. Liddy Nevile

Tagging is the labelling of items. It includes user tagging and folksonomy.

education.edu in Australia. Sarah Hyman (by audio plus slides) and Pru Mitchell (present)

Machine tagging created tag clouds from text in RSS feeds. User tags will be different, and the same user may create different tags at different times. There are many reasons for this. Tags are multi-dimensional creating different tag collections. Tags are meaningful to a user, but can be shared in a community. This knowledge creation informs information management.

There are issues with user tagging:

Managing folksonomies uses three entities: user; tag; item. A hybrid view can impose some control. Currently their application is a proof of concept. Users can tag with either their own, or a suggested, vocabulary. In the future this may lead to a folksonomy inspired taxonomy.
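The three-entity model and the hybrid of free and suggested vocabulary can be sketched as a store of (user, tag, item) triples. The Python class below is a hypothetical design for illustration, not the edna implementation: tags are lower-cased, each assignment is stored once, tag-cloud weights count distinct users per tag, and tags outside the suggested vocabulary are collected as candidates for a folksonomy-inspired taxonomy.

```python
from collections import Counter
from typing import NamedTuple


class Tagging(NamedTuple):
    user: str
    tag: str
    item: str


class Folksonomy:
    """Minimal tripartite store of (user, tag, item) assignments."""

    def __init__(self, suggested_vocabulary: set[str]):
        self.assignments: set[Tagging] = set()
        self.suggested = suggested_vocabulary  # hypothetical controlled list

    def tag(self, user: str, tag: str, item: str) -> bool:
        """Record a tagging; return True if the tag is in the suggested vocabulary."""
        normalised = tag.strip().lower()
        self.assignments.add(Tagging(user, normalised, item))
        return normalised in self.suggested

    def tags_for_item(self, item: str) -> Counter:
        """Count distinct users per tag on an item (e.g. tag-cloud weights)."""
        return Counter(a.tag for a in self.assignments if a.item == item)

    def free_tags(self) -> set[str]:
        """Tags used that are not in the suggested vocabulary: candidates for review."""
        return {a.tag for a in self.assignments} - self.suggested


f = Folksonomy({"mathematics", "primary"})
f.tag("u1", "Mathematics", "res1")
f.tag("u2", "maths", "res1")
f.tag("u3", "mathematics", "res1")
print(f.tags_for_item("res1"))
print(f.free_tags())
```

Keeping the user in every assignment is what allows the per-user analysis mentioned below, such as monitoring what particular users are tagging.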

This is social networking (like delicious) for the education community, eg. adding a thread of discussion to a resource. It is for teachers to discuss resources. In delicious users can collect education resources. The project is monitoring certain identified users and what they are tagging. They intend to evaluate the created resource, maybe by harvesting it into a repository.

What are the questions needed to research the area? What is the cost to deal with all the information being gathered?

Social Tagging: an Overview of Issues in Analysing. Emma Tonkin

A tagset is not a finished index. It is not an endpoint, but it could be analysed. Free text tags are keywords in camouflage. They are cheap to create but costly to use.

By whom is annotation intended to be understood? Consistency can be achieved in a small community with agreed terms, but not in general, ie. it depends on the use case. Catherine Marshall has done a review of annotation. It is like annotations in the margin of a book. It is either part of the learning process for now, or explicit annotation for future use. So tagging is not new. It is 'language in use'. It is informal, transient, intended for a limited audience, and implicit.

The aim in a tagging system is to provide server-side search and retrieval. It needs to improve the signal-to-noise ratio, so it is very expensive, and it may be lossy if there are many tags. It is also a linguistic problem. Some investigations may require data markup. Possibly a simple tag corpus could be analysed by parts of speech, then annotated with DC entities before final analysis.

Cultural Heritage Social Networking. Stu Weibel

This is nothing new - social networking is what people do. But it is now wrapped in a technical envelope. Serendipity may be enabled, eg. by 'twittering', posting URLs of what you're looking at. It can surface relationships we don't know we share. Librarians tend to be slow and careful. They offer fixity, being guardians of culture. Social networking combines this with fluidity: navigating through persistent resources in a fluid way. It can connect people in ways that help embed their activities in their culture.

Discussion: education.edu (edna) does some prompting of the user when tagging, eg. within the education area to match the appropriate sector (maybe school or higher education). Formal classification systems warrant that the terms that get in are used. There are fixed and dynamic parts to folksonomy.

Kinds of Tags Project. Ana Alice Baptista (by video)

(Ana Alice was also listening in to the workshop from Portugal via Skype and was available for questions.)

This project began from a message on the dc-social-tagging community list. It is a community project - newcomers are welcome. Project conditions: bottom-up, born inside the community; completely internet based; communication between project members largely asynchronous; no financial support.

Project objective: how easily can tags be normalised for interoperability using a standard such as DC? They took 50 scholarly documents present in both Connotea and delicious, each with at least 5 users. The information was stored in a set of spreadsheets. This resulted in a dataset of 4964 tags after repetitions were removed. These were assigned manually to DC elements. For some tags this was not possible, eg. action towards the resource, 'to be used in', rate, depth. Some could be assigned to multiple alternative elements. Tags with more than one value were merged with hyphens (eg. publication + subject). Users also describe their relationship with a resource.

This is still work-in-progress. Emma Tonkin is leading new developments.

Some Thoughts on the Face Tag Approach. Andrea Asmni (A paper. Presented by Liddy with PowerPoints derived from Andrea's paper)

This is a structured, top-down approach with a faceted engine in the back end. This paper discusses tag hierarchies, facets, folksonomy and linguistic issues, and visibility issues (do tag clouds make it easier to find things?). Research evaluating facets and user interfaces hints that users don't actually want to use this approach.

Siderean provided automatic facet analysis for a long time, but this is no longer available.

DC Community: Can we help in organising tags? Pete Johnston

Pete is a big 'delicious' user and has looked into structured tagging, both geo-tagging and dc-tagging. Tagging indicates a relationship between a subject resource and a tag (and an individual). A tag may be an indicator of: aboutness; provenance / publisher; creator; genre; status (eg. 'to read', 'possible blog post'). How does one distinguish between tags that are 'about' the person and those that are simply 'by' the person?

A tag is a multi-part entity. There are community conventions of structured tags. Examples: key-value pair; triple tagging (prefix, key, value); RDF subject URI; predicate / property; property URI literal.

Geo-taggers use latitude and longitude. This is widely used in Flickr and Flickr now includes tools to support it.

Pete suggests using dctagging, analogous to geotagging, eg. 'dc:creator="T. Berners Lee"'. Does this conform to the DCAM? 'dctagged' is an implicit namespace, providing triple tags. Proposal: a tag is a DCAM statement: prefix/key=URI. The delicious API has no built-in support for this structure, but Flickr supports 'machine tags', whose namespace convention is borrowed from xmlns, and the Flickr API supports queries on structured tags.
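Flickr-style machine tags follow a `namespace:predicate=value` pattern, so a tag like `dc:creator="T. Berners Lee"` can be read as a statement pairing a property URI with a value. The Python sketch below shows that reading under stated assumptions: the regular expression is a simplified version of the machine tag syntax, the prefix table (mapping `dc` to the DC elements namespace, in the spirit of an XML namespace declaration) is hypothetical, and values are kept as literal strings rather than the URIs in the prefix/key=URI proposal.

```python
import re
from typing import NamedTuple, Optional

# Simplified machine tag syntax: namespace:predicate=value
MACHINE_TAG = re.compile(r"([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)")

# Hypothetical prefix table, analogous to an xmlns declaration.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


class Statement(NamedTuple):
    property_uri: str
    value: str


def parse_machine_tag(tag: str) -> Optional[Statement]:
    """Turn a structured tag into a (property URI, value) statement, if possible."""
    m = MACHINE_TAG.fullmatch(tag.strip())
    if not m:
        return None  # an ordinary free-text tag such as 'toread'
    prefix, local_name, value = m.groups()
    base = NAMESPACES.get(prefix.lower())
    if base is None:
        return None  # unknown namespace prefix
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]  # strip surrounding double quotes
    return Statement(base + local_name, value)


for t in ['dc:creator="T. Berners Lee"', "toread", "foaf:name=someone"]:
    print(t, "->", parse_machine_tag(t))
```

Ordinary free-text tags simply fall through, so structured and unstructured tagging can coexist in the same tag set.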

Is there a reason to use this structured tagging? Tags are generally regarded as personal information. There can be different tagging by the same person in different situations.

Closing Session

Conference statistics: There were 10 full papers in plenary sessions. 6 project report papers were in parallel sessions. There were 190 participants (though not everyone was there for all days) representing 33 countries. The largest contingent was from Singapore (80-90).

The conference mixed theory and practice, original research and project reports.

Seminars

Introduction to the Semantic Web. Ivan Herman

This seminar was an excellent overview of the semantic web, covering: data integration, relations and graphs; data queries; RDF and RDF schemas; SKOS; SPARQL; GRDDL; the Open Data Community Project; Ontologies and OWL.

Metadata That Works: Making Good Decisions. Diane Hillmann and Sarah Pulis

Diane presented her usual sensible ideas about metadata and its creation using encoding schemes (controlled vocabularies), and conformance to the DCAM. Sarah followed this with an overview of various standard ways in which metadata is distributed.

Socialising and Sight-Seeing

As usual there were lots of opportunities for social interaction. The conference opened with a reception held in the 'Possibility and Imagination' Rooms of the Singapore National Library. Although on the fifth floor, this was partly outside amongst tropical greenery. The food served was samples of dishes of the various ethnic communities in Singapore. On the final Saturday I attended the DCMI Advisory Board meeting on the fourteenth floor of the library, from where the view was stunning.

[View from National Library photo]

The conference dinner was in a restaurant in one of the historical houses of Singapore, the house of one of the Sultan's ministers after the British settlement. The food was in the local peranakan style and we were entertained by traditional dancing.

[Singapore Botanic Gardens Orchid Garden photo]

Of course I visited various sights in Singapore in the limited free time I had available. I think my highlight was the Orchid Garden within the Botanic Gardens. An introduction to Chinese Opera was interesting. We were made very welcome in Singapore: all those who assisted with the conference, including hotel staff, were very friendly, polite and helpful. The variety of food provided during the conference, even during the breaks, was amazing.

[Singapore Chinese Opera photo]


9 October 2007

This work is licensed under a Creative Commons Licence: Attribution Required; Non-Commercial; Share-Alike.
