epub@mimas Report

Schemas and Ontologies: Building a Semantic Infrastructure for the GRID and Digital Libraries

e-Science Institute, Edinburgh, 16 May 2003, organised by Elizabeth Lyon, UKOLN.
Report by: Ann Apps

The purpose of the workshop was to bring together GRID and digital library implementers to consider approaches to developing, expressing and sharing schemas and ontologies. It was an interesting, if rather intensive and brain-taxing, day.

Participants were welcomed and the day was chaired by Dr Liz Lyon, UKOLN.

Building a Semantic Infrastructure

Prof. David de Roure, University of Southampton, introduced the concept of the `semantic grid'. Science by its nature is collaborative and the Grid problem is one of resource sharing. Open Grid Services Architecture (OGSA) is a service-oriented approach. A new project should be able to reuse data, software components, knowledge, etc. from previous ones, ie. semantic interoperability is needed. This requirement also applies in digital libraries, eg. for e-learning objects.

The web has developed from the classical web to the semantic web. In the classical layer diagram of the web the next layer above RDF metadata description language is `Ontology vocabularies'. This is now being provided by OWL web ontology language, developed from DAML+OIL. RDFS (RDF schema) provides an extension to RDF for vocabularies and ontology efforts.

OWL is an ontology language, not a knowledge representation language nor a description logic. OWL Lite is the thesaurus level. OWL DL is a well-behaved version that provides maximum expressiveness while still retaining computational completeness. OWL Full offers no computational guarantees. Software tools are needed and there are challenges in doing this on the web.

The semantic web requires a metadata-enabled Web, but how will the metadata be created? It needs ontologies, but where will these come from? What will motivate the generation of ontologies and metadata, which will add value for the future? An example of a start on this is Hyphen which scrapes information about publications from UK computer science web sites to create `linked knowledge' in an RDF-triple store.
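
As an illustration of what `linked knowledge' in a triple store amounts to, the following is a minimal sketch in Python of an in-memory store of subject-predicate-object triples with wildcard matching. It is not how Hyphen or any real RDF store is implemented; the class `TripleStore`, the `ex:` identifiers and the sample data are all assumptions made up for this example.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical data: two publications linked to the same author.
store = TripleStore()
store.add("ex:paper1", "dc:creator", "ex:alice")
store.add("ex:paper2", "dc:creator", "ex:alice")
store.add("ex:paper1", "dc:title", "An Example Paper")
store.add("ex:alice", "ex:memberOf", "ex:exampleUniversity")

# "Which resources were created by ex:alice?"
papers_by_alice = [s for s, _, _ in store.match(predicate="dc:creator", obj="ex:alice")]
```

The point of the sketch is that once statements scraped from different sites share identifiers, a single pattern query can follow the links between them.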

Taking semantic web ideas into the GRID suggests a `semantic grid'. Currently the GRID is metadata-based middleware but it is not interoperable. A semantic grid research group is investigating various projects. The myGrid project is developing ontologies for service description.

Diagrammatically, the semantic grid completes the matrix by adding the top right hand quarter:

richer     ^  semantic web    semantic grid
semantics  |  classical web   classical grid

The semantic grid is about accelerating the overall scientific process.

Why Ontologies?

Dr. Jeremy Rogers, University of Manchester, a practising clinician as well as a member of the Medical Informatics research group, described the practical development of a medical ontology for use in diagnosis. This was developed using text extraction of clinicians' notes in a repository of cancer patients. It is difficult to measure descriptive as opposed to numerical data. The analysis or aggregation of descriptions is a new science.

Many dictionaries of medical terminology have been developed over time in various areas of medicine. Many are based on codes. All have become over-complicated, causing scalability problems. This has made them difficult to use, the problem being how to find an appropriate term or description from a list. If a terminology is too big, lists are too long to browse, though applications can help by showing the most used terms at the top; if it is too small there is not enough clinical detail. There is also a problem of cross-mapping between terminologies - how to determine that items from two dictionaries are actually the same. Computers may not be able to help because data needs human interpretation and may be paper-based.

An ontology can help. A requirement is `conceptual Lego', ie. built from base items. An example was given as to how to describe accidents involving a bicycle where items would be: things hit; roles for injured; activities when injured; contexts. The problem then becomes how to classify things. The CLEF ontological lookup system is based on repeatable rules for comparison, implemented in a software reasoner but presented in a user-friendly package.
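
The `conceptual Lego' idea can be sketched in a few lines of Python: a description is assembled from base items, one per facet, and a repeatable rule decides whether a general description covers a specific one. This is only an illustration under assumed names; the `IS_A` hierarchy, the facet keys and the `subsumes` rule are invented here and are not taken from the CLEF system or its reasoner.

```python
# Hypothetical is-a hierarchy of base concepts (child -> parent).
IS_A = {
    "car": "motor vehicle",
    "lorry": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
    "cyclist": "road user",
    "pedestrian": "road user",
}


def ancestors(concept: str) -> set[str]:
    """Return the concept itself plus everything it is a kind of."""
    found = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        found.add(concept)
    return found


def subsumes(general: dict[str, str], specific: dict[str, str]) -> bool:
    """True if every facet of `general` is matched in `specific` by the
    same value or by a narrower one."""
    return all(
        facet in specific and value in ancestors(specific[facet])
        for facet, value in general.items()
    )


# A description built from base items, one per facet.
accident = {
    "thing_hit": "car",
    "role_of_injured": "cyclist",
    "activity_when_injured": "commuting",
}

# A broader description, as might be used to aggregate cases.
query = {"thing_hit": "motor vehicle", "role_of_injured": "road user"}

is_covered = subsumes(query, accident)
```

Because the comparison is a rule over the pieces rather than a lookup in a fixed list, new combinations of base items can be classified without being enumerated in advance.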

However, there were some caveats. Ontology logic is required but not sufficient. There can be confusion with words that have several meanings, negated concepts and ambiguities. Use of the system also depends on its users.

Publishing and Sharing Schemas Overview

Rachel Heery, UKOLN, described some of the work UKOLN have been involved in with schema registries and application profiles to enable discovery and reuse of metadata.

Schema registries encourage `declaring and sharing' of metadata elements, element sets and application profiles. They may be used by both software (metadata editors, validators, converters) and humans (developers, implementers, cataloguers). There is a question of how far the data model can serve both without distorting the metadata schema.

An ontology is a vocabulary or terminology used to distinguish members of an element set. It may be a thesaurus or a linked ontology. Examples of several ontology servers were given: a thin registry that does not index or link terms (DAML Ontology library); a browser for knowledge bases (Ontosaurus); descriptions and links (SHOE); links to a number of ontologies (WebODE); a tool for collaborative browsing, creation and editing of ontologies (WebONTO); and an ontology repository (Stanford Knowledge Systems Laboratory Server).

Metadata and ontology initiatives have much in common, such as a similar interest in developing language to express semantics, and the possibility of shared tools. But whereas metadata schemas focus on definitions, ontologies focus on relationships between terms. Metadata schemas focus on metadata instances describing resources, ontologies on delineation of knowledge space. Metadata provides the `attribute space' whereas ontologies such as classification schemes, controlled vocabularies and taxonomies provide the `value space'.

Case Study - MEG registry tool

Pete Johnston, UKOLN, gave a quick demonstration of the schema registry developed for MEG (UK Metadata in Education Group). This tool can be used for specifying an application profile by re-using existing metadata elements, possibly with additional restrictions such as `obligation', where suitable ones are already defined. The application profile authoring tool tries to isolate the author from technical details and tries to encourage re-use. It is based on Dublin Core grammatical principles with application profiles to localise metadata schemas. So it is a simplification of real world complexity in metadata schemas.
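
To make the notion of an application profile concrete, here is a minimal sketch in Python of a profile that re-uses existing elements and adds a local `obligation' restriction, together with a check of a record against it. It does not reflect the MEG registry's data model or software; `ElementUsage`, `PROFILE`, `validate` and the `ex:level` element are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementUsage:
    element: str     # an existing element re-used from another schema
    obligation: str  # local restriction: "mandatory" or "optional"


# Hypothetical profile that re-uses Dublin Core elements.
PROFILE = [
    ElementUsage("dc:title", "mandatory"),
    ElementUsage("dc:creator", "mandatory"),
    ElementUsage("dc:subject", "optional"),
]


def validate(record: dict[str, str], profile: list[ElementUsage]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    allowed = {usage.element for usage in profile}
    problems = [
        f"missing mandatory element {usage.element}"
        for usage in profile
        if usage.obligation == "mandatory" and not record.get(usage.element)
    ]
    problems += [
        f"element {name} is not in the profile"
        for name in record
        if name not in allowed
    ]
    return problems


good = validate({"dc:title": "A lesson plan", "dc:creator": "A. Teacher"}, PROFILE)
bad = validate({"dc:title": "A lesson plan", "ex:level": "1"}, PROFILE)
```

The profile defines no elements of its own: it only selects and constrains ones declared elsewhere, which is what makes re-use through a registry possible.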

Implementing Ontologies in (my)Grid Environments

Prof. Carole Goble, University of Manchester.

The Grid is metadata driven middleware in which schemas and ontologies are prevalent and pervasive for carrying semantics. Interaction with Grid middleware services empowers a user or process to discover and orchestrate Grid enabled resources as required. This means cataloguing and indexing available resources using agreed vocabularies, as in digital libraries. Communication of information between sets of Grid services requires the adoption of standard schemas and semantics for data interchanges or a mechanism to map between schemas.

myGrid is open source upper middleware for Bioinformatics. It provides researchers with `knowledge working' within the global knowledge pool. They can create personal workflows that allow management of the data environment when run. These workflows can be discovered and edited. Within these processes the provenance of data and the sequence of queries is recorded so that the process can be reused and repeated.

It is a service-based architecture. Bio resources (database, analysis, person, workflow, etc) and architectural components (workflow enactment engine, event notification service, registry, scheduler, etc) are services. Services come and go, are not owned by the user, and have different levels and kinds of metadata. Realising a service-oriented architecture needs agreed metadata and shared schemas and vocabularies.

A service ontology has various roles: discovery of a service within a registry; invocation by an agent/service; interoperability, which is increased by describing the semantic type of inputs and outputs; composition of new services; verification of a service's properties; execution monitoring.

The services of myGrid are published as web services in a public registry. A service registry should be able to give different, personal, views. A user's view is different from a service provider's. Workflows are collections of services. Both workflows and services need to be able to accept third party annotation descriptions. Service discovery involves: location; quality; how to run it; domain; what does it do. There are classes of service and instances (specific examples) of services. There can be instances of the same service in different places.
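
The distinction between classes of service and their instances, and discovery by the semantic type of inputs and outputs, can be sketched as follows in Python. This is an assumption-laden toy, not the myGrid registry: `ServiceClass`, `ServiceInstance`, `Registry`, the type labels and the example.org locations are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    input_type: str   # semantic type of the input
    output_type: str  # semantic type of the output


@dataclass(frozen=True)
class ServiceInstance:
    service_class: ServiceClass
    location: str     # where this particular instance can be invoked


class Registry:
    """A toy registry holding published service instances."""

    def __init__(self) -> None:
        self._instances: list[ServiceInstance] = []

    def publish(self, instance: ServiceInstance) -> None:
        self._instances.append(instance)

    def discover(self, input_type: str, output_type: str) -> list[ServiceInstance]:
        """Find instances whose class consumes and produces the given types."""
        return [
            i for i in self._instances
            if i.service_class.input_type == input_type
            and i.service_class.output_type == output_type
        ]


# Hypothetical service classes and instances.
alignment = ServiceClass("sequence alignment", "protein sequence", "alignment report")
translation = ServiceClass("translation", "DNA sequence", "protein sequence")

registry = Registry()
registry.publish(ServiceInstance(alignment, "http://site-a.example.org/align"))
registry.publish(ServiceInstance(alignment, "http://site-b.example.org/align"))
registry.publish(ServiceInstance(translation, "http://site-a.example.org/translate"))

locations = [
    i.location for i in registry.discover("protein sequence", "alignment report")
]
```

Here one class of service has two instances in different places, so a single query by type returns both; choosing between them would then depend on the other discovery criteria such as quality.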

Service ontologies are based on DAML+OIL/OWL and on the DAML-S ontology. Services are classified rather than indexed. The classification scheme is organised according to `reasoning' description logic (beyond scope of this talk). The upper ontology is defined (rather than the lower grounding).

Service instances are discovered based on their operational properties. This includes administrative and provenance metadata. This is done by an extended UDDI implementation in RDF. This is equivalent to the `attribute space' in Rachel's talk. But UDDI seems to be `past it'. It originates in e-business and does not provide the dynamic functionality needed for the Grid. It isn't possible to interrogate a UDDI registry dynamically and make real-time contracts to use a service.

While a service is running, state metadata is used. In myGrid service discovery is implemented using Pedro - a free XML schema rapid development tool - that is populated with ontologies. State metadata includes unique identifiers from the application domain. There are 5 types of system operating at once. Service discovery is performed while running a workflow. Thus metadata descriptions are used dynamically - a better service may have been introduced into the Grid while a workflow was running.

An example of using ontologies was given: a CFD workflow editor uses an ontology to build a simulation. Then the script is annotated using the ontology to capture knowledge from the results. Ontologies are also used for interoperability between different types of broker.


Knowledge Organisation Systems

Dr. Doug Tudhope, University of Glamorgan, talked about Knowledge Organisation Systems (KOS), reviewed some current digital library work on KOS, and discussed research issues drawing connections with Ontologies and the Semantic Grid.

KOS may be: term lists (eg. authority files, dictionaries); classification and categorisation (eg. subject headings, Dewey); relationship systems (thesauri, semantic networks, ontologies). Thesauri have 3 standard relationships between concepts (equivalence, hierarchical, associative), a domain vocabulary, and concept definitions and scope notes. Ontologies are a higher level conceptualisation with a formal definition of relationships, inference rules and definition of roles. KOS includes ontologies and schemas. A classification is like a thesaurus but without the associated relationships (like the difference between XML and RDF).
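
The three standard thesaurus relationships can be illustrated with a small Python sketch that records equivalence, hierarchical and associative links and uses the first two to expand a search term. The `Thesaurus` class and the sample terms are assumptions for this example and do not come from any of the systems mentioned in the talk.

```python
from collections import defaultdict


class Thesaurus:
    """A toy thesaurus with the three standard relationships."""

    def __init__(self) -> None:
        self.use_for: dict[str, str] = {}                       # equivalence
        self.narrower: dict[str, set[str]] = defaultdict(set)   # hierarchical
        self.related: dict[str, set[str]] = defaultdict(set)    # associative

    def add_equivalent(self, non_preferred: str, preferred: str) -> None:
        self.use_for[non_preferred] = preferred

    def add_narrower(self, broader: str, narrower: str) -> None:
        self.narrower[broader].add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def expand(self, term: str) -> set[str]:
        """Map a term to its preferred form and add all narrower terms."""
        preferred = self.use_for.get(term, term)
        expanded = {preferred}
        stack = [preferred]
        while stack:
            for child in self.narrower.get(stack.pop(), ()):
                if child not in expanded:
                    expanded.add(child)
                    stack.append(child)
        return expanded


# Hypothetical vocabulary.
thesaurus = Thesaurus()
thesaurus.add_equivalent("bikes", "bicycles")
thesaurus.add_narrower("bicycles", "tandems")
thesaurus.add_narrower("bicycles", "mountain bicycles")
thesaurus.add_related("bicycles", "cycling")

expanded = thesaurus.expand("bikes")
```

An ontology would go further than this by giving the relationships formal definitions and inference rules, rather than three fixed link types.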

KOS can be represented in RDF/XML. There are various initiatives to build KOS registries and cross-mappings. A possible KOS-based terminology service for JISC IE will help in matching of user queries. This will be separate from the collections and associated with business models.

Facet analysis can be used to enrich and formalise KOS. There is an ESRC funded facet project using the AAT thesaurus.

There is a need to develop a more formal way to represent an ontology - a syntax using computational linguistic techniques. There is a need to consider the whole KOS lifecycle of indexing and classification including cost benefits and user interface.

Breakout Sessions

There were 3:

1. What are the barriers to sharing ontologies (cross domain, cross sector)?

2. Software tools and shared services - how can existing tools and infrastructure be improved?

I attended this session. Unfortunately no-one in the group had any experience of using ontology tools so it wasn't very productive!

3. The process of building a community-led ontology - how can we maximise usage and relevance?

Summary and other points

It was noted that there was no representative from the knowledge management / representation community at the meeting. They have already addressed some of the problems mentioned.

There will be a future joint eScience / digital library meeting. Part of the content may be text mining.

29 May 2003
