ABC

A Logical Model for Metadata Harmonization

Dan Brickley - ILRT Bristol, daniel.brickley@bristol.ac.uk
Jane Hunter - DSTC Brisbane, jane@dstc.edu.au
Carl Lagoze - CS Cornell, lagoze@cs.cornell.edu

Status of this document

This is a strawman document to initiate discussion of a common conceptual model to facilitate interoperability among application metadata vocabularies. The ABC document is a result of the JISC/NSF/DSTC sponsored Harmony project and is not an official working document of the Dublin Core, INDECS or any other metadata initiative. It does however attempt to draw on the work of these groups and formalize a variety of mechanisms to support interoperability. The modeling methodology draws on concepts from the Resource Description Framework (RDF) of the W3C, but should also be applicable in non-RDF contexts.



Contents

(to be hyperlinked)

Overview

The Warwick Framework [Lagoze, 1996 #34] describes the concept of modular metadata - individual metadata packages created and maintained by separate communities of expertise. A fundamental motivation for this modularity is to scope individual metadata efforts and encourage them to avoid attempts at developing a universal ontology. Instead, individual metadata efforts should concentrate on classifying and expressing semantics tailored toward distinct functional and community needs. Warwick Framework-like modularity underlies the design of the W3C's RDF, which is a modeling framework for the integration of diverse application and community specific metadata vocabularies.

While the Warwick Framework proposes modularization as fundamental to a workable metadata strategy, it recognizes a number of challenges in the implementation of such. An outstanding one is the interoperability of multiple metadata packages that may be associated with and among resources. These packages are by nature not semantically distinct, but overlap and relate to each other in numerous ways. Achieving interoperability among these packages via one-to-one crosswalks (e.g., [, 1997 #107]) is not a scalable solution.

In fact, many entities and relationships - for example, people, places, creations, organizations, events, and the like - are so frequently encountered as to not fall clearly into the domain of any particular metadata vocabulary. ABC is an attempt to formalise these underlying common entities and relationships and to describe them (and their inter-relationships) in a simple logical model. The ABC logical model has a simple mapping to RDF, but neither restricts itself solely to mechanisms built into the RDF core nor assumes an RDF-centric implementation environment.

The concepts and inter-relationships modeled in ABC could be used in a number of ways. In particular:

1. Individual metadata communities could use these underlying concepts (the ABC vocabulary) to guide the development of community-specific vocabularies. These individual communities could use formalisms such as RDF to express the possibly complex relationships between the ABC model and their community-specific vocabularies.

2. The formal expression of the relationships between community-specific vocabularies and the ABC model could form the basis for a more scalable approach to interoperability among multiple metadata sets. Rather than one-to-one mappings among metadata vocabulary semantics, some degree of interoperability could be achieved by mapping through the ABC model.

Scope

In order to scope the ABC effort itself, it is important to state what ABC is not:

The remainder of this document defines some of the underlying assumptions of the ABC model and gives some sample scenarios of the types of entities and relationships that ABC will model.

The ABC Model

There are two equally important components of the ABC model: an architecture and a vocabulary. The architectural overview presented below introduces the descriptive machinery used in the ABC model; the initial vocabulary sketched below makes use of this architectural machinery to characterise a set of abstract base classes for interoperable resource description. Both the architectural and vocabulary aspects of ABC reflect strongly a set of guiding principles. These are outlined below.

ABC Principles

The following principles have guided the design of ABC.

Unique Identification

Successful (meta)data management and modelling requires a framework for avoiding ambiguity in data interchange. In ABC, ambiguity is avoided through an emphasis on the central role played by unique identifiers. ABC uses unique identifiers to specify the elements of metadata vocabularies (attributes, categories, fields etc.), the relationships that hold between those elements, and for the real-world and digital resources described by those elements.

Multiple Views; Multiple Representations

ABC acknowledges that the same item of information may be represented in a number of different ways depending on the application context. A design goal for ABC is to facilitate the transformation of data structures between simple and complex representational models. ABC itself does not advocate 'complex' or 'simple' representations for application-level deployment. Instead, the ABC architecture provides a mechanism for making explicit the rich data structures often implicit in so-called simple metadata records. Conversely, these mechanisms can be used to generate simpler ('flattened', optimised) views of more complex (verbose) data structures.

Logical Models are Intuitive Models

ABC is based around a simple logical core designed to provide a bridge between machine processable data structures and intuitive, understandable conceptual models. As such, the working hypothesis for ABC's development has been that a few logically-oriented modeling principles (such as specialization, partial understanding, multiple views) provide a system within which independently managed metadata vocabularies can build upon a shared understanding of their common semantics. The use of a logically organised, hierarchical architecture allows for the articulation of simple rules to guide modelling and other applications.

Partial Understanding, Dumbing Down and Specialization

ABC is intended to provide 'semantic glue' for interrelating application-specific metadata sets. One well understood technique for aiding data interchange in a heterogeneous environment is the use of a 'dumbing down' or 'partial understanding' mechanism. When encountering some previously unknown construct or object, we should be able to mechanically acquire at least some partial understanding through knowledge of the broad 'type' or 'class' of things that it falls into.

By organising objects into hierarchies of such types, it becomes possible for interoperability to occur through partial understanding of common base classes. The ABC architecture provides for partial understanding by emphasising the importance of 'specialization' relationships between metadata vocabulary items; the ABC vocabulary presents some common base classes that can be used as the foundation for more specialised metadata vocabularies.

ABC Architecture

The ABC architecture is simply the conceptual and logical framework within which we set out to describe the ABC vocabulary and its relationship to other metadata vocabularies. Architecturally, ABC borrows heavily from the RDF approach to resource description.

While it is important to avoid entangling the creation of conceptual models (such as the ABC vocabulary) in implementation details such as syntax, it is also important to provide some basic infrastructure and terminology. The modeling primitives of RDF provide an adequate, although not necessarily complete, foundation for specifying the ABC architectural model. The RDF-inspired architectural model adopted for ABC is described in its entirety by this document; detailed knowledge of RDF is not needed to understand ABC.

Before introducing the architectural facilities peculiar to ABC we provide an overview of ABC's broader architectural assumptions. It is these assumptions that provide ABC's connection to the RDF data model and type system, although the current document does not spell out the mapping to RDF in detail.

Basic Assumptions

A handful of underlying assumptions, informed by the ABC principles already listed, provide the basis for the ABC approach to metadata vocabulary description and harmonization.

ABC inhabits a universe of uniquely identifiable resources (or entities).

Everything in ABC is modeled as a 'resource', including real world entities, intangible entities such as relationship types, and categories (classes) of resource.

ABC assumes a single, hierarchically devolved model for the management of resource identifier namespaces. For this ABC adopts the URI system. The URI specification [ref] provides a simple 'umbrella' or wrapper that can encompass any well-defined identification system within the global URI namespace. Examples of identification schemes that have representations as URIs include ISBNs, ISSNs, URNs, URLs, DOIs, Handles, phone numbers, UUIDs and PURLs.

ABC itself needs unique identifiers to accurately name vocabulary elements, both within ABC and defined elsewhere. The URI approach fits with ABC's goals as it is decentralised and generalist in nature; a number of approaches to resource identification are possible within the encompassing framework provided by the URI specification.

A subset of the universe of resources are properties (or attributes). A property expresses a relationship between two resources. Properties, or relationships, take many forms such as containment (e.g., hasPart), derivation (e.g., hasTranslation), and attributes shared by many metadata vocabularies (e.g., author, subject, etc.).

The different types of relationships that hold between resources (hasPart, hasBodyPart etc) can themselves stand in relationships such as specialization.

This flavour of Entity/Relationship modeling provides us with a simple "directed labelled graph" system that is both expressive and reasonably intuitive when represented in diagram form. ABC suggests a number of conventions for representing meta-information in this system, and for converting between complex and simple modeling styles.
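As a rough illustration of the "directed labelled graph" idea, the sketch below stores each statement as a (subject, property, object) triple. The identifiers (doc_1, chapter_3 and so on) and the helper names are invented for this example and are not part of ABC; a real application would use full URIs.

```python
from collections import namedtuple

# One arc of the directed labelled graph: subject --property--> object.
Statement = namedtuple("Statement", ["subject", "property", "object"])

# Hypothetical identifiers, used only for illustration.
graph = {
    Statement("doc_1", "hasPart", "chapter_3"),
    Statement("doc_1", "hasTranslation", "doc_2"),
    Statement("doc_1", "author", "person_9"),
}


def objects_of(graph, subject, prop):
    """Return every resource reached from `subject` by an arc labelled `prop`."""
    return {s.object for s in graph if s.subject == subject and s.property == prop}


def properties_used(graph):
    """Properties are resources too, so they can be collected like any other node."""
    return {s.property for s in graph}
```

Because the arc labels are themselves identifiers, the same structure can later hold statements about the properties (for example, that one property specializes another).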

By representing properties as just another kind of resource, we can re-apply our modelling approach to describe metadata vocabularies themselves. For example, the relationship hasBodyPart is a specialization of the relationship hasPart. Since these are simply two more identifiable resources, we can easily model the relationship that holds between them: hasPart is a 'super property' of hasBodyPart. The occurrence of the latter (specialized) property tells us everything that the former tells us, and more. Structures such as these provide the basis for 'dumbing down' mechanisms to support partial understanding across metadata applications.

Resources fall into a variety of classes or categories; these can be thought of as sets of resources. Some of these sets are super-sets of other more specialised groupings of resources (eg. Songs versus Musical Works: all Songs are Musical Works). The preferred term for these sets of resources is 'class'; when a resource is a member of a class, we represent that fact with a 'type' property relating the resource to the class.

By attending to the grouping of resources into sets ('classes'), and by giving unique names to those sets, we have a simple foundation upon which to build richer structures. By giving a name to the class-specialization relationship ('super class') we can provide explicit information about hierarchies of resource types.

ABC builds upon these specialization relationships between classes of resource, and between types of property (or relationships). The ABC vocabulary described in this document consists of a hierarchy of entity class definitions and relationship types.

ABC Rules

ABC encourages a logically oriented approach to metadata vocabulary description. Wherever it is possible to state explicitly some fact about appropriate vocabulary usage (eg. that one class is a superset of another), ABC aims to do so. This document makes no assumptions about expected implementation environments, and does not presume that logic-based software systems will be used. The rules provided here are intended primarily as a guide for vocabulary mapping applications.

Class hierarchy rules

By using the notion of resources belonging to classes, and classes forming hierarchies arranged by super-class relationships between classes, it is possible to write down some simple guiding rules to express claims such as 'all Songs are MusicalWorks':

If a resource is a member of a class, then it is implied that the resource is a member of all super-classes of that class.

Example: if we consider the class AudioRecording to be a specialization of the broader class Manifestation, which in turn has a super-class Creation, then we know that any resources that are AudioRecordings are also Manifestations and Creations.

Diagram:

[recording_321] --type--> [AudioRecording] --superClass--> [Manifestation] --superClass--> [Creation]
(should show an 'implied property' type arc connecting the recording to each super-class)
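The class hierarchy rule can be sketched in a few lines of code. This is only an illustration under assumed data structures (a dictionary of superClass arcs and hypothetical function names); ABC itself does not prescribe any implementation.

```python
# superClass arcs from the diagram above: class -> its direct super-classes.
SUPER_CLASS = {
    "AudioRecording": {"Manifestation"},
    "Manifestation": {"Creation"},
}


def all_super_classes(cls, super_class=SUPER_CLASS):
    """Collect every class reachable by following superClass arcs from `cls`."""
    found = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for parent in super_class.get(current, ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def implied_types(stated_types, super_class=SUPER_CLASS):
    """Apply the rule: membership of a class implies membership of its super-classes."""
    result = set(stated_types)
    for cls in stated_types:
        result |= all_super_classes(cls, super_class)
    return result


# recording_321 is only stated to be an AudioRecording.
recording_321_types = implied_types({"AudioRecording"})
```

An application that knows nothing about AudioRecording can still treat recording_321 as a Creation, which is the 'partial understanding' behaviour described earlier.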

Property hierarchy rules

Similarly, we can exploit knowledge of hierarchies of property (relation) types:

If two resources are related by some property, then it is implied that they should also be considered to be related by all super-properties of that property.

Example: consider a case where a property such as composer has been used to relate a creative work to some agency (most likely a person) that composed it. If we know that the property composer has a super-property contributor then we implicitly know that the contributor relationship should also hold between the work and the agent. Phrased another way: if we know someone composed a work, we also know they contributed to it.

Diagram:
If we know that...
[composer] --superProperty--> [contributor]
[recording_321] --composer--> [person_433]

It is implied also that:

[recording_321] --contributor--> [person_433]
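A minimal sketch of the property hierarchy rule, assuming statements are held as (subject, property, object) tuples and superProperty arcs as a dictionary. The function name and data layout are illustrative assumptions only.

```python
# superProperty arcs: property -> its direct super-properties.
SUPER_PROPERTY = {"composer": {"contributor"}}


def implied_statements(statements, super_property=SUPER_PROPERTY):
    """Add a statement for each super-property of every stated property."""
    result = set(statements)
    pending = list(statements)
    while pending:
        subject, prop, obj = pending.pop()
        for parent in super_property.get(prop, ()):
            derived = (subject, parent, obj)
            if derived not in result:
                result.add(derived)
                # Re-queue so that longer superProperty chains are followed too.
                pending.append(derived)
    return result


stated = {("recording_321", "composer", "person_433")}
closed = implied_statements(stated)
```

The derived set keeps the original composer statement and adds the implied contributor statement, so no information is lost by 'dumbing down'.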

Other rules

A basic level of cross-vocabulary interoperability can be facilitated using simple class and property hierarchies. The ABC vocabulary is targeted at supporting such applications. By providing an approach and a basic but useful core vocabulary, ABC encourages other vocabularies to express mappings to one another and to ABC using a common approach.

In support of the 'multiple views' philosophy behind ABC, it is useful to explore a generalisation of our simple rule mechanism to support representations of more complex vocabulary mappings. The current document does not provide a formal language for the specification of mapping rules; instead, simple prose rules are used, couched in terms of a template rule structure.

Generic ABC Rule Template:

IF we know [some state of affairs] THEN we can conclude [some more information]

The application of this template should become clearer as the ABC vocabulary is itself introduced. A brief example is presented here to give a sense of the possibilities. The ABC Rule template has placeholders for fairly arbitrary states of affairs expressible within ABC's RDF-like modelling style, but represented in prose.

Example 1: Complex ABC Rules
If we know [that some Event has an input document and an output document, and we know that the event was of type TranslationEvent] then we can conclude that [the input document to that event stands in a hasTranslation relationship to the output document.]

Conversely, we might say the opposite:

If we know [that some pair of resources are connected by the hasTranslation relation] then we can conclude that [there exists some event which had those two resources as input and output documents]

This rule-based approach to vocabulary description is designed to minimise ambiguity and to make explicit facts which are easily left unarticulated. Although ABC does not provide a machine-processable rules language, the use of prose in this vein (in the context of the actual ABC vocabulary) can provide for very precise specifications. Note that the rule above would hold even if the event were actually some particular specialized type of TranslationEvent defined in another metadata community, since the partial understanding mechanism provided by the class hierarchy ensures that the rule applies to all resources of type TranslationEvent.
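The two prose rules of Example 1 could be approximated in code as follows. This is a sketch under stated assumptions: events are plain dictionaries, the class xyz:SubtitlingEvent is a made-up specialization standing in for a type defined by another community, and the function names are hypothetical. ABC does not define a rules language, so this is one possible reading rather than a specification.

```python
# Hypothetical hierarchy: another community specializes TranslationEvent.
PARENT_CLASS = {"xyz:SubtitlingEvent": "TranslationEvent"}


def is_a(cls, target):
    """True if `cls` is `target` or a (transitive) specialization of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = PARENT_CLASS.get(cls)
    return False


def flatten_translation(event):
    """Event view -> simple view: derive hasTranslation statements."""
    if not is_a(event["type"], "TranslationEvent"):
        return set()
    return {
        (src, "hasTranslation", dst)
        for src in event["input"]
        for dst in event["output"]
    }


def posit_event(statement):
    """Simple view -> event view: posit an otherwise undescribed event."""
    src, prop, dst = statement
    if prop != "hasTranslation":
        return None
    return {"type": "TranslationEvent", "input": [src], "output": [dst]}


event = {"type": "xyz:SubtitlingEvent", "input": ["doc_en"], "output": ["doc_fr"]}
flat = flatten_translation(event)
```

Note that the converse direction can only recover the generic TranslationEvent type: the simple view does not record which specialized event, agent, date or place was involved.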

This example is presented as context for a discussion of the two remaining architectural components of ABC, the Event Model and the Multiple Views strategy.

Multiple Views and the Event Model

ABC adopts a particular philosophy for metadata modeling. Crudely put, ABC's slogan is: "if it's worth talking about, make it a first-class object with an identifier so it can be described in more detail". This is a strategy for dealing with information complexity. If we want (for example) to describe the colour of the building that some creative work was conceived in, ABC (by adopting RDF's graph data model) allows us to express this. We create a model of this situation in which the building, the creative work, and the event by which it was conceived are all represented as 'resources'. Sometimes a complicated, explicit model is useful for applications; other times it is better to have a simple, flattened representation of the 'real' state of affairs. In both cases it is useful to understand how the two representations inter-relate.

The example which introduces ABC's notion of rules ([eg.1]) illustrates this point. We can either take a simple view and say just that some document has a translation into another document. Or else we can take a complex view and describe the full details of the event which transformed the one document into the other.

The notion of rules expressed over the ABC modelling formalism gives us some conceptual machinery for thinking and talking about how these two representations relate to one another. In addition to the rule mechanism we need a common understanding of how best to concoct interoperable complex representational models.

Complexification Strategy: the Event Model

It is inadequate to simply assert that complex and simple views of metadata models should interoperate by describing rules for their mappings. Without common conventions or patterns for dealing with this complexity, the number of rules required to map between vocabularies would be vast. The ABC approach is to build upon a handful of base classes, specializing them only when necessary, and to adopt a simple event-centric model for complex representational challenges.

Just as it makes sense to promote various entities to 'first-class describable status' in our models (eg. agents, places, creative works), ABC builds upon the observation that it is often valuable to create a model of the event or process by which certain states of affairs came about. For example, if one document is a translation of another, there must have been a 'translation event' at some point. If various agencies were involved in the creation of some musical work, we can posit a 'creation event' that can be used to organise our model of the contributions those various agencies made to the creation of that work.

This gives rise to a further working assumption of ABC:

For any state of affairs we wish to describe using metadata, we can posit an Event (of some particular type) by which it came about, and use that event as a focus for descriptive metadata about the agencies and other resources which were involved in that event and which brought about the state of affairs we are trying to describe.

This policy of creating describable resources extends beyond events to provide a general style for ABC modeling - for example, if we want to say something about the contribution a particular agent made to some event, we create a resource representing that contribution and describe its properties and relationships to other resources.

It is at this point that the ABC Architecture blends into the actual ABC vocabulary...

ABC Vocabulary

A goal of ABC is to define and declare the core set of resource categories that are common across metadata communities. Specifically, ABC treats persons, organizations, and agencies as 'first class entities'. Furthermore, in the manner of the IFLA data model [, 1998 #52], ABC draws an explicit distinction between intellectual works and their various manifestations.

As a means of expressing the relationships between works, manifestations, and derivations, ABC pays special attention to the notion of an event. Events are the linkage to the transformation of one resource to another; for example, a translation, summarization, or extraction. Events provide the locus to attach such properties as agency (who or what caused the event - e.g., the translator), date and time, and location (where the event occurred).

The ABC vocabulary consists of a number of different basic entities or concepts; these are ABC's abstract base classes. The base ABC classes include: Resources, Creations, Events, Agents and RelationTypes. The distinction between resources, creations, agents, events, and relation-types is essential to provide the proper attachment points for different properties (or metadata) that are associated with information content and its lifecycle.

Informally, the different attachment points are:

Included below are some initial definitions of these entities and properties to promote further discussion. RDF-like graphs are used to express these definitions.

Resources

Anything that can (in principle) be uniquely identified is considered to be a resource.

ABC is particularly concerned with various sub-classes of resource; 'Resource' itself as a class is of limited interest. Figure 1 shows some resource (an information resource, which we might model as a Creation of some sort) and a number of properties characteristic of it:

[fig 1.]

Events

ABC's event model was introduced informally in the earlier discussion; ABC models often use a representation of an Event to hang together a complex web of information which cannot easily be expressed as (for example) simple properties of document-like objects.

Events:

[ CARL: can you draw the diagram I dictated at this point? --dan ]

The interaction between events and resources can be informally described by dividing events into two sub-types:

A resource has a single original creation event but may subsequently undergo multiple non-creation events.

This event chaining, expressed as a graph, is conceptually similar to the relationship between works, expressions, manifestations, and items as stated in the IFLA data model [, 1998 #52].

Events and Relations

Certain types of events (such as those which have both input and output resources or which describe an agent's contribution to a resource) provide information that can be expressed as a simpler binary relation between resources. In such cases, ABC provides two representational options and recipes for inter-conversion. When rich information is required, ABC uses the event model. When concise/simple metadata is needed, flatter relations are used.

There are two example scenarios which illustrate this point.

1. Agent Roles.

If an abc:Event has an abc:contribution from some agency (where the agency is identified by an abc:agent property of the contribution), and if that contribution has an abc:role of X, then we can conclude that any abc:output resources associated with the event have a property of type X pointing to the contributing agent.
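The agent role rule can be illustrated with a short sketch. The dictionary layout, the function name and all identifiers (work_1, person_433, org_2) are assumptions made for this example; the role names are borrowed from vocabularies mentioned elsewhere in this document purely as placeholders.

```python
def flatten_agent_roles(event):
    """Derive (output, role, agent) statements from an event's contributions."""
    statements = set()
    for contribution in event.get("abc:contribution", []):
        agent = contribution["abc:agent"]
        role = contribution["abc:role"]
        # Every output of the event gets a property of type `role`
        # pointing to the contributing agent.
        for output in event.get("abc:output", []):
            statements.add((output, role, agent))
    return statements


# Hypothetical event with two contributions and one output resource.
event = {
    "abc:output": ["work_1"],
    "abc:contribution": [
        {"abc:agent": "person_433", "abc:role": "marcrel:Composer"},
        {"abc:agent": "org_2", "abc:role": "dc:Publisher"},
    ],
}
flat = flatten_agent_roles(event)
```

The flattened statements are what a simple record would carry directly; the event view additionally has room for the date, place and other details of each contribution.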

2. Transformation Types

It is common to use simple binary relations to express directly facts such as 'TranslationOf' that connect different works, expressions and manifestations. These also often have an implicit EventType, eg. TranslationEvent. A great many relationships can be considered to have 'characterising events' through which they come about. For example, one resource may be a 'version of' another. This suggests a modification event whose input was the first resource and whose output was the second. The 'event view' of such relations will have an associated Agent, Date and Place. Examples of event/relation pairings include...

Aside: It might be possible to model this as an 'abc:typicalEvent' property of these relations. An ABC Rule might (possibly) be expressible telling us that

If we see some binary relation connecting resources X and Y, and the relation that connects them has an 'abc:typicalEvent' pointing to some particular class of 'abc:Event', then we can conclude that there exists an event of that type which has amongst its abc:input resources X, and amongst its abc:output resources the resource Y.
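If the tentative 'abc:typicalEvent' idea were adopted, the rule might be applied along the following lines. Everything here is an assumption for illustration: the lookup table, the relation name hasVersion, the class name ModificationEvent and the function name are invented, and the property itself is only floated as a possibility above.

```python
# Hypothetical abc:typicalEvent declarations: relation -> class of abc:Event.
TYPICAL_EVENT = {
    "hasTranslation": "TranslationEvent",
    "hasVersion": "ModificationEvent",
}


def implied_event(statement, typical_event=TYPICAL_EVENT):
    """Posit the event implied by a binary relation between resources X and Y."""
    x, relation, y = statement
    event_class = typical_event.get(relation)
    if event_class is None:
        # No typical event declared for this relation, so nothing is implied.
        return None
    return {"type": event_class, "abc:input": [x], "abc:output": [y]}


event = implied_event(("doc_1", "hasVersion", "doc_2"))
```

The posited event carries no agent, date or place; it only provides a resource to which such details could later be attached.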



Sketch of abc classes (dan)

Haven't had time to flesh these out into prose or diags yet

indentation = class hierarchy; [brackets] = instances

rdf:Resource
  abc:Creation
    abc:Work
    abc:Expression
    abc:Manifestation
     abc:Item

  abc:Action

  abc:Event
    abc:CreationEvent
    abc:TransformationEvent (etc)

  rdf:Property
     abc:Property // these are the relation types defined by ABC
      [abc:agent], [abc:contribution], [abc:role], [abc:input],[abc:output],[abc:date]
      [abc:typicalEvent]
       abc:AgentRoleProperty // these are relations that specify the contributory role of an agent
          [xyz:Painter], [dc:Creator],[marcrel:Composer] etc etc -- other instances of AgentRoleProperty

       
  abc:Agent
    abc:Person
    abc:Organization
    abc:Instrument





Some events are terminal, in the sense that they don't produce a persistent result (something that can be acted on by another event). Examples of these ephemeral disseminations might be a non-recorded performance or one-time display onto a computer screen.

 



Agents

Agents represent the people, organisations or instruments (mechanisms) which play a role in an event.

As proposed in the INDECS model, we assume that there are three possible agent types:

As well as the type attribute, agents also have a role attribute associated with them e.g. author, composer, publisher, sound engineer. The possible role values are dependent on the event with which the agent is associated.

A vocabulary scheme for agent roles may be defined based on the DC Creator, Contributor and Publisher elements, INDECS agent roles and IFLA agent roles.


Relations

Relations are properties which relate one resource to another resource (or to a part of a resource). Relations between resources can be categorized into three types:

Reference and Context-based relations are simple relationships which are neither event-based nor structurally-based. These relationships are assumed to have been established at the time of creation e.g. References/isReferencedBy, RelatedBySubject.

Structural relations are containment relationships which describe items within a collection or the sub-parts of a larger container or composite resource. For example structural relations can be used to specify the regions within an image, the scenes within a video sequence, the chapters in a book or the tracks on a CD. They are not event-based and hence do not have associated agent, place or time properties. But they may have spatial, temporal or spatio-temporal attributes which define the location of the parts relative to the whole.

Event-based relations are relations which are instantiated as a result of an event i.e. these relationships may have agent, date, and place attributes associated with them - alternatively these relationships could be associated with an event (see Events above). Examples include: