Libby Miller <firstname.lastname@example.org>, Greg FitzPatrick <email@example.com>, Dan Brickley <firstname.lastname@example.org>
Latest version: http://ilrt.org/discovery/2002/03/skical-daml/
Libby Miller is part-funded by the Harmony project.
DAML+oil is a language for describing ontologies, building on RDF Schema and XML Schema. It can be used to describe types of objects and the kinds of relationships expected between them. It uses references to XML Schema datatypes to describe integers, dates and other datatypes.
This paper is the result of an attempt to describe the SkiCal and iCalendar data formats using DAML+oil. The iCalendar mime-directory standard is a format for describing events, to-do lists and journals, including associated date-formats, time zones, and alarms. It is used in the calendaring and scheduling applications of most major desktop and PDA personal information managers. SkiCal is an extension to iCalendar which describes public events such as concerts, sports competitions and conferences. Both formats are established, document-based formats for describing objects including times, dates, people, events, locations and prices.
We wanted to see whether it was possible to describe SkiCal and iCalendar using DAML+oil, and whether it would be useful to do so. This paper describes some uses of DAML+oil using parts of SkiCal and iCalendar as examples, and examines some of the benefits and problems of using DAML+oil to describe an existing data format or schema.
DAML+oil is based on, and uses, the RDF Model and Syntax [RDFMS], and extends RDF Schema [RDFS]. It allows you to describe types of objects and the relationships you expect to hold between them. It is a schema language, describing constraints you wish to hold within your instance data.
RDF Schema provides a rather minimalistic basis for RDF vocabulary description, allowing one to describe nodes as being instances of classes which can be hierarchically organised. It also allows the description of properties hierarchically, and enables one to restrict the types of nodes which can be used at either end of arcs. DAML+oil adds to these restrictions, making for a more powerful and descriptive language for describing the way in which objects in the world relate to one another.
DAML+oil represents a large increase in both complexity and power over RDF Schema. We were interested in seeing whether the increase in power was sufficiently useful to make up for the increase in complexity. We felt that a good way to do this was to try to describe a data format that was in current use, rather than building a DAML+oil ontology from scratch for the purposes of demonstration.
The structure of the paper is as follows. In section 1, we provide a high-level description of the RDF information model which underpins DAML+oil, the relationship between RDF and XML, and the purpose of RDF Schema and related technologies. In section 2 we describe the SkiCal and iCalendar data formats. Section 3 describes some of the reasons we wanted to describe iCalendar and SkiCal in RDF, including namespace-mixing and datatyping. Section 4 sets out three practical uses of DAML+oil for iCalendar and SkiCal, and finally, section 5 outlines some difficulties with using DAML+oil for these data formats.
The W3C's Resource Description Framework (RDF) provides a simple graph-based information model for Web applications that need to exchange data in a flexible yet predictable way. Since RDF data is typically encoded and exchanged using XML documents, RDF in effect provides a set of restrictions or design conventions for well-formed XML documents.
XML provides us with the notion of both well-formed and valid documents. A valid document conforms to some specific DTD (or schema), whereas a merely 'well-formed' document is constrained only to have the basic elements, attributes and content structure shared by all XML documents. RDF can be thought of as an attempt to find a middle ground between the strict notion of 'conformance to a named schema' and the much weaker 'unconstrained tag soup'. A well-formed XML document that is written in conformance with the RDF syntax is understood to encode an edge-labelled directed graph, whose nodes and edges may be labelled with Web resource identifiers (URIs).
RDF is particularly suited for the deployment of XML in the World Wide Web, since the distributed nature of the Web often requires us to mix information from multiple sources and applications within a single document. While the notion of well-formedness and the XML Namespaces mechanism provide the basic infrastructure for mixed-namespace XML markup, RDF supplies a much needed set of constraints and conventions for using these. RDF can be thought of as offering 'design patterns' for creators of mixed-namespace documents. By writing mixed-namespace XML in the style specified by the RDF Syntax recommendation, we reduce some of the unpredictability and variation associated with the unconstrained use of well-formed XML.
As a graph structure, RDF is useful even without knowledge of the specific vocabularies used. While these vocabularies (as documented in RDF schemas associated with namespaces) can provide useful meta-information to support more sophisticated applications, RDF was designed to be useful in the absence of schema information.
If we can use RDF query, storage and APIs without any need for RDF schemas, what use is RDF Schema? And what use might we make of more sophisticated extensions to RDF Schema such as DAML+oil or W3C's new Web Ontology Language?
The RDF information model essentially consists of nodes and arcs (resources and properties), but RDF additionally makes the stronger claim that these graph data structures are not arbitrary computational constructs, but encodings of claims about the world. An RDF document is the kind of thing that (in some context) can be true or false. When dealing with RDF, as a consequence of this, we often deal with two parallel sets of terminology. Considered as a graph data structure, we talk of 'nodes and arcs' (or edges); considered as a representational formalism that makes claims about the world, we talk of 'classes' of 'resource' and their 'properties'. When we hear that RDF is supposed to be 'semantic', or 'meaningful', it is related to this second set of terminology, and to the goal that RDF documents should be considered as 'saying things about the world'.
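RDF Schema provides the first steps to modelling these claims about the world in RDF, by allowing the definition of hierarchies of classes of objects and properties, and by enabling you to describe constraints on where properties can be used. DAML+oil provides a larger set of modelling concepts for describing the world, including cardinality constraints, range restrictions local to a particular class, equivalence between classes and between properties, boolean combinations of classes (unions, intersections, complements and disjointness), unambiguous properties, and XML Schema datatypes.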
The important difference between XML Schema and DAML+oil (which may appear to have overlapping functionality) is that an XML Schema defines a class of XML documents by describing syntactic constraints on those documents, while DAML+oil (like RDF Schema) is data-orientated, describing constraints on objects, not on documents. DAML+oil and RDF Schema are therefore particularly useful for describing relationships between objects which are described over several different documents, including relationships between schemas or ontologies.
The iCalendar mime-directory standard is a format for describing events, to-do lists and journals, including associated date-formats, time zones, and alarms. It is used in most major desktop and PDA Calendaring and Scheduling applications.
Here is an example event described in iCalendar, taken from RFC 2445 [ICAL]:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT
DTSTART:19970714T170000Z
DTEND:19970715T035959Z
SUMMARY:Bastille Day Party
END:VEVENT
END:VCALENDAR
This fragment describes an event - a Bastille Day party - starting at 5pm UTC on 14th July 1997 and ending the following day at 3.59am.
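As a rough illustration of how this content-line structure can be read, the Python sketch below splits each line into a name, parameters and a value, and converts the UTC date-times. The helper names are hypothetical and this is not a conforming RFC 2445 parser: it ignores line folding and escaping, and handles only UTC date-times.

```python
from datetime import datetime, timezone

# Hypothetical helpers: a simplified reading of RFC 2445 content lines
# (no line unfolding, no escape handling, UTC date-times only).

BASTILLE = """BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//hacksw/handcal//NONSGML v1.0//EN
BEGIN:VEVENT
DTSTART:19970714T170000Z
DTEND:19970715T035959Z
SUMMARY:Bastille Day Party
END:VEVENT
END:VCALENDAR"""


def split_content_line(line):
    """Split NAME;PARAM=VALUE:VALUE at the first colon outside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            name, *params = line[:i].split(";")
            return name.upper(), params, line[i + 1:]
    raise ValueError(f"not a content line: {line!r}")


def parse_utc(value):
    """Parse a UTC date-time in iCalendar basic format, e.g. 19970714T170000Z."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def first_event(text):
    """Collect the properties of the first VEVENT (repeated properties overwrite)."""
    event, inside = {}, False
    for line in text.splitlines():
        name, _params, value = split_content_line(line)
        if name == "BEGIN" and value == "VEVENT":
            inside = True
        elif name == "END" and value == "VEVENT":
            break
        elif inside:
            event[name] = value
    return event


event = first_event(BASTILLE)
start, end = parse_utc(event["DTSTART"]), parse_utc(event["DTEND"])
print(event["SUMMARY"], end - start)  # Bastille Day Party 10:59:59
```

The quote-aware split matters for SkiCal lines such as PRICE;PRXITEM="SFT:Far side";CURRENCY=USD:17, where a parameter value itself contains a colon.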
SkiCal is an extension to iCalendar for describing public events, like concerts, sports events and conferences. The core technology of SkiCal is the 'WHA-machine'; the structuring of information based on the six common interrogatives; WHAT, WHEN, WHERE, WHOW, WHY, and WHO.
Here is a SkiCal event taken from the SkiCal Internet-Draft [SKICAL] (shortened):
BEGIN:VCALENDAR
VERSION:2.0
SKICALVER:1.0
PRODID:-//HandGenerated/SkiCal//NONSGML v1.0//EN
BEGIN:VEVENT
SKUID:email@example.com
SUMMARY:Handel's "Messiah" featuring the National Chamber Orchestra
TITLE:Messiah
DTSTART:19991217T200000
DTEND:19991217T220000
VENUE:Indoors
PERSONS;SKiROLE="conductor":Takao Kanayama
PERSONS;SKiROLE="orchestra":National Chamber Orchestra
PERSONS;SKiROLE="creator":G.F.Handel
PRICE;PRXITEM="SFT:Far side";CURRENCY=USD:17
END:VEVENT
END:VCALENDAR
This describes an event with an identifier, a title, and a start and an end time, but also information about the price of the event and the people involved and their roles.
Both SkiCal and iCalendar are designed for the machine interchange of event data, including storage and scheduling applications. They are both modelled using objects, properties and datatypes. They use a syntactic structure called mime-directory, as defined in RFC 2425 [MIME], which is also used for VCARD [VCARD]. Ordering of properties and values within the main container constructs is not important. There are some cardinality constraints over certain properties and their values, and in some places, implicit class hierarchies between the objects.
Both iCalendar and SkiCal are good candidates to mark up in RDF: they describe objects, such as times, dates, events, people, places; and their properties, so that the structure is similar to the RDF information model. The major advantage of marking up these formats in RDF is the utility of being able to combine event information with other kinds of information about people, documents, webpages, geographical locations, and so on.
An example of where namespace mixing can be useful is when describing objects representing aspects of people in different circumstances. iCalendar has a business meeting orientation, and is therefore only concerned with people in their role as calendar users, namely as the initiators and attendees of meetings. SkiCal goes further by defining an explicit Persons property, which enables one to talk about the roles the person might take within an event, such as conductor, guide or manager. SkiCal also allows one to refer to a URL where the user can find more information about the person if they wish.
SkiCal and iCalendar data described in RDF could go further than this, enabling information from several namespaces to be combined, instead of constraining a person to be an organiser or providing information at the end of a URL. For example, event information could be combined with the vCard [VCARD] or 'friend-of-a-friend' [FOAF] vocabularies to provide more detailed contact information for a person who is the organiser of a meeting or a speaker at a conference. Connecting a well-known vocabulary such as Dublin Core [DC] to event data would enable the organiser of a meeting to specify reading material for it.
Connections to other vocabularies could be described by encoding iCalendar and SkiCal data in RDF using RDF Schema, or simply by mixing the namespaces within RDF instance data. However, DAML+oil can describe these connections with more flexibility and more precision, for example by specifying how many of a certain property may be used with a particular type of object, or by constraining the type of object linked to by a particular property from a certain class of object (see below). The relationships described can be very complex and highly structured, and aim to represent (some of) the complexity of relationships in the world: for example, a father is a parent who is also a male person; a manager_678 is a person working for company X with Y years of experience.
Distinguishing between different forms of data (integers, text, date-times and floats) is very important for indexing and querying, especially for dates and times. The RDF Core working group is in the process of defining a model and syntax for datatypes in RDF; DAML+oil can be used with XML Schema datatypes.
This section describes some useful aspects of DAML+oil with examples drawn from the iCalendar and SkiCal draft DAML+oil ontologies [ICAL-DAML], [SKICAL-DAML]. SkiCal and iCalendar are lengthy and complex and so here we pick out three of the more interesting and functional uses of DAML+oil for these calendar ontologies.
In RDF, objects may have globally unique identifiers (URIs) or be 'blank nodes'. Blank nodes are nodes for which no appropriate or known identifier exists. This may occur when an object, such as a person, does not naturally belong to a class of things that have identifiers. When there is no unique identifier for an object, it is often helpful to be able to say that if an object has a certain property, then that property and its value together uniquely identify the object.
DAML+oil has a way of saying this. If we state that
<daml:ObjectProperty rdf:about="#SKUID">
  <rdf:type rdf:resource="http://www.daml.org/2001/03/daml+oil#UnambiguousProperty"/>
</daml:ObjectProperty>
then this means that two objects with the same value for the SKUID property are in fact the same object. For example:
<VEVENT>
  <SKUID>email@example.com</SKUID>
  <DESCRIPTION>
    Handel's "Messiah" featuring the National Chamber Orchestra
  </DESCRIPTION>
</VEVENT>
<VEVENT>
  <SKUID>email@example.com</SKUID>
  <SUMMARY>
    The first performance of Handel's "Messiah" by the National Chamber Orchestra for 20 years.
  </SUMMARY>
</VEVENT>
<VEVENT>
  <SKUID>email@example.com</SKUID>
  <DTSTART>19991217T200000</DTSTART>
</VEVENT>
all these property-value pairs are properties of the same VEVENT object.
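A processor that knows SKUID is a daml:UnambiguousProperty can merge ('smush') such descriptions into one. The Python sketch below assumes a deliberately simple representation, one dict of property to value per description, and its names are hypothetical; it is not taken from any particular RDF toolkit.

```python
# Hypothetical sketch: merge node descriptions that share a value of an
# unambiguous property. Each input node is a dict {property: value}; each
# output node is a dict {property: set of values}.

UNAMBIGUOUS = {"SKUID"}


def smush(nodes, unambiguous=UNAMBIGUOUS):
    merged_by_key = {}  # (property, value) -> merged node
    result = []
    for node in nodes:
        keys = [(p, v) for p, v in node.items() if p in unambiguous]
        # Reuse the merged node of the first key already seen, if any.
        # (Simplification: keys that would link two existing nodes are not joined.)
        target = next((merged_by_key[k] for k in keys if k in merged_by_key), None)
        if target is None:
            target = {}
            result.append(target)
        for k in keys:
            merged_by_key.setdefault(k, target)
        for p, v in node.items():
            target.setdefault(p, set()).add(v)
    return result


events = [
    {"SKUID": "email@example.com",
     "DESCRIPTION": 'Handel\'s "Messiah" featuring the National Chamber Orchestra'},
    {"SKUID": "email@example.com",
     "SUMMARY": 'The first performance of Handel\'s "Messiah" by the National Chamber Orchestra for 20 years.'},
    {"SKUID": "email@example.com", "DTSTART": "19991217T200000"},
]
merged = smush(events)
print(len(merged), sorted(merged[0]))  # 1 ['DESCRIPTION', 'DTSTART', 'SKUID', 'SUMMARY']
```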
A nice example of this is for people, as described in [SMUSH]. People do not have identifiers like webpages do, but they do have personal email boxes, which at any given time are in the ownership of one person. Email addresses also have the advantage that they are fairly well-known strings. If we designate CAL-ADDRESS in iCalendar as a daml:UnambiguousProperty which points to an email address, then we can say that if we find the following information:
<wn:Person>
  <CAL-ADDRESS rdf:resource="mailto:email@example.com"/>
  <foaf:name>Libby Miller</foaf:name>
</wn:Person>
<wn:Person>
  <CAL-ADDRESS rdf:resource="mailto:email@example.com"/>
  <foaf:name>Elizabeth Miller</foaf:name>
</wn:Person>
we can tell that the person with email address email@example.com has two names, Libby Miller and Elizabeth Miller. daml:UnambiguousProperty is much like the subject indicator construct in topic maps [TOPIC], which allows for aggregation of topics.
Where there is no global identifier for an object, the use of daml:UnambiguousProperty creates a key which identifies the object. Often it does not make sense to assign a URI to an event or a person, and it would in fact be a modelling error to use the webpage of an event or the email address of a person as such an identifier; but a daml:UnambiguousProperty can be used as a surrogate for one. This still allows the same event to have two identifiers, but it does mean that if we discover two instances of the same value for this property, they refer to the same subject.
From a practical point of view, a piece of software that has had information about daml:UnambiguousProperty programmed into it can use the values of properties defined in this way in a schema as keys into the database for instance data. This makes for effective indexing, storage and retrieval of instance data in a schema- and daml:UnambiguousProperty-aware database. In particular, it can aid query processing for fast retrieval of information: a lookup on a property known to be a daml:UnambiguousProperty can return at most one subject for a given value, and can therefore prune the search space substantially.
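A toy version of such a daml:UnambiguousProperty-aware store might look like the Python sketch below. The class and method names are hypothetical, not the API of any real RDF database, and the sketch assumes the data has already been smushed, so that each key belongs to one subject.

```python
class KeyedStore:
    """Toy in-memory triple store (hypothetical; not the API of any real RDF store).

    Values of properties declared unambiguous are indexed, so a lookup on them
    reads one dictionary entry instead of scanning every triple. The sketch
    assumes descriptions were already smushed, so each (property, value) key
    belongs to exactly one subject.
    """

    def __init__(self, unambiguous):
        self.unambiguous = set(unambiguous)
        self.triples = []     # (subject, property, value)
        self.key_index = {}   # (property, value) -> subject

    def add(self, subject, prop, value):
        if prop in self.unambiguous:
            owner = self.key_index.setdefault((prop, value), subject)
            if owner != subject:
                raise ValueError(f"{prop}={value!r} already identifies {owner!r}; smush first")
        self.triples.append((subject, prop, value))

    def subjects(self, prop, value):
        """Subjects having prop=value: an index hit for unambiguous properties, else a scan."""
        if prop in self.unambiguous:
            hit = self.key_index.get((prop, value))
            return [] if hit is None else [hit]  # at most one answer
        return sorted({s for s, p, v in self.triples if p == prop and v == value})


store = KeyedStore({"CAL-ADDRESS"})
store.add("_:p1", "CAL-ADDRESS", "mailto:email@example.com")
store.add("_:p1", "foaf:name", "Libby Miller")
store.add("_:p1", "foaf:name", "Elizabeth Miller")
print(store.subjects("CAL-ADDRESS", "mailto:email@example.com"))  # ['_:p1']
```

Raising on a conflicting key, rather than silently merging, is a design choice of this sketch; a production store would more likely smush the two subjects at load time.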
RDF Schema allows you to relate classes and properties from the same or different namespaces using the rdfs:subClassOf and rdfs:subPropertyOf properties:
<rdf:Property rdf:about="http://example.com/skical/schema#SKUID">
  <rdfs:subPropertyOf rdf:resource="http://example.com/ical/schema#UID"/>
</rdf:Property>
<rdfs:Class rdf:about="http://example.com/skical/schema#PERSON">
  <rdfs:subClassOf rdf:resource="http://xmlns.com/wn/1.6/Person"/>
</rdfs:Class>
and also domain and range constraints on properties:
<rdf:Property rdf:about="http://example.com/skical/schema#COMPONENT">
  <rdfs:domain rdf:resource="http://example.com/skical/schema#VCALENDAR"/>
  <rdfs:range rdf:resource="http://example.com/skical/schema#CAL-COMPONENT"/>
</rdf:Property>
DAML+oil has a number of useful constructs that allow classes and properties to be linked together directly, by stating that they are different names for the same thing or, in the case of daml:inverseOf, that one property is the inverse of another. Examples of these are daml:samePropertyAs (and the more general daml:equivalentTo), daml:inverseOf and daml:sameClassAs.
These can be useful for creating hooks into existing ontologies or schemas without rewriting all the instance data.
Suppose we have three ontologies, one describing organisational structure, one publications and another company meetings. Instead of rewriting all the instance data conforming to these ontologies so that it all uses the same definition of a Person from a single ontology, one could simply define the different definitions of Person as the same class as each other using daml:sameClassAs. Depending on how the data is stored this could be a very useful shortcut.
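One way a store could honour daml:sameClassAs without rewriting instance data is to group equivalent class names and answer type queries per group. The Python sketch below uses a union-find structure for this; the prefixes org:, pub: and mtg: stand for three made-up ontologies, and all names are hypothetical.

```python
# Hypothetical sketch: treat classes linked by daml:sameClassAs as one class
# when answering "instances of X", without rewriting the instance data.


class SameClassIndex:
    """Union-find over class names; each group is a set of equivalent classes."""

    def __init__(self):
        self.parent = {}

    def find(self, cls):
        self.parent.setdefault(cls, cls)
        while self.parent[cls] != cls:
            self.parent[cls] = self.parent[self.parent[cls]]  # path halving
            cls = self.parent[cls]
        return cls

    def same_class_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def instances_of(cls, typed_instances, index):
    """Instances whose declared class is equivalent to cls."""
    root = index.find(cls)
    return sorted(i for i, c in typed_instances if index.find(c) == root)


index = SameClassIndex()
index.same_class_as("org:Person", "pub:Person")
index.same_class_as("pub:Person", "mtg:Person")

data = [("alice", "org:Person"), ("bob", "pub:Person"),
        ("carol", "mtg:Person"), ("weekly-review", "mtg:Meeting")]
print(instances_of("org:Person", data, index))  # ['alice', 'bob', 'carol']
```

Because equivalence is resolved at query time, the three original data sets keep their own class names untouched.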
DAML+oil also allows the expression of domain, range and subclass and subproperty relationships with more subtlety and in more detail than in RDF Schema. In DAML+oil it is possible to say more about the sort of calendar users expected in iCalendar, which might affect the sorts of transactions expected by a scheduling program:
<rdfs:Class rdf:about="http://example.com/ical/schema#CAL-USER">
  <rdfs:comment>every calendar user is a person or a robot</rdfs:comment>
  <daml:disjointUnionOf rdf:parseType="daml:collection">
    <daml:Class rdf:about="http://xmlns.com/wn/1.6/Person"/>
    <daml:Class rdf:about="http://xmlns.com/wn/1.6/Robot"/>
  </daml:disjointUnionOf>
</rdfs:Class>
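What daml:disjointUnionOf tells a scheduling program can be sketched roughly as follows. This is a hypothetical Python illustration, not a DAML+oil reasoner: under DAML+oil's open-world reading, a calendar user of unknown kind is not an error (it is simply known to be one of the two), while a resource typed as both a Person and a Robot is a contradiction.

```python
# Hypothetical sketch of the CAL-USER disjoint union: every calendar user is
# a Person or a Robot, and never both. Prefixed names are illustrative only.

DISJOINT_UNION = {"ical:CAL-USER": ("wn:Person", "wn:Robot")}


def classify(types):
    """Return (contradictions, undetermined) for one resource's set of types."""
    contradictions, undetermined = [], []
    for union_class, members in DISJOINT_UNION.items():
        if union_class not in types:
            continue
        hits = [m for m in members if m in types]
        if len(hits) > 1:
            # Member classes of a disjoint union cannot share instances.
            contradictions.append((union_class, tuple(hits)))
        elif not hits:
            # Open world: known to be exactly one of the members, but not which.
            undetermined.append((union_class, members))
    return contradictions, undetermined


print(classify({"ical:CAL-USER", "wn:Person"}))  # ([], [])
print(classify({"ical:CAL-USER"}))  # ([], [('ical:CAL-USER', ('wn:Person', 'wn:Robot'))])
```

A scheduler might use the undetermined case to decide, for example, whether to send a meeting request to a human or to a resource-booking robot.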
Another example: instead of saying that the range of the property COMPONENT must _always_ be CALCOMPONENT, we can state this restriction just for a particular class of objects, e.g. for VCALENDAR objects:
<rdfs:Class rdf:about="#VCALENDAR">
  <rdfs:comment>A container for calendar components</rdfs:comment>
  <rdfs:subClassOf>
    <daml:Restriction daml:minCardinality="1">
      <daml:onProperty rdf:resource="#COMPONENT"/>
      <daml:toClass rdf:resource="#CALCOMPONENT"/>
    </daml:Restriction>
  </rdfs:subClassOf>
</rdfs:Class>
In this case the restriction also requires that a VCALENDAR have at least one COMPONENT property.
This restriction to a local range means that we can reuse COMPONENT elsewhere with different restrictions: if we wished, we could use an ical:COMPONENT property on CALCOMPONENTs such as VEVENTs themselves, perhaps referring to subevents of a main containing event.
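The effect of attaching the restriction to the class rather than to the property can be shown with a small hypothetical validator in Python. It reads the restriction as a closed-world check, as an application validating its own data might; a DAML+oil reasoner would instead infer the missing facts (that some component exists, or that a value is a CALCOMPONENT) rather than report them. The class hierarchy below is an assumption for illustration.

```python
# Hypothetical sketch: restrictions are keyed by class (VCALENDAR), so the
# COMPONENT property stays unconstrained on other classes such as VEVENT.

SUBCLASS_OF = {"VEVENT": "CALCOMPONENT", "VTODO": "CALCOMPONENT", "VJOURNAL": "CALCOMPONENT"}

# class -> list of (onProperty, minCardinality, toClass)
RESTRICTIONS = {"VCALENDAR": [("COMPONENT", 1, "CALCOMPONENT")]}


def is_a(cls, target):
    """Follow the (single-parent) subclass chain upwards from cls."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def violations(node_class, values_by_property):
    """values_by_property maps a property name to the classes of its values."""
    problems = []
    for prop, min_card, to_class in RESTRICTIONS.get(node_class, []):
        values = values_by_property.get(prop, [])
        if len(values) < min_card:
            problems.append(f"{node_class}: needs at least {min_card} {prop}")
        problems.extend(
            f"{node_class}: {prop} value {v} is not a {to_class}"
            for v in values if not is_a(v, to_class)
        )
    return problems


print(violations("VCALENDAR", {"COMPONENT": ["VEVENT", "VTODO"]}))  # []
print(violations("VEVENT", {"COMPONENT": ["VALARM"]}))  # [] (no local restriction on VEVENT)
```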
One can take existing ontologies and connect them in minute and precise detail using DAML+oil properties such as daml:disjointWith, daml:intersectionOf, daml:unionOf and daml:complementOf. iCalendar and SkiCal are a good example of a pair of ontologies where one depends on terms defined in the other. The SkiCal TITLE property has an iCalendar TEXT value; SkiCal OPTIME has most of the same properties as ical:VEVENT, and so we can define VEVENT and OPTIME as subclasses of a more general class. In principle DAML+oil can make relationships such as these more precise, but in practice, at least in this case, finding examples of this is hard. It is relatively straightforward to say that the person class is disjointWith the plant class (no plant is a person), or that the father class represents the intersection of the parent class and the class of male things (all fathers are both male and parents). But for iCalendar and SkiCal more vaguely defined relationships seem to be sufficient, and any more detail is not easily definable.
In addition, attempting to connect two ontologies with such precision assumes a degree of information about the intentions of the creator of the ontologies that is probably unwarranted. Person, father, plant are comparatively easy to define and understand; time, location, role are less straightforward.
However, if a degree of imprecision is acceptable and still informative, then simple relationships like daml:sameClassAs can be time-saving and useful.
Calendaring and scheduling applications need to be able to do operations on dates and times, such as finding the events that start after or before a given time, or that fall between two given times.
To display calendar information or schedule events, applications need to be able to make these types of queries of a data source, and that requires knowing that certain objects are the sorts of things on which one can perform a 'greater than', 'less than' or 'between' operation. Datatyping can also enable indexing in databases, or increase its speed (for example, in several relational databases datatyping is the only way that these types of queries about dates can be performed).
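Why these operations need typed values rather than strings can be seen in a short Python sketch. The two event start times use two lexical forms (iCalendar's basic format and an XML-Schema-style extended format); the 'Matinee' event is invented for illustration, and no time zones are handled.

```python
from datetime import datetime

# Hypothetical events: "Messiah" uses the iCalendar basic format,
# the invented "Matinee" uses an XML-Schema-style extended format.
RAW = {
    "Messiah": "19991217T200000",
    "Matinee": "1999-12-18T10:00:00",
}


def parse_start(text):
    """Try the two date-time forms used above; no time zones are handled."""
    for fmt in ("%Y%m%dT%H%M%S", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    raise ValueError(f"unrecognised date-time: {text!r}")


TYPED = {name: parse_start(text) for name, text in RAW.items()}


def starting_between(events, lo, hi):
    """Names of events whose start lies in [lo, hi]."""
    return sorted(name for name, start in events.items() if lo <= start <= hi)


# As strings the Matinee wrongly sorts first ('-' < '1'); as datetimes it is later.
print(RAW["Matinee"] < RAW["Messiah"], TYPED["Matinee"] < TYPED["Messiah"])  # True False
print(starting_between(TYPED, datetime(1999, 12, 17), datetime(1999, 12, 17, 23, 59)))  # ['Messiah']
```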
The RDF Core working group is currently finalising a model for expressing datatypes. DAML+oil has its own methods of using datatypes: it has syntactic structures which point to XML Schema datatypes, both at the schema and at the instance level.
For example, defining iCalendar in DAML+oil, one could say:
<daml:DatatypeProperty rdf:ID="DTSTART">
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#dateTime"/>
</daml:DatatypeProperty>
In which case sample instance data could look like this:
<VEVENT>
  <DTSTART>2002-03-15T15:00:00Z</DTSTART>
</VEVENT>
and a DAML+oil processor could use the schema to determine that the string 2002-03-15T15:00:00Z represents a value of the XML Schema dateTime datatype, and index it accordingly.
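The kind of schema-driven conversion described here can be sketched in Python. The range table, the PRIORITY entry and the function names are assumptions for illustration, not part of any DAML+oil toolkit, and only the UTC form of xsd:dateTime is parsed.

```python
from datetime import datetime, timezone
from decimal import Decimal

XSD = "http://www.w3.org/2000/10/XMLSchema#"

# Hypothetical rdfs:range table, as a processor might collect it from a schema
# such as the DTSTART declaration above (PRIORITY is added for illustration).
RANGES = {"DTSTART": XSD + "dateTime", "PRIORITY": XSD + "integer"}


def parse_xsd_datetime_utc(text):
    """Accept only the UTC form YYYY-MM-DDThh:mm:ssZ; fractions and offsets are not handled."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


PARSERS = {
    XSD + "dateTime": parse_xsd_datetime_utc,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}


def typed_value(prop, literal):
    """Convert a literal using the datatype declared as its property's range.
    Properties with no known datatype range are left as plain strings."""
    parser = PARSERS.get(RANGES.get(prop))
    return parser(literal) if parser else literal


print(typed_value("DTSTART", "2002-03-15T15:00:00Z").isoformat())  # 2002-03-15T15:00:00+00:00
print(typed_value("PRIORITY", "3") + 1)  # 4
```

A store built this way can keep the typed values in sorted indexes, which is what makes the range queries above cheap.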
Unfortunately the syntax of a date-time in iCalendar does not use the same subset of ISO 8601 as XML Schema datatypes (iCalendar uses the basic format, such as 19970714T170000Z, while XML Schema uses the extended format, such as 1997-07-14T17:00:00Z), and so translating iCalendar correctly to DAML+oil would involve creating a new datatype. This can be achieved by creating an XML Schema file with the appropriate restrictions and pointing to the datatypes defined in it.
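A different route, not the one proposed here, is to normalise the lexical form when converting data, rewriting iCalendar date-times into the xsd:dateTime form. The Python sketch below assumes only UTC ('Z') and floating (zone-less) times; values that rely on a TZID parameter would need a time zone database and are not handled. The function name is hypothetical.

```python
import re

# iCalendar DATE-TIME in ISO 8601 basic format, optionally UTC ("Z").
ICAL_DATE_TIME = re.compile(r"^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})(Z?)$")


def ical_to_xsd(value):
    """Rewrite an iCalendar DATE-TIME as an xsd:dateTime lexical form
    (extended format). Floating times stay zone-less; UTC keeps its 'Z'."""
    m = ICAL_DATE_TIME.match(value)
    if not m:
        raise ValueError(f"not an iCalendar DATE-TIME: {value!r}")
    year, month, day, hour, minute, second, zone = m.groups()
    return f"{year}-{month}-{day}T{hour}:{minute}:{second}{zone}"


print(ical_to_xsd("19970714T170000Z"))  # 1997-07-14T17:00:00Z
print(ical_to_xsd("19991217T200000"))  # 1999-12-17T20:00:00
```

Normalising avoids a custom datatype, but it loses the exact original lexical form, which may matter if the data has to be written back out as iCalendar.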
One distinction that we found difficult to draw clearly at times was between what might be termed the 'default' datatypes of RDF and the XML Schema datatypes. Although RDF does not yet have its own datatypes, nodes must be either Literals or Resources. In some cases it is not clear whether to say something is an XML Schema datatype or an RDF resource or literal: for example, do you say
ical:DESCRIPTION is a daml:DatatypeProperty with an xsd:string value, or
ical:DESCRIPTION is a daml:ObjectProperty with an rdfs:Literal value?
similarly, is it:
ical:URI is a daml:DatatypeProperty with an xsd:anyURI value, or
ical:URI is a daml:ObjectProperty with an rdfs:Resource value?
DAML+oil makes a distinction between properties pointing to DAML+oil objects and properties pointing to datatypes (which are always XML Schema datatypes). These value spaces are disjoint, but there is an apparent overlap, as described above.
In many cases, however, XML Schema datatypes are a useful and also extensible mechanism for datatypes. In SkiCal, information about age restrictions is currently described in structured text, but we could define an extension to SkiCal specifically for movies and define our own XML Schema datatypes for the categories of age group used:
(example adapted from the DAML+oil walkthrough [DAML-WALK])
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#">
  <xsd:simpleType name="under18">
    <!-- under18 is an XMLS datatype based on decimal -->
    <!-- with the added restriction that values must be < 18 -->
    <xsd:restriction base="xsd:decimal">
      <xsd:maxExclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
and then point at this from the SkiCal extension schema like this:
<rdfs:Class rdf:about="#Minor">
  <daml:intersectionOf rdf:parseType="daml:collection">
    <daml:Class rdf:about="#Person"/>
    <daml:Restriction>
      <daml:onProperty rdf:resource="http://xmlns.com/foaf/0.1/age"/>
      <daml:hasClass rdf:resource="http://example.com/xmlschemas/cinema#under18"/>
    </daml:Restriction>
  </daml:intersectionOf>
</rdfs:Class>
Instance data could then use the class like this:
<ical:VEVENT>
  <ski:TITLE>Texas Chain Saw Massacre, The</ski:TITLE>
  <mdb:UKCertification>18</mdb:UKCertification>
  <ski:PROHIBITED>
    <ski:Minor />
  </ski:PROHIBITED>
</ical:VEVENT>
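An application enforcing this could test class membership roughly as in the Python sketch below. It assumes under18 admits decimal ages strictly below 18, mirrors the intersectionOf definition of Minor (a Person with some foaf:age value in under18, which is what daml:hasClass expresses), and treats PROHIBITED in a closed-world way: only people known to be minors are refused. All names are hypothetical.

```python
from decimal import Decimal

# Hypothetical sketch of the Minor class and the PROHIBITED check.


def in_under18(value):
    """Membership in the assumed under18 datatype: decimal ages below 18."""
    return Decimal(value) < 18


def is_minor(person):
    """Minor = Person AND (some foaf:age value is in under18)."""
    return "Person" in person["types"] and any(in_under18(a) for a in person.get("ages", []))


CLASS_TESTS = {"Minor": is_minor}


def may_attend(person, event):
    """Refuse anyone known to belong to a prohibited class; unknown age is not refused."""
    return not any(CLASS_TESTS[c](person) for c in event.get("PROHIBITED", ()))


film = {"TITLE": "Texas Chain Saw Massacre, The", "PROHIBITED": {"Minor"}}
teen = {"types": {"Person"}, "ages": ["16"]}
adult = {"types": {"Person"}, "ages": ["34"]}
print(may_attend(teen, film), may_attend(adult, film))  # False True
```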
The authors found the process of creating DAML+oil schemas for iCalendar and SkiCal difficult and slow. Part of this difficulty came from translating between schema specification languages. Such a translation requires a very good understanding of the schema to be translated, and in most schema languages, it is not possible to capture exactly what the intention of the authors was at the time of writing.
Another significant problem was the awkward syntax and limited expressivity of the DAML+oil language. The syntactic difficulties are mainly due to the problem inherent in expressing complex constraints within the binary RDF model. The problems with expressivity arise because DAML+oil is a language designed to harness the tractability of a certain class of Description Logics (themselves a subset of first-order logic). This makes for very constrained functionality that can be puzzling and frustrating to use. A short position paper by Pat Hayes [HAYES] explains this difficulty very clearly.
Our experience suggests that most users will gain only limited benefits from defining very precise relationships between objects using DAML+oil, because of the high cost of doing so, both in terms of the time and effort expended, and the high potential for error.
Where DAML+oil is used to describe relations between objects in ontologies written by different authors, connecting the ontologies involves hypothesising what the author meant by the ontology. A machine readable DAML+oil ontology does not capture all the aspects of the world in the area which it claims to represent, and so there is immense potential for error, still more so when the ontologies are defined in RDF Schema or Mime directories or other means less precise than DAML+oil.
Even within a single ontology, the degree of precision that DAML+oil enables you to describe between objects is in many cases greater than the degree of precision that you know pertains in those particular circumstances. This means that arbitrary decisions are made for those descriptions, leading to errors of interpretation, and, paradoxically, a reduction in precision, because of the introduction of this random noise.
However, as we have shown, a subset of DAML+oil can be very useful, with practical benefits for indexing RDF information for storage and querying, for querying and storing datatypes, and for creating approximate links between schemas or ontologies.
Internet Calendaring and Scheduling Core Object Specification. F. Dawson, D. Stenerson. Network Working Group, Request for Comments: 2445, Category: Standards Track, November 1998.
SkiCal Internet Draft. G. FitzPatrick, P. Lannera, N. Hjelm. Network Working Group Internet-Draft, expires July 8, 2002.
Annotated DAML+OIL (March 2001) Ontology Markup. Frank van Harmelen, Peter F. Patel-Schneider and Ian Horrocks, editors (also Lynn Andrea Stein, Dan Connolly, and Deborah McGuinness, editors of previous versions).
Reference description of the DAML+OIL (March 2001) ontology. Frank van Harmelen, Peter F. Patel-Schneider and Ian Horrocks, editors. Contributors: Tim Berners-Lee, Dan Brickley, Dan Connolly, Mike Dean, Stefan Decker, Pat Hayes, Jeff Heflin, Jim Hendler, Ora Lassila, Deb McGuinness, Lynn Andrea Stein, and others.
*Rough* Notes on defining SkiCal in DAML+oil, 2002-03-22.
Draft iCalendar DAML+oil schema (partial), 2002-03-22.
XML Schema Part 2: Datatypes. Paul V. Biron, Ashok Malhotra. W3C Recommendation, 2 May 2001.
XML Schema Part 0: Primer. David C. Fallside (editor). W3C Recommendation, 2 May 2001.
iCalendar in UML.
Draft Hybrid RDF Calendar Schema. Michael Arick and Libby
RDF Schema Specification 1.0. Dan Brickley and R.V. Guha. W3C Candidate Recommendation, 27 March 2000.
RDF Model and Syntax Specification. Ora Lassila and Ralph R. Swick. W3C Recommendation, 22 February 1999.
Dublin Core Metadata Element Set, Version 1.1, 1999-07-02.
Friend of a friend RDF vocabulary. Dan Brickley and others.
A MIME Content-Type for Directory Information. T. Howes, M. Smith, F. Dawson. Network Working Group, Request for Comments: 2425, Category: Standards Track, September 1998.
vCard MIME Directory Profile. F. Dawson, T. Howes. Network Working Group, Request for Comments: 2426, Category: Standards Track, September 1998.
XML Topic Maps (XTM) 1.0. Steve Pepper, Graham Moore, editors.
Catching the Dreams.
RDFWeb notebook: aggregation strategies.