Author: Dan Brickley <firstname.lastname@example.org>
This version: 2001-01-22
This document is based on a presentation on W3C's Resource Description Framework (RDF) originally given at the European Commission Metadata Workshop, Luxembourg 1999.
Note: the current document does not include the full textual content of the overhead slides used during the original presentation, and may therefore lack clarity unless the embedded images are read. Textual and graphical HTML versions of the original PowerPoint slides are also available online and will at some point be more accessibly integrated into this document. Apologies for the current dependency on embedded GIF images.
Feedback on this version of the document is welcomed; future revisions may be appropriate for a more general audience.
This presentation of RDF attempts to provide a high level overview of W3C's Resource Description Framework, while grounding the discussion in enough technical detail to highlight RDF's novel and interesting characteristics.
For those who have not heard of RDF before, RDF is a relatively new framework for metadata interoperability devised by the World Wide Web Consortium, home of HTML and other Web standards. This overview does not assume any special knowledge of RDF, but does assume some basic familiarity with concepts such as 'metadata', 'URI', 'XML'.
To understand RDF it is essential to understand something of the way in which 'technology' meets 'society' in the RDF design. To that end, this overview describes...
The aim is to show just enough RDF/XML syntax to make certain points. Rather than dwell on technical detail, we begin with some historical context, which a full understanding of RDF requires.
The first aim is to begin at the beginning, situating RDF in historical context.
First we'll look at the broad architectural aims of the initiative, specifically at the extraordinarily wide range of target applications we expect to see RDF used for.
Then I hope to make clear the connections from this background context to the detail of "how it works": (one view of) why the RDF Working Groups made the particular engineering decisions they did.
There's a clear distinction in RDF between an abstract data model and the concrete syntax used to deploy RDF in the real world. You can think of the "model" as, in effect, RDF's philosophy of what it is to describe a resource, while the RDF syntax is basically the "file format". After the abstract information model and the XML interchange syntax, the third component of RDF is the schema language system. We'll gloss over the details, again going into only enough detail to understand how the approach taken fits into RDF's broader context.
Finally, we finish up with a quick status report and some tough questions facing RDF in the future...
Some historical and political context. RDF is the core technology of the W3C's Metadata Activity, and as such aims to provide a coherent umbrella framework suitable for use by the various metadata applications of W3C: digital signatures, privacy preference management, and perhaps most importantly PICS, "the Platform for Internet Content Selection". PICS' main concern was with providing a neutral technology that could be used to filter out pornography and other controversial content; at the time various governments were wondering about "banning the internet"...
A second strand of metadata work that fed into the design of RDF was Dublin Core and the associated Warwick Framework for metadata modularisation. We haven't time to look at the details of the Warwick Framework here, but for anyone familiar with that work, the influence on RDF should be clear.
The RDF effort was kicked off in early 1997 after a number of industry submissions to W3C concerning metadata-related standards, among them XML-Data from Microsoft and Meta Content Format (MCF) from Netscape. Representatives from these diverse groups, amongst others, formed a PICS "Next Generation" Working Group, which ultimately became the RDF Model and Syntax Working Group.
A final, often ignored, strand of work that fed into RDF is the URI (Uniform Resource Identifier) Specification (IETF RFC xxxx). This provides a distributed, extensible namespace which hangs the whole Web together; URI is an umbrella name for URNs, URLs and other identifiers that share a common syntactic structure.
So, after all this diverse input, what was RDF actually trying to achieve?
First, most pragmatically, something was needed that could be used with PICS, Dublin Core, Digital Signatures, and so forth. We wanted a convention for metadata interchange on the Web. Less abstractly, we wanted a common syntax or file format to save needless duplication of effort. By 1997, it was clear that XML was the appropriate technology for this.
What more? Perhaps most crucially a need was felt for vocabulary semantics to be defined "in the community" and not by slow moving, overly-politicised industry standards committees. Relatedly, there was a realisation that the world isn't parcelled into distinct metadata communities and that any solution would need to mix and match overlapping data structures defined in multiple application domains.
Having sketched some historical background on the RDF working groups, we can now look at what they went away and built. Strangely enough, the name itself is almost self-explanatory. They built a "Framework for Describing Resources."
Let's take those three words in turn.
To this end, RDF builds on two key technologies: XML and URIs.
We've heard the word "diversity" a few times now. Let's take a look at what this might amount to. The question to bear in mind all along is "What can these diverse applications have in common?"
We've got Dublin Core-ish stuff: finding things, creating (by hand or by Web robot) descriptions that can be used to characterise "document like objects" and their associated resources. Not just a book or an image but the details of the creator of that image, his/her organisation, address etc.
Then we have person-oriented metadata. Describing people and their information preferences.
Aside: what's the difference between saying a book is about sociology or profiling me as being interested in sociology?
Then we have sitemaps, web collections, channels: practical problems for web masters: how to describe their content as a complex evolving whole...
We've also got to deal with content-rating and filtering applications (this site is 'boring', 'cool', 'racist', 'rude'...).
And of course IPR (Intellectual Property Rights) issues surrounding the people and content described in these various ways...
Hopefully this has set the scene for a quick overview of RDF itself.
Rather than deal in abstractions, we'll first jump in and look at the concrete syntax used by RDF 1.0 applications to interchange metadata descriptions as XML files. Then we'll work back to the abstract information model that underpins and informs the structure of this markup.
What does RDF add to XML?
A common question (a very common question) is: What more does RDF add? Doesn't XML itself address these needs?
The answer, to be blunt, is "no".
XML is a single universal file format. XML is the file format to end all file formats. XML, famously, is the "new ASCII". Useful, but (like TCP/IP and Unicode) something that has to be built upon.
RDF's contribution is simple. When we see XML files written according to the RDF specifications, we know how to interpret them. We know what they're saying (claiming... asserting...).
We can read an RDF document and figure out that the markup is making statements about the named properties of named resources. Even if we've never seen this particular RDF application before.
So, what does it look like in practice...?
This is the only example of RDF syntax in this presentation of RDF. We see depicted here a single statement: "Some document has a Dublin Core 'creator' property with the value 'Joe Smith'." Joe Smith is the creator of this document. That's it. The world's simplest RDF statement.
But... look at the markup. It seems complex. What we have here is RDF's way of unambiguously writing down the simple statement. Investigating the markup, we see highlighted in red the most important bits. You can take it on trust that the RDF specification defines very carefully how to turn these simple looking diagrams into XML. And back again. We won't go into the details here.
Now... the thing to notice is the use of a URL (URI in fact) to identify the notion of creator, and to hook it up to the full Web identifier for the Dublin Core metadata vocabulary.
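The slide's markup is not reproduced in this text. As a rough sketch only, the following Python reads a small RDF/XML fragment of the kind described and recovers the statement as a (resource, property, value) triple. The document URI, the Dublin Core namespace version and the helper name `simple_statements` are assumptions made for illustration, and the reader handles only this one simple pattern, not the full RDF syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative markup; the document URI and the Dublin Core
# namespace version are assumptions, not taken from the slide.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/some-document">
    <dc:creator>Joe Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def simple_statements(xml_text):
    """Return (resource, property URI, literal value) triples.

    Handles only rdf:Description elements whose children are
    literal-valued properties; this is not a general RDF parser.
    """
    root = ET.fromstring(xml_text)
    statements = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells a qualified name as "{namespace}local";
            # the property URI is the namespace followed by the local name.
            namespace, local_name = prop.tag[1:].split("}")
            statements.append((subject, namespace + local_name, prop.text))
    return statements


for statement in simple_statements(RDF_XML):
    print(statement)
```

The point of the sketch is the middle element of the triple: the short name `dc:creator` in the markup expands to a full URI, so the property is identified on the Web rather than by a bare local string.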
From now on, we'll just look at the abstract model, as depicted here; and take the detail of the XML syntax for granted.
The RDF model itself. The interesting bit!
To recap our historical review: a key problem RDF attempted to address was that of multiple overlapping domains. A few are listed here as examples. We have to deal with: Dublin Core, IMS, V-Card, subject classifications, industry initiatives, and numerous others.
This is the problem that the Warwick Framework addressed through the notion of diverse metadata packages. What we'll see now is how RDF embraces and extends that philosophy to allow what we might call "fine grained" metadata mixing.
This is another simple piece of metadata. Earlier we saw how to represent the statement "'Joe Smith' is the DC:creator of (some document)".
What we see here starts to reveal the extensibility of RDF.
This thing (resource, object, whatever...) is stated to be the creator of some document, and it in turn has additional properties ascribed to it. In this case "name" and "email".
The feature to note here is that RDF allows us to use Dublin Core (for example) to define the creator arc (property) while using a people-oriented schema (vocabulary) for notions of name and email address describing properties of that "creator".
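One way to picture this mixing, as a minimal sketch rather than anything defined by the RDF specifications, is to hold the graph as a set of triples in Python. The people-oriented namespace, the resource URIs, the email address and the helper names below are all hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core (assumed version)
PEOPLE = "http://example.org/people-schema#"   # hypothetical people vocabulary

document = "http://example.org/some-document"  # hypothetical document URI
creator = "http://example.org/staff/joe"       # hypothetical creator resource

# Each triple is one arc in the diagram: (node, arc label, node or value).
graph = {
    (document, DC + "creator", creator),
    (creator, PEOPLE + "name", "Joe Smith"),
    (creator, PEOPLE + "email", "joe@example.org"),
}


def properties_of(graph, subject):
    """Collect the property/value pairs attached to one resource."""
    return {prop: value for (s, prop, value) in graph if s == subject}


def namespace_of(uri):
    """Split a property URI at its last '#' or '/' to get the vocabulary."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def vocabularies_used(graph):
    """Return the set of vocabularies that supply the graph's properties."""
    return {namespace_of(prop) for (_, prop, _) in graph}


print(sorted(vocabularies_used(graph)))
```

Here the creator arc comes from one vocabulary and the name and email arcs from another, yet they sit in the same graph and describe the same resource.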
Time for a little more detail.
We've seen single statements and a slightly more complex RDF "model" or graph that was more structured. Now the full horror. There are two basic forms of statement in RDF.
The power of RDF comes from the fact that these statements can be combined to any level of detail providing for arbitrarily complex resource descriptions.
The big RDF idea is the same idea that underpins the Web.
This is the key idea: the arcs (properties) and nodes (resources) in these diagrams (graphs... descriptions...) are identified using URIs.
Much of RDF's practical utility comes from this design decision.
Both the things we describe and the metadata vocabularies we use to describe them can be considered identifiable resources within the Web.
As such, they can be described in RDF...
A few more justifications for the engineering decision to use a "nodes and arcs" information model.
Firstly, it is naturally easy to extend. No built-in problems for adding new properties (we might contrast with the relational database model here...).
Secondly, the RDF model allows us to draw seamlessly on multiple descriptive vocabularies.
Thirdly, it is something we can all agree on, even if we disagree about the actual problem domain being described.
Finally (although we don't have time to dwell on this)... the combination of a graph data model and URIs is ideally suited for aggregating data. You can take two RDF graphs and superimpose them to pool knowledge.
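As a small illustration of superimposing graphs, assuming each graph is held as a Python set of triples, pooling knowledge is simply set union: nodes that share a URI line up automatically. The two sources, their contents and the helper `creator_emails` are invented for the example.

```python
CREATOR = "http://purl.org/dc/elements/1.1/creator"
TITLE = "http://purl.org/dc/elements/1.1/title"
EMAIL = "http://example.org/people-schema#email"  # hypothetical property

doc = "http://example.org/some-document"          # hypothetical URIs
joe = "http://example.org/staff/joe"

# Two independently produced descriptions (both invented for illustration).
library_graph = {
    (doc, TITLE, "An Example Report"),
    (doc, CREATOR, joe),
}
directory_graph = {
    (doc, CREATOR, joe),
    (joe, EMAIL, "joe@example.org"),
}

# Superimposing the graphs: identical triples collapse into one, and
# arcs from both sources now meet at the shared node for `joe`.
merged = library_graph | directory_graph


def creator_emails(graph, document):
    """Follow document -> creator -> email across the graph."""
    creators = {o for (s, p, o) in graph if s == document and p == CREATOR}
    return {o for (s, p, o) in graph if s in creators and p == EMAIL}


print(creator_emails(library_graph, doc))  # set(): the library alone cannot answer
print(creator_emails(merged, doc))         # {'joe@example.org'}
```

Neither source alone can say how to contact the creator of the document; the merged graph can, because both sources used the same URI for the same person.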
A quick run-through of the principles behind the RDF Schema system.
This is another nodes-and-arcs diagram to illustrate how a set of RDF class definitions can be modelled as "just more metadata".
Here we show relationships between a hierarchically organised set of classes or categories.
This lets us build simple inferencing systems: a leopard is a big cat; searching for 'big cat' should retrieve photos of a leopard.
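A toy version of such an inferencing system might look like the following sketch. It uses the `rdf:type` and `rdfs:subClassOf` properties, while the zoo vocabulary, the `depicts` property, the photo URIs and the function names are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ZOO = "http://example.org/zoo#"       # hypothetical class vocabulary
DEPICTS = ZOO + "depicts"             # hypothetical property

graph = {
    # The class hierarchy is itself just more metadata.
    (ZOO + "Leopard", SUBCLASS_OF, ZOO + "BigCat"),
    (ZOO + "BigCat", SUBCLASS_OF, ZOO + "Animal"),
    (ZOO + "DomesticCat", SUBCLASS_OF, ZOO + "Animal"),
    # Descriptions of individual animals and photos.
    (ZOO + "leo", RDF_TYPE, ZOO + "Leopard"),
    (ZOO + "tibbles", RDF_TYPE, ZOO + "DomesticCat"),
    ("http://example.org/photos/1.jpg", DEPICTS, ZOO + "leo"),
    ("http://example.org/photos/2.jpg", DEPICTS, ZOO + "tibbles"),
}


def subclasses_of(graph, cls):
    """Return cls plus every class below it in the subClassOf hierarchy."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def photos_depicting(graph, cls):
    """Find photos of anything typed as cls or as one of its subclasses."""
    wanted = subclasses_of(graph, cls)
    instances = {s for (s, p, o) in graph if p == RDF_TYPE and o in wanted}
    return {s for (s, p, o) in graph if p == DEPICTS and o in instances}


print(photos_depicting(graph, ZOO + "BigCat"))
# {'http://example.org/photos/1.jpg'}
```

Nothing in the data says that photo 1 shows a big cat; the search finds it only because the class hierarchy records that every leopard is one.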
One last thing to say on RDF Schema. What we have here is a foundation for something richer. RDF Schemas give us a bare-bones framework but establish an important principle. The terms defined in metadata vocabularies are simply more Web resources. As such, we can use RDF to describe their relationships and attributes (e.g. multilingual labels).
Here, for example, we show how somebody's unknown notion of "author" can be mapped across as a specialisation of the Dublin Core notion of "creator".
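To make the mapping concrete, here is a minimal sketch assuming the specialisation is recorded with the RDF Schema `rdfs:subPropertyOf` property. The local "author" vocabulary, the document URI and the helper names are hypothetical.

```python
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
AUTHOR = "http://example.org/local-vocab#author"  # hypothetical local term

doc = "http://example.org/some-document"          # hypothetical document URI

graph = {
    # Schema statement: every "author" is also a Dublin Core "creator".
    (AUTHOR, SUBPROPERTY_OF, DC_CREATOR),
    # Data written by someone who has only ever heard of "author".
    (doc, AUTHOR, "Joe Smith"),
}


def subproperties_of(graph, prop):
    """Return prop plus every property declared as a specialisation of it."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == SUBPROPERTY_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def values_of(graph, subject, prop):
    """Look up a property, also accepting any of its specialisations."""
    props = subproperties_of(graph, prop)
    return {o for (s, p, o) in graph if s == subject and p in props}


print(values_of(graph, doc, DC_CREATOR))  # {'Joe Smith'}
```

An application that understands only Dublin Core can still answer "who is the creator?" from data that never mentions `creator`, because the mapping between the two terms is itself expressed as metadata.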
That's all we really need to know about the content of the RDF Model, Syntax and Schema specifications.
"Understanding RDF" (1999/2001), Dan Brickley, ILRT.