Latest version: http://ilrt.org/discovery/2000/08/bized-meta/
This version: 2001-07-01
See also: old RDF sitemaps proposal; Mozilla; dc-architecture; RSS 1.0
This document provides an overview of a small RDF testbed we're working on in Bristol. It is intended primarily for developers and members of the metadata community. It describes work in progress, and has been made available before the testbed itself is complete.
Biz/ed (Business Education on the Internet) provides an interesting testbed site for exploring the real-world application of RDF metadata. The Biz/ed site was originally established in 1995, providing online support for educators and students in the field of Business and Economics.
The Biz/ed site has always used embedded metadata (eg. to augment search result screens with abstracts for pages that match some search term). Since 1997 we have embedded simple Dublin Core descriptions in most of the HTML documents on the site. In parallel, Biz/ed has been developing a large Internet Catalogue providing descriptions and classifications of other high quality Web sites. This testbed is intended to explore the use of RDF metadata for sites like Biz/ed.
As of the 0.9.5 release of Mozilla, HTML LINK metadata is supported in the Site Navigation toolbar, pictured here showing a Biz/ed page. Note the up, previous and next links in the 3rd toolbar.
The aim of this work is to explore the use of an RDF-based system for processing mainstream Web metadata, and to evaluate possible tools that will help webmasters provide more usable search, browse and sitemaps facilities using W3C metadata technology.
We intend to make available RDF dumps of the metadata embedded within the HTML page content of the Biz/ed site, alongside a small subset of the Biz/ed Internet Catalogue data. The former consists of Dublin Core properties such as 'title', 'description' and 'creator', plus classified links to related Biz/ed pages (eg. 'next', 'previous', 'parent', 'child'). The Internet Catalogue records are essentially Dublin Core metadata, including rich classification using the Dewey Decimal system.
The current Biz/ed Web site uses a (non-metadata aware) full text indexing tool over the HTML documents, and the ROADS database system to manage and search the Internet catalogue data. A sitemap for the main site is maintained (by hand). Searches of the HTML website dynamically extract the 'description' metadata field from pages to provide more informative search results, but do not currently make use of the other embedded metadata on the site.
This will not be of much interest until we can show this data being used. We aim to show how this kind of simple website metadata can be loaded into an RDF database (eg. RDFdb) and queried to provide value-adding views into existing Web content.
In the absence of a running demo, here is a sketch of the kinds of metadata structures we intend to explore within this testbed.
The potential uses of an RDF database that describes an entire site can be illustrated by a simple example, based on the Web of inter-relationships that connect the pages on our site. This example walks up the logical structure of the site, starting with a very detailed page within a specialised subsection of Biz/ed.
If someone searches the Web site and finds the document whose URL is http://www.bized.ac.uk/compfact/bmw/bmw23a.htm, our metadata store knows that the title of this page is "Formula One", and that it has a description "Details about BMW's involvement in Formula One Motor Racing". While useful, this doesn't give the full picture, since there is a lot of additional information that is associated with related pages on the site. Biz/ed has included rich linking information in the metadata for each page (see example below), and we can use the RDF representation of this data to find other related data in our metadata store.
sample 1 - embedding rich linking information:
<link rel="parent" href="/compfact/bmw/bmwindex.htm">
<link rel="previous" href="/compfact/bmw/bmw22.htm">
<link rel="next" href="/compfact/bmw/bmw23a.htm">
This tells us that the 'Formula One' page has a 'parent page' whose URL is "http://www.bized.ac.uk/compfact/bmw/bmwindex.htm". We can therefore consult the RDF database to find the title and description of this page too, providing more contextual information. The title of the index page is "BMW Company Facts Home Page", and the metadata embedded in it describes it as the "Index for the answers to BMW's frequently asked questions".
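As a rough illustration of how such LINK elements might be turned into RDF-style statements, here is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and the use of the http://xmlns.com/hlink/0.1/ namespace for the relationship types is an assumption based on the property URIs that appear in the data dump later in this document; it is not a description of the tools actually used for the testbed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed namespace for link relationship types (matches the hlink URIs in the dump).
HLINK = "http://xmlns.com/hlink/0.1/"


class LinkExtractor(HTMLParser):
    """Collects (subject, predicate, object) statements from <link> elements."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # Resolve the (possibly relative) href against the page's own URL.
            target = urljoin(self.page_url, href)
            self.statements.append((self.page_url, HLINK + rel, target))


def extract_links(page_url, html):
    """Return the link statements found in one HTML document."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    return parser.statements


page = "http://www.bized.ac.uk/compfact/bmw/bmw23a.htm"
html = """<link rel="parent" href="/compfact/bmw/bmwindex.htm">
<link rel="previous" href="/compfact/bmw/bmw22.htm">"""

for statement in extract_links(page, html):
    print(statement)
```

Resolving each href against the page URL is what lets statements from different pages be merged: every page ends up identified by the same absolute URI wherever it is mentioned.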
In turn, this index page itself contains metadata pointing to its own 'parent' page, the Company Facts home page. The description of this page provides a broad overview of this section of the Biz/ed web site: "Answers to top companies and organisations' frequently asked questions". The Company Facts home page points to the main home page of Biz/ed as its own parent page. The metadata description embedded in the site home page describes Biz/ed as follows: "Biz/ed is a freely available Internet service for students, teachers and lecturers in business and economics".
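The walk described above can be sketched in a few lines of Python over an in-memory list of statements. This is only an illustration, not the testbed's implementation: the function names are hypothetical, and the URLs used for the Company Facts home page and the Biz/ed home page are assumptions, since the text names those pages without giving their addresses. Titles are omitted where the text does not quote them.

```python
DC = "http://purl.org/dc/elements/1.1/"
HLINK = "http://xmlns.com/hlink/0.1/"

FORMULA_ONE = "http://www.bized.ac.uk/compfact/bmw/bmw23a.htm"
BMW_INDEX = "http://www.bized.ac.uk/compfact/bmw/bmwindex.htm"
# Assumed URLs: these two pages are named in the text, but their addresses are not.
COMPANY_FACTS = "http://www.bized.ac.uk/compfact/"
SITE_HOME = "http://www.bized.ac.uk/"

STATEMENTS = [
    (FORMULA_ONE, DC + "title", "Formula One"),
    (FORMULA_ONE, DC + "description",
     "Details about BMW's involvement in Formula One Motor Racing"),
    (FORMULA_ONE, HLINK + "parent", BMW_INDEX),
    (BMW_INDEX, DC + "title", "BMW Company Facts Home Page"),
    (BMW_INDEX, DC + "description",
     "Index for the answers to BMW's frequently asked questions"),
    (BMW_INDEX, HLINK + "parent", COMPANY_FACTS),
    (COMPANY_FACTS, DC + "description",
     "Answers to top companies and organisations' frequently asked questions"),
    (COMPANY_FACTS, HLINK + "parent", SITE_HOME),
    (SITE_HOME, DC + "description",
     "Biz/ed is a freely available Internet service for students, "
     "teachers and lecturers in business and economics"),
]


def value(statements, subject, predicate):
    """Return the first object stated for (subject, predicate), or None."""
    for s, p, o in statements:
        if s == subject and p == predicate:
            return o
    return None


def context_chain(statements, start):
    """Follow 'parent' links upwards, collecting (url, title, description)."""
    chain, seen, current = [], set(), start
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        chain.append((current,
                      value(statements, current, DC + "title"),
                      value(statements, current, DC + "description")))
        current = value(statements, current, HLINK + "parent")
    return chain


for url, title, description in context_chain(STATEMENTS, FORMULA_ONE):
    print(url, "|", title, "|", description)
```

The cycle guard matters in practice because hand-maintained link metadata can easily contain a page that names itself, or a descendant, as its parent.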
Since we have aggregated all metadata for the site into a single RDF representation, it is possible to ask our metadata system questions such as the following:
Such facilities can help provide more contextual information in user interfaces such as sitemaps and search result screens. The next stage of this testbed is to load the Biz/ed metadata into a simple database system and build a WWW front-end to allow direct visualisation (and quality/consistency checking) of the site's embedded metadata.
The RDF model describing the documents and relationships mentioned in this example can be represented in RDF/XML syntax, or depicted graphically. The data shown is only a tiny fragment of the full RDF representation of this site, but is enough to show how the metadata for various pages can be merged together into a single database.
Some simple dumps of the embedded metadata from the bp, bmw and staff support areas of the site have been generated. The metadata has been mapped into the RDF data model (ie. lots of short 3-part statements), but the dumps have not yet been converted into an XML representation.
The following files are available; these are encoded in a somewhat strange RDF-based format; a normal RDF encoding will be supplied later.
Sample Internet catalogue records are not yet available. We also hope to export linking data, ie. which pages link to which other pages. Mail the author if you want to be notified when the testbed is ready.
The following excerpt from the BMW section of the Biz/ed RDF database indicates the nature of the content. Everything (documents, properties of documents, relationship types) is specified using long Web identifiers, 'URIs'. This can seem rather verbose, but it does remove ambiguity from the data, making it easy to combine information from multiple sources.
The sample shown here also illustrates a classic problem of web metadata management: redundancy. Each page has a separate 'publisher' property with identical content. It might be more practical to exploit the known relationships amongst Biz/ed pages and associate 'publisher' information only with the higher levels of the Web site structure. RDF-based applications could explore the data and find the most accurate 'publisher' information for any given page, by traversing the 'parent' and 'child' connections between documents.
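One way such an application might look up inherited 'publisher' information is sketched below in Python. The layout of the data is hypothetical: it assumes the publisher is stated once, on the BMW index page, rather than on every page as in the current dumps. The function name is illustrative, and the sketch assumes at most one value per page and property.

```python
DC_PUBLISHER = "http://purl.org/dc/elements/1.1/publisher"
HLINK_PARENT = "http://xmlns.com/hlink/0.1/parent"

BASE = "http://www.bized.ac.uk/compfact/bmw/"

# Hypothetical data: 'publisher' appears only on the section index page.
# Keys are (page, property) pairs, so each pair holds a single value.
STATEMENTS = {
    (BASE + "bmwindex.htm", DC_PUBLISHER):
        "The Institute for Learning and Research Technology",
    (BASE + "bmwlp.htm", HLINK_PARENT): BASE + "bmwindex.htm",
    (BASE + "bmw1.htm", HLINK_PARENT): BASE + "bmwindex.htm",
}


def nearest_publisher(statements, page):
    """Return the publisher of the page, or of its closest ancestor that has one."""
    visited = set()
    current = page
    while current is not None and current not in visited:  # stop on cycles
        visited.add(current)
        publisher = statements.get((current, DC_PUBLISHER))
        if publisher is not None:
            return publisher
        current = statements.get((current, HLINK_PARENT))
    return None


print(nearest_publisher(STATEMENTS, BASE + "bmwlp.htm"))
```

A page that states its own publisher would override the inherited value, since the page itself is checked before any of its ancestors.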
sample 2 - raw data dump (excerpt) for two Biz/ed pages:
http://purl.org/dc/elements/1.1/creator ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "Education & Youth" )
http://purl.org/dc/elements/1.1/subject ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "economics, business, basic, company information, faqs, BMW, customer finance, purchase" )
http://purl.org/dc/elements/1.1/description ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "Example of a BMW purchase arrangement." )
http://purl.org/dc/elements/1.1/publisher ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "The Institute for Learning and Research Technology" )
http://purl.org/dc/elements/1.1/date ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "1996-12-11" )
http://purl.org/dc/elements/1.1/language ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , "en" )
http://xmlns.com/hlink/0.1/parent ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , http://www.bized.ac.uk/compfact/bmw/bmwindex.htm )
http://xmlns.com/hlink/0.1/previous ( http://www.bized.ac.uk/compfact/bmw/bmwlp.htm , http://www.bized.ac.uk/compfact/bmw/bmw25.htm )
http://purl.org/dc/elements/1.1/creator ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "Education & Youth" )
http://purl.org/dc/elements/1.1/subject ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "economics, business, basic, company information, faqs, BMW, company family tree" )
http://purl.org/dc/elements/1.1/description ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "Describes how the UK arm of BMW works and details subsidiary activities." )
http://purl.org/dc/elements/1.1/publisher ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "The Institute for Learning and Research Technology" )
http://purl.org/dc/elements/1.1/date ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "1996-12-11" )
http://purl.org/dc/elements/1.1/language ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , "en" )
http://xmlns.com/hlink/0.1/parent ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , http://www.bized.ac.uk/compfact/bmw/bmwindex.htm )
http://xmlns.com/hlink/0.1/next ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , http://www.bized.ac.uk/compfact/bmw/bmw2.htm )
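For readers who want to experiment with dumps in this shape, the following Python sketch parses statements of the form property ( subject , object ) into triples. It is an informal reading of the excerpt above, not a specification of the dump format: it assumes each statement sits on its own line, that subjects contain no whitespace, and that quoted objects are literals while unquoted ones are resource URIs. The function name is hypothetical.

```python
import re

# property ( subject , object ): the object is matched lazily up to the final ")"
# so that commas and full stops inside a quoted literal are kept intact.
STATEMENT = re.compile(r'^\s*(\S+)\s*\(\s*(\S+)\s*,\s*(.+?)\s*\)\s*$')


def parse_statement(line):
    """Parse one dump line into (subject, predicate, (kind, value))."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError("not a recognisable statement: %r" % line)
    predicate, subject, obj = match.groups()
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        return subject, predicate, ("literal", obj[1:-1])
    return subject, predicate, ("resource", obj)


lines = [
    'http://purl.org/dc/elements/1.1/subject ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , '
    '"economics, business, basic, company information, faqs, BMW, company family tree" )',
    'http://xmlns.com/hlink/0.1/next ( http://www.bized.ac.uk/compfact/bmw/bmw1.htm , '
    'http://www.bized.ac.uk/compfact/bmw/bmw2.htm )',
]

for line in lines:
    print(parse_statement(line))
```

Keeping the literal/resource distinction in the parsed output is useful because only resource-valued properties such as 'parent' and 'next' can be followed to further statements.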
At the time of writing, there is no consensus around the best form for an RDF sitemaps vocabulary. For historical interest, here is a screenshot of an early RDF sitemap implementation: the first release of Mozilla (Netscape 5). That format was based on the (over) use of nc:child arcs to indicate structure. As such, the sitemap format didn't preserve document ordering of items described sequentially. Other subsequent variants have experimented with the use of
This screenshot is from an early Mozilla release; it is rendering the file webcat.rdf merged with the file bized.rdf. These files are in Mozilla's original RDF sitemap format; neither is legal RDF 1.0. Note also that the on-screen rendering seems to preserve the ordering of items in the document, even though that information should be lost by the RDF parser. Mozilla's RDF implementation is now much more mature, and should support RDF sitemap formats more reliably. See also the file netcenter.rdf for an explanation of (a later version of) this RDF format.