Author: Dan Brickley
Date: 26th January 1999
Version: first draft
Latest Version: http://purl.org/net/rdf/papers/sitemap/
Status: This is very much thinking-out-loud work in progress. I wanted to see whether sitemaps using DC + HTML links looked plausible, so I tried to write it up plausibly... comments etc to email@example.com
This note proposes a minimalistic approach to the representation of Web sitemaps using the W3C Resource Description Framework (RDF). Because RDF can create data structures that draw upon multiple vocabularies, a minimal foundation for sitemap representation can be extended simply by drawing upon further RDF vocabularies. This document presents a general approach and also proposes a common sitemaps vocabulary consisting of the typed links enumerated in the HTML 4.0 specification, combined with the Dublin Core properties for discovery-oriented resource description.
This note does not propose any new vocabulary, architectural or protocol components to support Web sitemaps. The term 'sitemap' is used here to refer to hierarchically-oriented representations of Web content such as might appear in a 'table of contents' for a Web site, a bookmarking system in a browser or in a 'tree control'.
The HTML 4 specification [http://www.w3.org/TR/REC-html40/] defines a core set of "link types" for describing the relationships between web documents [http://www.w3.org/TR/REC-html40/types.html#h-6.12]. These link types are reproduced below: [*** to do: fix dud hyperlinks ]
Alternate, Stylesheet, Start, Next, Prev, Contents, Index, Glossary, Copyright, Chapter, Section, Subsection, Appendix, Help, Bookmark
These are intended for use with the LINK element in HTML 4.0 documents. The HTML specification notes that "Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types". The HTML specification does not define a syntax for profiles defining new types of link, but does ensure that profiles are identified using URIs.
This proposal does not require that HTML profiles are expressed using RDF. It does however require that the core HTML link types introduced at [http://www.w3.org/TR/REC-html40/types.html#h-6.12] have a representation within the RDF model, and for this they need to be addressable using URI references. One approach to assigning URIs to the HTML link types would be to produce a URI for each link type by combining the URI for the HTML 4.0 specification with the name of each link type. An alternative approach, adopted here, is to use an entirely separate set of URIs.
An additional complication is that the HTML link type names are case insensitive, whereas RDF identifiers are case sensitive. For this reason, any RDF representation of these link types must choose one particular form (ie. 'Next', 'NEXT' and 'next' are all equivalent link types in HTML 4.0, but in RDF they would be modelled as separate properties). The convention in RDF is to use initial lower case letters to name properties, where properties are the 'arc labels' or 'link types' that connect web resources within an RDF graph.
The most natural representation of HTML link types in the RDF model is as properties, ie. as typed links between web resources. Consequently this note adopts the convention of 'initial lower case', and represents the HTML 4.0 link type 'Prev' as an RDF property whose URI is 'http://purl.org/net/rdf/papers/sitemap#prev'. We use the URI of this document as the basis for the RDF URIs.
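The naming convention described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function names are hypothetical, and the handling of several space-separated link types in one rel value is an assumption based on how HTML 4.0 defines the rel attribute.

```python
# Base URI of the sitemap vocabulary, taken from this note.
RS_NS = "http://purl.org/net/rdf/papers/sitemap#"


def link_type_to_property(name: str) -> str:
    """Map a case-insensitive HTML link type name to one RDF property URI.

    Hypothetical helper: lower-casing collapses 'Next', 'NEXT' and 'next'
    into the single, case-sensitive RDF identifier.
    """
    return RS_NS + name.strip().lower()


def rel_to_properties(rel: str) -> list:
    """Split a rel attribute value (space-separated link types) into URIs."""
    return [link_type_to_property(part) for part in rel.split()]


print(link_type_to_property("Prev"))
print(rel_to_properties("Contents INDEX"))
```

Lower-casing at the point where HTML is read keeps the case-insensitivity problem out of the RDF store entirely, so that only one property per link type is ever recorded.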
The XML Link working group are investigating richer typed linking in an XML context. A working group within the Dublin Core initiative are creating a specialised set of relationship types for use with the DC:Relation property. There are many other possible sources of relationship types that could be used in creating Web sitemaps. This document uses a combination of the HTML 4.0 linking vocabulary and the unqualified Dublin Core properties since they represent stable and well known 'best practice' specifications for Web content. Other types of relationship can easily be included within this approach to sitemaps, so long as an RDF representation of those relationships is available. RDF Schema provides a minimalist mechanism for expressing relationships between properties: rdfs:subPropertyOf. This allows us to say, for example, that rs:copyright is an rdfs:subPropertyOf dc:Rights, ie. the former is a more specialised property whose presence can be taken to imply the presence of the latter. This might be used to indicate 'specialisation' relationships between the simple HTML-derived vocabulary items used here and other useful sets of relationship types. [todo: ... also WebDAV? classic hypertext refs? ....]
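To make the 'implies the presence of' reading of rdfs:subPropertyOf concrete, here is a minimal Python sketch. It assumes the triple ordering (property, subject, object) used later in this note; the subproperty declaration, the function name and the example.org URIs are all hypothetical and are not defined by any schema cited here.

```python
RS_NS = "http://purl.org/net/rdf/papers/sitemap#"
DC_NS = "http://purl.org/dc/elements/1.0#"

# Hypothetical declaration: rs:copyright is a subproperty of dc:Rights.
SUB_PROPERTY_OF = {RS_NS + "copyright": DC_NS + "Rights"}


def infer_super_properties(triples, sub_property_of):
    """Return the given triples plus every triple implied by subPropertyOf.

    Triples are (property, subject, object) tuples.
    """
    result = set(triples)
    pending = list(result)
    while pending:
        prop, subject, obj = pending.pop()
        parent = sub_property_of.get(prop)
        if parent is not None:
            implied = (parent, subject, obj)
            if implied not in result:
                # A newly implied triple may itself imply further triples.
                result.add(implied)
                pending.append(implied)
    return result


# Illustrative data only; example.org is a placeholder site.
stated = {(RS_NS + "copyright",
           "http://example.org/doc.html",
           "http://example.org/legal.html")}
for triple in sorted(infer_super_properties(stated, SUB_PROPERTY_OF)):
    print(triple)
```

An application that only understands Dublin Core could then query for dc:Rights and still find the statement that was originally made with the more specialised property.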
Appendix A (below) provides an RDF Schema for the link types (properties) introduced in HTML 4.0. An RDF representation of the Dublin Core is defined elsewhere [http://purl.org/dc/elements/1.0/] [http://www.w3.org/TR/WD-rdf-schema#dublincore].
The following types of RDF relationship (properties) are defined, based on the HTML link types:
Note that there is some overlap between the coverage of the Dublin Core and the HTML link types (eg. certain uses of DC:Relation and HTML's 'Alternate' overlap). Although the full set of HTML link types is represented here for completeness, it is likely that only a subset of these will be used in an RDF sitemaps context.
By defining an RDF representation of the HTML link types, it is possible to create an RDF graph representing the structure of a site either through parsing HTML LINK elements or by using RDF's XML-based serialisation syntax, or some combination of both. For example, the following use of LINK in the HTML document http://www.w3.org/TR/REC-html40/types.html can be mapped into a set of RDF statements. In the following examples we use the abbreviation 'DC' to refer to the Dublin Core RDF vocabulary and 'RS' to refer to the RDF Sitemap vocabulary.
(HTML from the http://www.w3.org/TR/REC-html40/types.html page)
<LINK rel="previous" href="charset.html">
<LINK rel="next" href="struct/global.html">
<LINK rel="contents" href="cover.html#toc">
...is equivalent to the following RDF statements:
<RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:rs="http://purl.org/net/rdf/papers/sitemap#"
     xmlns:dc="http://purl.org/dc/elements/1.0#">
  <Description about="http://www.w3.org/TR/REC-html40/types.html">
    <rs:previous resource="charset.html" />
    <rs:next resource="struct/global.html" />
    <rs:contents resource="cover.html#toc" />
  </Description>
</RDF>
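The mapping from LINK elements to RDF statements can be sketched with Python's standard html.parser module. This is a rough illustration under stated assumptions rather than a reference implementation: the class name LinkCollector is hypothetical, each rel value is lower-cased and appended to the sitemap namespace, and href values are kept exactly as written in the HTML (relative URIs are not resolved here).

```python
from html.parser import HTMLParser

RS_NS = "http://purl.org/net/rdf/papers/sitemap#"


class LinkCollector(HTMLParser):
    """Collect (property, subject, object) triples from LINK elements."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri  # URI of the document being parsed
        self.triples = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if not rel or not href:
            return
        # A rel value may hold several space-separated link types.
        for link_type in rel.lower().split():
            self.triples.append((RS_NS + link_type, self.page_uri, href))


html = """<LINK rel="previous" href="charset.html">
<LINK rel="next" href="struct/global.html">
<LINK rel="contents" href="cover.html#toc">"""

collector = LinkCollector("http://www.w3.org/TR/REC-html40/types.html")
collector.feed(html)
for triple in collector.triples:
    print(triple)
```

The three triples produced correspond to the three properties of the Description element in the RDF above.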
The relationship information based on the HTML vocabulary might be augmented by additional metadata drawing on Dublin Core properties such as 'Creator' and 'Description'. We can also take advantage of the more sophisticated (and verbose) RDF syntax to express a number of statements about the structure of a website using a single block of RDF.
The following RDF shows how a sitemap might document the relationships that exist between documents alongside properties such as 'Title' for presentation in some form of user interface (eg. graphical map). Here we describe the fact that 'types.html' has a 'previous' document 'charset.html', whose Title and preceding document are also described.
<rdf:RDF xmlns:rdf="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
         xmlns:rs="http://purl.org/net/rdf/papers/sitemap#"
         xmlns:dc="http://purl.org/dc/elements/1.0#">
  <rdf:Description rdf:about="http://www.w3.org/TR/REC-html40/types.html">
    <dc:Title>Basic HTML Data Types</dc:Title>
    <rs:previous>
      <rdf:Description rdf:about="charset.html">
        <dc:Title>HTML Document Representation</dc:Title>
        <rs:previous rdf:resource="struct/global.html" />
      </rdf:Description>
    </rs:previous>
    <rs:next rdf:resource="struct/global.html" />
    <rs:contents rdf:resource="cover.html#toc" />
  </rdf:Description>
</rdf:RDF>
These statements can be represented in RDF triple form as follows:
triple("http://purl.org/dc/elements/1.0#Title","http://www.w3.org/TR/REC-html40/types.html","Basic HTML Data Types").
triple("http://purl.org/dc/elements/1.0#Title","charset.html","HTML Document Representation").
triple("http://purl.org/net/rdf/papers/sitemap#previous","charset.html","struct/global.html").
triple("http://purl.org/net/rdf/papers/sitemap#previous","http://www.w3.org/TR/REC-html40/types.html","charset.html").
triple("http://purl.org/net/rdf/papers/sitemap#next","http://www.w3.org/TR/REC-html40/types.html","struct/global.html").
triple("http://purl.org/net/rdf/papers/sitemap#contents","http://www.w3.org/TR/REC-html40/types.html","cover.html#toc").
(note that relative URIs should be resolved before storing in an RDF database)
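Resolving relative URIs before storage could look something like the following Python sketch, which uses urljoin from the standard library. The base URI is assumed to be that of the described page, and the set of properties treated as having literal (non-URI) values is an assumption made for this illustration, since the triple notation above does not itself distinguish literals from resources.

```python
from urllib.parse import urljoin

DC_NS = "http://purl.org/dc/elements/1.0#"
RS_NS = "http://purl.org/net/rdf/papers/sitemap#"
BASE = "http://www.w3.org/TR/REC-html40/types.html"

# Assumption: values of these properties are literal strings, not URIs.
LITERAL_PROPERTIES = {DC_NS + "Title"}


def resolve_triples(triples, base):
    """Resolve relative subject and object URIs against a base URI."""
    resolved = []
    for prop, subject, obj in triples:
        subject = urljoin(base, subject)
        if prop not in LITERAL_PROPERTIES:
            obj = urljoin(base, obj)  # leave literal values untouched
        resolved.append((prop, subject, obj))
    return resolved


triples = [
    (DC_NS + "Title", "charset.html", "HTML Document Representation"),
    (RS_NS + "previous", BASE, "charset.html"),
    (RS_NS + "contents", BASE, "cover.html#toc"),
]
for triple in resolve_triples(triples, BASE):
    print(triple)
```

After resolution every resource is named by an absolute URI, so statements gathered from different pages of a site can be merged into one graph without name clashes.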
This document does not make any recommendations about the way in which sitemaps should be represented in concrete form on a Web site. This will vary depending on context, document format and workflow issues. In many cases, HTML 4.0 alone could be adequate, while others might prefer to manage all information about document relationships in a more centralised manner. The vocabulary proposals made here are equally appropriate in both contexts (see [DublinCore] for guidelines on expressing the Dublin Core within the constraints of the HTML metadata 'META' tags).
The following RDF describes the HTML link types as a set of RDF properties. The formal identifiers for each property are lower case (eg. 'next'), and are accompanied by language-tagged human readable labels and textual comments abstracted from the HTML 4.0 definitions. A machine-processable version of this RDF is also embedded in the source of this document (which will consequently fail to validate against the HTML DTD).
<RDF xml:lang="en"
     xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
  <Property ID="alternate" rdfs:label="Alternate"
    rdfs:comment="Designates substitute versions for the document in which the link occurs."/>
  <Property ID="stylesheet" rdfs:label="Stylesheet"
    rdfs:comment="Refers to an external style sheet."/>
  <Property ID="start" rdfs:label="Start"
    rdfs:comment="Refers to the first document in a collection of documents. This property tells search engines which document is considered by the author to be the starting point of the collection."/>
  <Property ID="next" rdfs:label="Next"
    rdfs:comment="Refers to the next document in a linear sequence of documents."/>
  <Property ID="prev" rdfs:label="Prev"
    rdfs:comment="Refers to the previous document in an ordered series of documents."/>
  <Property ID="contents" rdfs:label="Contents"
    rdfs:comment="Refers to a document serving as a table of contents."/>
  <Property ID="index" rdfs:label="Index"
    rdfs:comment="Refers to a document providing an index for the current document."/>
  <Property ID="glossary" rdfs:label="Glossary"
    rdfs:comment="Refers to a document providing a glossary of terms that pertain to the current document."/>
  <Property ID="copyright" rdfs:label="Copyright"
    rdfs:comment="Refers to a copyright statement for the current document."/>
  <Property ID="chapter" rdfs:label="Chapter"
    rdfs:comment="Refers to a document serving as a chapter in a collection of documents."/>
  <Property ID="section" rdfs:label="Section"
    rdfs:comment="Refers to a document serving as a section in a collection of documents."/>
  <Property ID="subsection" rdfs:label="Subsection"
    rdfs:comment="Refers to a document serving as a subsection in a collection of documents."/>
  <Property ID="appendix" rdfs:label="Appendix"
    rdfs:comment="Refers to a document serving as an appendix in a collection of documents."/>
  <Property ID="help" rdfs:label="Help"
    rdfs:comment="Refers to a document offering help (more information, links to other sources information, etc.)"/>
  <Property ID="bookmark" rdfs:label="Bookmark"
    rdfs:comment="Refers to a bookmark. A bookmark is a link to a key entry point within an extended document. Note that several bookmarks may be defined in each document."/>
</RDF>
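As a rough illustration of how an application might consume this schema, the following Python sketch reads property identifiers and labels using the standard xml.etree.ElementTree module. It works on a two-property excerpt of the RDF above; the function name read_properties is hypothetical, and treating each ID as a fragment of the sitemap namespace URI is an assumption consistent with this note's use of its own URI as the basis for property URIs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
RDFS_NS = "http://www.w3.org/TR/WD-rdf-schema#"
RS_NS = "http://purl.org/net/rdf/papers/sitemap#"

# Two-property excerpt of the schema, for illustration only.
SCHEMA_EXCERPT = """<RDF xml:lang="en"
  xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
  xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
  <Property ID="next" rdfs:label="Next"
    rdfs:comment="Refers to the next document in a linear sequence of documents."/>
  <Property ID="contents" rdfs:label="Contents"
    rdfs:comment="Refers to a document serving as a table of contents."/>
</RDF>"""


def read_properties(schema_xml, base=RS_NS):
    """Return a dict mapping each property URI to its human-readable label."""
    root = ET.fromstring(schema_xml)
    labels = {}
    # ElementTree spells namespaced names as '{namespace-uri}localname'.
    for prop in root.iter("{%s}Property" % RDF_NS):
        labels[base + prop.get("ID")] = prop.get("{%s}label" % RDFS_NS)
    return labels


for uri, label in read_properties(SCHEMA_EXCERPT).items():
    print(uri, label)
```

A sitemap viewer could use such a table to show 'Next' or 'Contents' in its user interface while storing only the property URIs internally.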