XML, RDF and the structured Web

RDF as structured babel

We often hear talk about the prospect of turning the Web into a 'giant database', with XML as the technology that will enable this. RDF suggests that the crucial technology here is one we've had since the beginning: a universal addressing system, the URI. This is not to underplay the importance of XML (or indeed RDF), but to note that the key to solving many of these problems lies in the foundational technology of the Web: unique identifiers.

Analogy: relational database technology is now a desktop commonplace. How do we hook together different tables of information describing different classes of object? Uniquely identifying keys.

Simple SQL example:
	select 	firstname, surname, street 
	from 	Person, Address, WorksAt
	where 	WorksAt.person_id = Person.person_id  and
		WorksAt.address_id = Address.address_id

The joined data can be thought of as a graph...

	person_34 --- worksAt---> building_101
	   \-----firstname---> "Dan"
	   \-----surname----> "Brickley"

	building_101 ---street--> "Berkeley Square"

Fragmented data is joined through the use of reliable identifiers, in this case for people and their various addresses. RDF and the Web do exactly the same thing, but on a global scale. The enabling technology here is not a file format but a convention for uniquely identifying objects: the URI.
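
A minimal sketch of the same idea in Python, assuming each arc of the graph is stored as a (subject, property, value) triple. The helper names value_of and work_streets are hypothetical, and the identifiers are the illustrative keys from the graph above rather than real URIs.

```python
# Each fact is a (subject, property, value) triple. The identifiers are the
# illustrative keys from the graph above, not real URIs.
triples = {
    ("person_34", "worksAt", "building_101"),
    ("person_34", "firstname", "Dan"),
    ("person_34", "surname", "Brickley"),
    ("building_101", "street", "Berkeley Square"),
}

def value_of(graph, subject, prop):
    """Return one value of `prop` for `subject`, or None if there is none."""
    for s, p, o in graph:
        if s == subject and p == prop:
            return o
    return None

def work_streets(graph):
    """Mirror the SQL join: follow each worksAt arc from person to building."""
    rows = []
    for s, p, o in graph:
        if p == "worksAt":
            rows.append((value_of(graph, s, "firstname"),
                         value_of(graph, s, "surname"),
                         value_of(graph, o, "street")))
    return rows

rows = work_streets(triples)
```

The join works only because "person_34" and "building_101" mean the same thing wherever they appear; that is the role the key plays in the SQL version.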

Goals / The Semantic Web promise

Data aggregation on a massive scale. RDF's graph data model provides a simple formalism for aggregating data from diverse sources. Simply overlay one graph on top of another. URIs provide a global framework for joining together diverse collections of (meta)data.
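
The overlay can be sketched as a set union, assuming each graph is held as a set of (subject, property, value) triples. The example.org URIs, the two source names and the describe helper are all hypothetical, chosen only to show two independently produced graphs meeting at a shared URI.

```python
# Hypothetical URIs under example.org; neither source knows about the other.
PERSON = "http://example.org/id/person/34"
BUILDING = "http://example.org/id/building/101"

staff_directory = {
    (PERSON, "firstname", "Dan"),
    (PERSON, "surname", "Brickley"),
    (PERSON, "worksAt", BUILDING),
}
estates_register = {
    (BUILDING, "street", "Berkeley Square"),
}

# Overlaying one graph on another is plain set union: triples that share
# a URI end up describing the same node.
merged = staff_directory | estates_register

def describe(graph, node):
    """Collect the property/value pairs attached to `node`."""
    return {p: o for s, p, o in graph if s == node}

workplace = describe(merged, PERSON)["worksAt"]
street = describe(merged, workplace)["street"]
```

No schema mapping or shared table definition is needed for the merge itself; agreement on the URI of the building is what connects the two sources.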

Open Issues / challenges / dangers

What is the URI of a person? Of a museum artifact? Of an idea? Of a Web site versus a Web page versus a content- or language-negotiated rendering of that Web page?

Dangers: 'category mistakes' (Ryle 1945)

Inspecting URIs: can we tell from inspecting the URI 'http://www.mozilla.org/' whether it refers to a Web page or a Web site? These are two very different objects, each with different properties. Danger of ambiguity: if I decide to use my home page URI or my email address to represent myself in metadata, this may confuse others.

If we get this wrong, confusion follows: does 'http://purl.org/net/danbri/' have a size-in-bytes or a weight-in-pounds? (Solution: Inscrutable URIs? uuid:342342-2342342-2342342-2342)
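
A hedged sketch of the 'inscrutable URI' idea, using Python's standard uuid module. Note one assumption: the module emits the urn:uuid: form rather than the bare uuid: prefix written above. The property names and the byte count are invented for illustration.

```python
import uuid

def mint_identifier():
    """Return a fresh opaque identifier in the urn:uuid: form."""
    return uuid.uuid4().urn

# The page keeps its ordinary Web address; the person gets an identifier
# that cannot be mistaken for a document.
page_uri = "http://purl.org/net/danbri/"
person_id = mint_identifier()

facts = {
    (page_uri, "sizeInBytes", 2048),    # illustrative value only
    (person_id, "homepage", page_uri),  # hypothetical property name
}

# Properties of the page and properties of the person now attach to
# different nodes, so they cannot be confused.
page_props = {p for s, p, o in facts if s == page_uri}
person_props = {p for s, p, o in facts if s == person_id}
```

The identifier says nothing about what it names, so nobody is tempted to read a size-in-bytes into a description of a person. The cost is that the identifier cannot be dereferenced or guessed, so it has to be published alongside some description.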