ILRT Technical Report Number: 1065
Publication Date: 2003/10/10
Last Modified: 2003/10/13 11:45
Author(s): Libby Miller, Martin Poulter
Abstract
This is a write-up of work done by one of us (LM) on tools to generate machine-readable descriptions of several aspects of images. Two interfaces were created: a command-line version and a point-and-click version. Both use client-side JavaScript to consult multiple remote database-driven services.
The assumption is that you wish to annotate a photo or picture which is somewhere on the internet. You can say things like:
The existing vocabularies for each of these properties can be combined in an RDF/XML document with multiple namespaces. This document can be parsed and imported into a database, and it can also reside on the Web where it can be harvested by external programs, thus forming a part of the semantic web. Here is an example of such a file, with tags from different namespaces in different colours:
By combining many of these documents in a database, one could automatically produce:
Many tools already exist for creating this sort of annotation; see the References for examples. We wanted to produce annotations simply and rapidly, using a language that is multiplatform and does not require a download or install. This started as a Java application. However, Java on Debian is not straightforward to install, and Java on Windows machines now usually requires administrator access.
Two technologies came to our aid. Firstly, JavaScript, for which there are RDF tools written by Jim Ley. Secondly, web services that allow access to remote RDF data about people, locations and keywords over HTTP. JavaScript was chosen because of the good support available in two major browsers: the multiplatform Mozilla and IE for Windows. JavaScript is a fully functional programming language running inside browsers. It doesn't require installation; however, each browser runs a different variant of the language, so interoperability is difficult. In particular, the capacity to download and parse XML using the browser's native parser is restricted to IE on Windows and Mozilla. The JavaScript RDF parser that accesses the remote services relies on this functionality, so the application only runs on these browsers.
Applications such as Matt Biddulph's IRC bot-based conversational interface to image annotation, and Damian Steer's foaffinger rendezvous-based foaf creation tool (see References) have successfully used stateful, text-based interfaces for data creation. For a trained cataloguer doing a large batch of annotation work, command-line tools can be faster to use. Hence an early version of the tool used a text-based interface based on JavaScript Shell (see References).
However, a significant issue for image annotation is that you need to be able to see the image as you catalogue it. It is also useful to be able to pick from a list of thumbnail images and then annotate several; this limits the usefulness of command-line or bot interfaces. In response to user feedback on the first version, a clickable version was produced. The visual cues this gives make cataloguing images faster, although there are several significant problems with the layout of the information.
In the rest of this paper, we describe the functionality enabled by the combination of JavaScript and web services. However, you can go direct to the tools. Once you select an image and annotate it, you can see the RDF document being built up. By emailing Libby Miller, you can ask for a password to store your data; otherwise you will need to copy and paste the RDF into a text editor and put it on a server. If you then notify the RDFWeb database of the location of that file, the file will be harvested and its data made available for querying.
The tool uses a proxy to download 1) a page of links to thumbnails, 2) a page with images in it, or 3) a single image, into an iframe. The images are accessed using the DOM, and displayed. Clicking on an image triggers a download of the image or html page linked to in the initial thumbnails page, and then the tool figures out if it is an image or an html page. If the latter, it makes a guess about which is the correct image, and makes that the main item to be catalogued. At this stage we have something like this:
and the RDF generated looks like this:
<rdf:RDF
xmlns='http://xmlns.com/foaf/0.1/'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
>
<rdf:Description rdf:about="">
<annotates
rdf:resource="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg"/>
</rdf:Description>
<Image
rdf:about="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg">
<thumbnail
rdf:resource="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Thumbnails/4.jpg"/>
</Image>
</rdf:RDF>
Dan Brickley has produced a service whereby appending a noun to the namespace http://xmlns.com/wordnet/1.6/ gives you the WordNet hierarchy for that noun, if it exists. The image annotating tool uses this trick: if you type 'parrot' into the 'keyword' box, the tool uses Jim Ley's RDF parser to fetch the RDF associated with http://xmlns.com/wordnet/1.6/Parrot and displays it so that the user can check that it is the term they are interested in, and also see whether a subclass of the main term might be more appropriate.
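The mapping from a typed keyword to the URI that is fetched can be sketched as follows. This is only an illustration: the helper name wordnetUri is hypothetical, and the capitalisation rule is an assumption inferred from the 'parrot' example above (multi-word nouns are not handled).

```javascript
// Namespace of the WordNet 1.6 service described in the text.
const WORDNET_NS = "http://xmlns.com/wordnet/1.6/";

// Hypothetical helper: turn a typed keyword into the URI to fetch.
// Assumption: the service expects the noun with an initial capital,
// as in the 'parrot' -> .../Parrot example.
function wordnetUri(keyword) {
  const noun = keyword.trim();
  if (noun === "") {
    throw new Error("keyword must not be empty");
  }
  const capitalised =
    noun.charAt(0).toUpperCase() + noun.slice(1).toLowerCase();
  return WORDNET_NS + encodeURIComponent(capitalised);
}

console.log(wordnetUri("parrot")); // http://xmlns.com/wordnet/1.6/Parrot
```

The RDF found at that URI would then be handed to the RDF parser; that step is left out here because it depends on the parser's own interface.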
The wordnet term is then added to the generated RDF by clicking on it, for example:
<rdf:RDF
xmlns='http://xmlns.com/foaf/0.1/'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
xmlns:wn='http://xmlns.com/wordnet/1.6/'
>
<rdf:Description rdf:about="">
<annotates
rdf:resource="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg"/>
</rdf:Description>
<Image
rdf:about="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg">
<thumbnail
rdf:resource="http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Thumbnails/4.jpg"/>
<depicts>
<wn:Parrot/>
</depicts>
</Image>
</rdf:RDF>
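A minimal sketch of how such a document might be assembled from an image URL, a thumbnail URL and a list of clicked WordNet terms is given below. The function names are hypothetical and this is not the tool's actual code. The sketch declares the wn prefix (bound to the WordNet namespace mentioned above), which an XML parser needs in order to resolve wn:Parrot.

```javascript
// Escape characters that are unsafe inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical builder: wordnetTerms are class names such as "Parrot".
function buildAnnotation(imageUrl, thumbnailUrl, wordnetTerms = []) {
  const image = escapeXmlAttr(imageUrl);
  const thumb = escapeXmlAttr(thumbnailUrl);
  const lines = [
    "<rdf:RDF",
    "  xmlns='http://xmlns.com/foaf/0.1/'",
    "  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'",
    "  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'",
    "  xmlns:wn='http://xmlns.com/wordnet/1.6/'",
    ">",
    '  <rdf:Description rdf:about="">',
    `    <annotates rdf:resource="${image}"/>`,
    "  </rdf:Description>",
    `  <Image rdf:about="${image}">`,
    `    <thumbnail rdf:resource="${thumb}"/>`,
  ];
  for (const term of wordnetTerms) {
    // Only accept simple names so the term is a valid XML element name.
    if (!/^[A-Za-z_][A-Za-z0-9_.-]*$/.test(term)) {
      throw new Error("not a usable element name: " + term);
    }
    lines.push("    <depicts>", `      <wn:${term}/>`, "    </depicts>");
  }
  lines.push("  </Image>", "</rdf:RDF>");
  return lines.join("\n");
}

const exampleRdf = buildAnnotation(
  "http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg",
  "http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Thumbnails/4.jpg",
  ["Parrot"]
);
console.log(exampleRdf);
```

Building the document from a list of lines keeps each click on a term a simple append, which matches the way the RDF is shown growing as the user works.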
For images containing people, it's useful to be able to say that the image depicts a particular, identified person. See the codepiction experiment for more information about this approach.
One issue is a convenient way of finding people's sha1-encoded email addresses (or their actual email addresses and converting them using a tool). This is where a remote service from a database which already contains this information is useful. This could be, for example, a private address book with a remote interface which produces RDF. In this case, we use an interface to a harvested RDF database.
Sha1-encoded mailboxes and images are shown in response to a query on a substring of a name. Clicking on the image or the name produced adds the person to the RDF. If the person is not in the database, they can be added manually using the forms. At no time is an email address made public.
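The two steps described here, hashing a mailbox and searching names by substring, can be sketched as follows. This is an illustration under stated assumptions, not the service's code: it runs under Node.js and uses its built-in crypto module, it follows the FOAF convention of hashing the full mailto: URI, and the people list with its example.org addresses is a hypothetical stand-in for the harvested database.

```javascript
const nodeCrypto = require("crypto"); // Node.js built-in module

// Hex-encoded SHA1 digest of a string.
function sha1Hex(text) {
  return nodeCrypto.createHash("sha1").update(text, "utf8").digest("hex");
}

// Assumption: the hash is taken over the mailto: URI, not the bare address.
function mboxSha1sum(email) {
  return sha1Hex("mailto:" + email.trim());
}

// Hypothetical stand-in for the harvested RDF database.
// Only the hash is stored, so no email address is exposed.
const people = [
  { name: "Alice Example", mboxSha1sum: mboxSha1sum("alice@example.org") },
  { name: "Bob Sample", mboxSha1sum: mboxSha1sum("bob@example.org") },
];

// Case-insensitive substring match on the name.
function findPeople(records, fragment) {
  const needle = fragment.toLowerCase();
  return records.filter((p) => p.name.toLowerCase().includes(needle));
}

for (const person of findPeople(people, "ali")) {
  console.log(person.name, person.mboxSha1sum);
}
```

Because the hash is one-way, a record can be matched against other documents that describe the same person without the address itself ever appearing in the published RDF.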
Ideally the location of a photo would come from the EXIF (Exchangeable Image File Format) data the camera produces; the same goes for the date. Location is not yet available from this source, so we have had to come up with something else. The date is readily available, but we have not yet come across an EXIF parser in JavaScript.
We have chosen to use the nearestAirport property to associate an image with location data at this time. This is because information linking airports with latitude and longitude is freely available. As an added bonus, this method preserves privacy.
The key issue in accessing geodata is mapping human-readable names to latitude and longitude. As a rough pass, the airports data works well because each airport has a human-readable name that includes the nearest town or city. This means we can search the airports data using place names entered by the user and get out the lat/longs. A similar (and more fine-grained) approach would be to use the spacenamespace data; at the moment, however, this is UK-only.
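The place-name lookup can be sketched as a substring search over airport records. Everything in the example is illustrative: the function name is hypothetical, and the three records are a hand-made sample with approximate coordinates, not the data set the tool queries.

```javascript
// Illustrative sample only: approximate coordinates, not the tool's data.
const airports = [
  { code: "BRS", name: "Bristol", lat: 51.38, long: -2.72 },
  { code: "LHR", name: "London Heathrow", lat: 51.47, long: -0.45 },
  { code: "EDI", name: "Edinburgh", lat: 55.95, long: -3.37 },
];

// Hypothetical lookup: case-insensitive substring match on the airport name,
// which includes the nearest town or city.
function findAirports(records, place) {
  const needle = place.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  return records.filter((a) => a.name.toLowerCase().includes(needle));
}

for (const airport of findAirports(airports, "bristol")) {
  console.log(airport.code, airport.lat, airport.long);
}
```

A search that returns several airports would leave the final choice to the user, in the same way as the people search.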
Modelling the nearestAirport information was difficult. It is not the nearestAirport to the picture as an artifact (the picture may be held on one or more servers, well away from the location). Nor is it necessarily a picture of a location. Instead, it's the location the camera was in when the picture was taken. Similar arguments apply to the date the picture was taken. An experimental new property, creationEvent, was created to test this out. The use of creationEvent masks a hidden resource - an object representing the event, to which nearestAirport and date can be attached.
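The shape of this model can be sketched as three triples, with the event as an unnamed (blank) node between the image and the location and date. The function names and the blank-node labelling scheme are hypothetical, and the nearestAirport and date properties are left unprefixed because the text does not say which vocabularies they come from.

```javascript
let eventCounter = 0;

// Hypothetical helper: mint a fresh blank-node label for each event.
function newEventNode() {
  eventCounter += 1;
  return "_:event" + eventCounter;
}

// Build [subject, predicate, object] triples for one image.
// The location and date hang off the event, not off the image itself.
function creationEventTriples(imageUrl, airportCode, date) {
  const event = newEventNode();
  return [
    [imageUrl, "foaf:creationEvent", event],
    [event, "nearestAirport", airportCode],
    [event, "date", date],
  ];
}

const exampleTriples = creationEventTriples(
  "http://swordfish.rdfweb.org/photos/2003/06/12/2003-06-12-Images/4.jpg",
  "BRS",
  "2003-06-12"
);
for (const [s, p, o] of exampleTriples) {
  console.log(s, p, o);
}
```

Attaching both values to the event keeps statements about where and when the camera was used separate from statements about the image file and what it depicts.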
Users can also add a freetext description. This is coded as the Dublin Core description of the image.
As we have seen, several different kinds of information are presented in separate boxes on the page. These can easily overflow a single screen, and some familiarity with the tools is required before users know where the result of a query will appear. With more work on the JavaScript and style sheets, these prototypes could be developed into visually appealing and unintimidating services.
RDF can be used to say anything about anything, and coupled with the ability to annotate any image on the web, this could lead to both
Retaining the source of these annotations within the application and the software is therefore essential, in order to be able to remove annotations where there are privacy implications.
At the moment, users have two things they can do with the annotation once they have created it. If they are authorised, they can automatically upload the finished RDF/XML to Libby's server. Since the interfaces display the RDF as it is being created, users can copy and paste it into a text editor and then save it in their own web space. They would then have to publicise the URL of the resulting document if they want it harvested. The visible, colour-coded RDF/XML serves an educational purpose by making the machine-readable end product easy to understand. For deployment among cataloguers, the RDF itself will have to be invisible to the user.
To handle the aforementioned issues of trust and privacy, tools will have to encode and process information about the provenance of data. As discussed above, the present application made use of the properties foaf:annotator and foaf:creationEvent, which are not in the official FOAF schema. They are here as an experiment and will probably be moved into another namespace.
Further kinds of data that we might want to include in annotations include:
Codepiction (Dan Brickley)
http://rdfweb.org/2002/01/photo/
Codepiction search interface (Libby Miller, Dan Brickley)
http://swordfish.rdfweb.org/discovery/2001/08/codepict/
Codepiction paths interface (Damian Steer, Libby Miller, Dan Brickley)
http://swordfish.rdfweb.org/discovery/2002/02/paths/
FOAFFinger (Damian Steer)
http://rdfweb.org/people/damian/foaffinger/
A Semantic Web Shoebox - Annotating Photos with RSS and RDF (Matt Biddulph)
http://www.hackdiary.com/slides/www2003.pdf
JavaScript Shell: A command-line interface for JavaScript and DOM (Jesse Ruderman)
http://www.squarefree.com/shell/
Adding SVG outlines to co-depiction photo metadata (Jim Ley)
http://jibbering.com/rdf/foafwho.html
Simple javascript RDF Parser and query thingy (Jim Ley)
http://jibbering.com/rdf-parser/
spacenamespace (Jo Walsh)
http://space.frot.org/