What is Annotation? – A Short Review of Annotation and Annotation Systems [DRAFT]

ILRT Research Report Number: 1053

Publication Date: 2003/09/05
Last Modified : 2003/10/09 15:30

Author(s): Paul Shabajee, Dave Reynolds

Abstract
The annotation of Web-based data by user communities is a widely used means of augmenting and adding value to resources (Shabajee et al 2002), and there are numerous examples of different types of annotation system across the Web (see below for examples). Different types of Web-based project will require different approaches to annotation. This short paper is part of ongoing work within the ARKive-ERA (Educational Repurposing of Assets) project (ARKive-ERA, 2002) to characterize annotation in the context of Web-based resources. It aims to provide 1) a discussion of the basic nature of annotations in a Web context and 2) a review of some significant annotation systems, both personal (providing local annotation tools) and Web-based, i.e. those that allow and support Web-based storage of annotations and/or collaborative annotation.

Paul Shabajee* and Dave Reynolds**

*Graduate School of Education and Institute for Learning and
Research Technology (ILRT), University of Bristol, Bristol, UK
Email: paul.shabajee@bristol.ac.uk

**Hewlett Packard Labs, Bristol, UK
Email: der@hplb.hpl.hp.com

What is Annotation?

“A note added to anything written, by way of explanation or comment.”

OED-Online 2002

The story so far…

A simple working definition of annotation could be characterized as follows:

“Broadly speaking annotation is metadata (i.e. data about data Gilliland-Swetland 1998) created after the creation or capturing of a piece of content or thing, in general the actual object itself remains unchanged by the annotation. It is the post hoc nature that distinguishes it from other forms of metadata. There are many 'traditional' types of annotation...” (after Shabajee et al 2002)

However, the concept of annotation is both richer and more subtle than this simple definition suggests, and these subtleties can have a direct bearing on the design of annotation systems. We therefore first examine some of these deeper issues and then arrive at a broad characterization of annotation.

It is not strictly the case that the original object that has been annotated is unchanged. For example, an annotation written on a paper document, a plaque physically attached to a statue and a text annotation embedded within a word processor document have all altered the annotated object to some extent.

In all these cases, however, the essence of the original object remains unchanged and the “necessary” original data can still be extracted, e.g. the original text is still visible under any annotation, or an un-annotated copy is available for comparison. The definition of “necessary” information nevertheless depends very much on context. In some contexts the layout of a text document is relevant, for example in the design of a poster; in others it is not relevant at all, e.g. in a preliminary draft of a project report. The “essence” of an ancient manuscript includes so many subtleties of aesthetics and physical composition that only annotation of a copy is likely to be deemed to have left the original unchanged.

There is also no fundamental set of “original objects”. The annotated object or the annotation itself can in turn be annotated.

Accessing Annotation – Linking: Traditional examples of annotation (e.g. a written annotation on a text document) tend to possess an inbuilt, transparent system for viewing the annotation: the object and its annotation are physically connected. However, with many of the computer-based examples studied for this project, access to the annotation depends on having special devices, generally some specific kind of computer/Internet platform. The annotation may be stored in a physically separate location from the annotated object, but a fundamental characteristic of an annotation is that it must be linked to the base object. In the case of paper documents, adding ink links the annotation physically and chemically to the paper; in electronic environments annotations are hyperlinked to, or identifiably embedded in, the original object.

In computer-based examples it is the viewing system that interprets the linkage between the object and its annotation and makes it apparent to the user. Thus the status of a piece of data as annotation is not inherent in the annotation itself but arises from the combination of the annotation and the viewing mechanism; it is a system property.
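The separation of object, annotation and viewing mechanism described above can be sketched in a few lines of Python. This is a minimal illustration only and does not describe any of the systems reviewed later; the names Annotation, AnnotationStore and render, and the example.org identifiers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str  # identifier (e.g. a URI) of the annotated object
    author: str
    body: str


class AnnotationStore:
    """Holds annotations separately from the objects they refer to."""

    def __init__(self):
        self._by_target = {}

    def add(self, annotation):
        self._by_target.setdefault(annotation.target, []).append(annotation)

    def for_target(self, target):
        return list(self._by_target.get(target, []))


def render(target, content, store):
    """The 'viewing system': joins an unchanged object to its annotations."""
    lines = [content]
    for a in store.for_target(target):
        lines.append(f"  [{a.author}] {a.body}")
    return "\n".join(lines)


store = AnnotationStore()
store.add(Annotation("http://example.org/doc1", "alice", "Check this figure"))
print(render("http://example.org/doc1", "Original text", store))
```

The stored object is never modified; only the output of render shows it as annotated, which is the sense in which annotation is a property of the whole system.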

To illustrate this, consider a system (an example of which is described later) in which geographic locations can be annotated by use of their GPS (Global Positioning System) coordinates. A user at a location may then view comments on the nearby restaurant, geological information or the location of the nearest petrol/gas station. By using a portable access device we can enable a user to treat these linked data items as virtual annotations on the physical space, which they can both access and create. However, the creators of the separate datasets (on restaurants, gas station locations, geological information) probably did not conceive of their data that way. It is the access system that has converted a pure dataset indexed by some attribute (GPS in this case) into an apparent annotation of objects or locations identified via that attribute.
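As a rough sketch of how an access system might turn coordinate-indexed datasets into apparent annotations of place, the following Python uses the standard haversine formula to select records near the user. The function names, the record layout ("lat", "lon", "note"), the sample coordinates and the 200 m default radius are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def annotations_near(datasets, lat, lon, radius_m=200.0):
    """Return (source, note) pairs within radius_m of the user, nearest first."""
    hits = []
    for source, records in datasets.items():
        for rec in records:
            d = haversine_m(lat, lon, rec["lat"], rec["lon"])
            if d <= radius_m:
                hits.append((d, source, rec["note"]))
    hits.sort()
    return [(source, note) for _, source, note in hits]


# Independent datasets that were never designed as annotations.
datasets = {
    "restaurants": [{"lat": 51.4546, "lon": -2.5880, "note": "Good lunch menu"}],
    "fuel": [{"lat": 51.4700, "lon": -2.5879, "note": "Open 24 hours"}],
}
print(annotations_near(datasets, 51.4545, -2.5879))
```

Neither dataset changes; the lookup keyed on the user's current position is what makes the records behave like annotations of the place.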

Access control and security: Once we separate object and annotation in this way it becomes possible to apply different access control policies to the annotation and the base object. Thus an object may appear un-annotated to one set of users while being annotated from the point of view of a different user community. Whether something functions as an annotation therefore also depends on the user and their context (their access rights and knowledge) and not just on the viewing system itself.
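A minimal sketch of this idea, assuming a simple group-based policy in which each annotation lists the groups permitted to read it. The Annotation fields, the visible_annotations function and the group names are hypothetical and are not drawn from any system reviewed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target: str         # identifier of the annotated object
    body: str
    readers: frozenset  # groups allowed to see this annotation


def visible_annotations(annotations, target, user_groups):
    """Return the annotations on target that this user is allowed to see."""
    groups = set(user_groups)
    return [a for a in annotations if a.target == target and a.readers & groups]


notes = [
    Annotation("image-42", "Needs a new caption", frozenset({"curators"})),
    Annotation("image-42", "Used in lesson 3", frozenset({"teachers", "curators"})),
]

# The same object appears annotated to one community and un-annotated to another.
print(len(visible_annotations(notes, "image-42", ["curators"])))
print(len(visible_annotations(notes, "image-42", ["visitors"])))
```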

Even in the physical world this is true. For example, during the US Great Depression ‘hobos’ used (‘secret’) chalk marks on the pavements to indicate the likely welcome/status at the given location (http://www.worldpath.net/~minstrel/hobosign.htm). This is echoed in the evolving practice of ‘warchalking’ (http://www.guardian.co.uk/online/story/0,3605,748499,00.html), used to signal where wireless network connections exist, and their status. In these examples, unless the code is known or understood, the meaning of the marks is unknown and they may not even be recognized as annotations.

Electronic systems based on various approaches to security, from user-name/password schemes through to the various ‘digital signature’ technologies, provide many such opportunities.

Defining annotations

The issues raised above show how complex and problematic it is to define exactly what an annotation is, and when. This is similar to the more common problem of distinguishing between data and metadata: what is data in one context, e.g. species population figures in the context of conservation, becomes annotation (metadata) when viewed in the context of annotating a map or physical location.

The keystone that allows us to separate annotation from mere data about an object is that annotation is directly linked to and accessible from the original object. To illustrate this, take the case above, in which geographic co-ordinates are used to link ‘annotation’ to a physical location, e.g. a restaurant. The data seems to change from ‘data about’ to ‘annotation’ when the user is physically at the restaurant and uses some kind of device to access the data about the location, there and then. If instead they were looking up the same data earlier in the day, while at their office, it would be regarded as looking up data in a database. It seems that annotation must be viewed alongside the original object or it ceases to be annotation.

We can summarize this understanding of the nature of annotation as follows:

However we do not claim that these are exclusive and believe that the concept of annotation is necessarily ‘fuzzy’ in a similar manner to that of the concept of metadata, see above.

Examples of Existing Computer-based Annotation Systems

As indicated above, there are many Web-based projects and sites that utilize annotation as a means of adding value to, or augmenting, their data. Eighteen examples are given below that illustrate a range of key types and approaches. These are divided into three sections: 1) Web-based annotation systems, 2) personal annotation tools, 3) generic collaborative annotation systems.

Examples of Websites that use annotation systems

Amazon.co.uk (www.amazon.co.uk): Users can review products (e.g. books, video and software) either by ‘rating’ the product on a 1-5 scale or by entering prose text of up to 1,000 words, with a recommended length of 75 to 300 words, along with a rating. In order to submit a rating or text review the user must log on, having pre-registered with an e-mail address and password. The average rating and the text reviews are made available to any user of the page. Although this does not seem to be documented on the site, text reviews appear to be moderated, i.e. checked for compliance with the guidelines (http://www.amazon.co.uk/exec/obidos/subst/misc/author-review-guidelines.html/026-1836225-8318031). This is suggested by the site's note that “reviews are posted on our site within five days, but occasionally it takes longer…”. Rating reviews seem to be processed more automatically.

PseudoCAP (http://www.pseudomonas.com/annotation.html): People with cystic fibrosis, burn victims, individuals with cancer, and patients requiring extensive stays in intensive care units are particularly at risk of disease resulting from P. aeruginosa (a bacterium) infection. Annotation of DNA sequences, by experts working on this bacterium provides a means of learning about the form and function of the DNA sequences and thus the action of the bacterium.

The initial phase of annotation is now complete. It involved a commercial organization (PathoGenesis Corporation - http://www.pathogenesis.com/) and a community of 64 scientists. Annotations provided by the Pseudomonas scientists were subjected to peer review and used to aid the final genome annotation that was published. The original work was done by e-mail, but a Web-based form system was developed to facilitate submissions. Other users can 'get involved', if they wish to contribute or to propose changes, via a registration process. Members log in and authorship is noted in the database; a systematic review process then takes place.

Slashdot (http://slashdot.org): is a community news portal ("News for nerds, stuff that matters"). News items are posted to the site by members of the community; other members can then comment on (annotate) the articles and other comments. Members must register before they can post or annotate. There is a complex and sophisticated system of moderation that makes the system, with its tens of thousands of postings a day, effectively self-maintaining. Briefly, the system is based on a set of rules about who can moderate, based on length of membership, willingness to contribute and number (and quality) of positive contributions, as assessed by other members including meta-moderators. For more information see http://slashdot.org/faq/com-mod.shtml#cm520 and, on metamoderation, http://slashdot.org/faq/metamod.shtml

Gimp-Savvy.com (http://www.gimp-savvy.com/): is a ‘Community-Indexed Photo Archive’ which provides simple tools for users to add indexing terms to images in an on-line image database (http://gimp-savvy.com/PHOTO-ARCHIVE/). Any user can add keyword indexing terms to describe any image; these are then used, along with existing terms, to aid retrieval by other users.

JIME (Journal of Interactive Media in Education): (http://www-jime.open.ac.uk/) is based around the ‘tool for document-centric discussion’ Digital Document Discourse Environment, D3E (http://d3e.sourceforge.net/ - see below). It is an example of an on-line academic journal that augments the text with discussion and comments (annotation) from ‘users’. Users can choose to become ‘members’, which involves completing a form, or simply enter their e-mail address as an ID. They can then take part in a threaded discussion about aspects of the article.

Berkman Center, Open Law Annotation Master: (http://eon.law.harvard.edu/cite/annotate.cgi) is an example of a more generic on-line tool (Annotation Engine) that lets users annotate any on-line document by placing a link in the text at the point that the user wishes to annotate. There are some limitations, as the text string to be annotated must be unique in the document. Using this system any user can annotate any (Web-based) document, and the annotations are held on the annotation server. Users can identify themselves if they wish. There appears to be no moderation.
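The uniqueness limitation noted above follows naturally from anchoring an annotation to a text string. The following Python is a sketch of that general technique only; it is not the Annotation Engine's actual implementation, and the function name anchor_offset is hypothetical.

```python
def anchor_offset(document: str, phrase: str) -> int:
    """Return the offset of phrase in document, requiring exactly one occurrence."""
    if not phrase:
        raise ValueError("anchor phrase must not be empty")
    first = document.find(phrase)
    if first == -1:
        raise ValueError("anchor phrase not found")
    # A second match would make the annotation's position ambiguous.
    if document.find(phrase, first + 1) != -1:
        raise ValueError("anchor phrase is not unique")
    return first


text = "the cat sat on the mat"
print(anchor_offset(text, "cat"))  # a phrase that occurs once can serve as an anchor
```

A phrase such as "the", which occurs twice in the sample text, is rejected because a link attached to it could not be placed unambiguously.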

Learning About the Holocaust Through Art: (http://art.holocaust-education.net/) Users can ‘explore’ the image collection. Once they have registered (name and e-mail) they are sent a password by e-mail and may choose particular images to store in their personal ‘collection’. They then have the opportunity to attach text notes (annotations) to the images they have chosen. The collection can be viewed at any time and the notes edited. There seems to be no public access to these notes.

WorldBoard: (http://worldboard.org) This concept is yet to be realized, but is very interesting. It is based on the idea of utilizing the convergence of mobile computing, Global Positioning Systems (GPS) and Internet technologies to allow users to annotate space. The basic principle is simple: the user links via the Internet to a WorldBoard server; the mobile device has a GPS receiver, and so knows where the user is. The user can thus access any information (annotation) associated with that place. This can be as simple as a text restaurant review or as complex as a 3D view of the underground utility pipes and wires (using special 3D projection glasses), thus augmenting reality with Internet-based annotation (see Spohrer, 1999 for more detail).

CoolTown: (http://www.cooltown.com/cooltownhome/index.asp) This concept is based around the idea of people and other physical objects having Web-accessible URIs (Uniform Resource Identifiers) and ‘beacons’ (Hewlett-Packard Labs 2002). This enables any Web-enabled device to send and receive data, e.g. a car sending fault diagnostic data, a refrigerator reporting its freezer temperature, or a beacon on an advertising poster linking to more data about the product or event advertised. When this is combined with GPS (in a similar way to the WorldBoard example above), annotation of place also becomes possible (see Semantic spaces by Pradhan, 2002). Many such systems (not using CoolTown technology) are being implemented in museums, providing multimedia annotation/interpretation related to their exhibits.

There are many thousands of other examples of systems that allow users to annotate web-based or other data, with the annotation available via the Web or other Internet protocol. These examples are chosen to illustrate some of the very much wider range of approaches that were identified as part of the ARKive-ERA research work on annotation. They also illustrate a small subset of the uses that annotation can be put to. In these examples the particular approaches are matched to their purpose and underlying aims in providing the facility.

Personal Annotation Tools

The above examples are of Web-based systems. There are, of course, also personal tools that provide similar functionality for annotating local data. Examples include:

Word processors: Some word processors (e.g. Microsoft Word) provide support for annotation via ‘comments’ (including audio files) embedded in the text, customizable text styles (including hidden text) and change tracking.

Adobe Acrobat: provides a comments facility which enables a variety of types of annotation to be attached to or embedded in the Acrobat document, e.g. text, audio files and ‘stamps’ (as in rubber ink stamps) that are used to mark documents with graphics or text.

Browser add-ins: There are many browser add-ins that allow users to capture and/or annotate Web-based data, most generally Web pages, and store the data locally. For example, Cogitum Co-Citer (http://www.cogitum.com/co-tracker-text/more.shtml) is a tool that allows users to capture, annotate and organize textual Web-based data, retaining links to the original Web source.

Qualitative Data Analysis tools: (e.g. Atlas ti, Scientific Software http://www.atlasti.de/ and Nud*ist, QSR International, http://www.qsr.com.au/) These tools are designed to help researchers analyze data by providing means to annotate the data with comments and ‘codes’, which can then be used to identify and find patterns in the data sets. The more sophisticated tools such as Atlas ti provide tools to annotate text, still images and video objects.

Indexing systems: any piece of software that provides tools for users to apply index data (often simple keywords) to resources. In the case of digital still images, for example, commercial products include indexing features within imaging software such as Adobe Photoshop (http://www.adobe.com/photoshop/) and dedicated image management systems such as Apple’s iPhoto (http://www.apple.com/iphoto/) or Extensis Portfolio (http://www.extensis.com/portfolio/). There are also experimental systems that utilize newer Semantic Web technologies (see below), for example the use of ontologies to assist human indexers (see for example Schreiber 2001), and auto-indexing tools based on image characteristics such as the shape or ‘texture’ of objects in the images (see for example Venters and Cooper 2000).

Genotator - A Workbench for Sequence Annotation and Browsing: (http://www.fruitfly.org/~nomi/genotator/) is a specialist tool that can be used to annotate locally stored genomic data in a similar manner to that of the PseudoCAP project (see above). It is interesting in that it provides tools to annotate gene sequences automatically, based on analysis that compares them with known external data, as well as tools for personal annotation.

Generic Collaborative On-line Annotation Tools

Generic tools to support on-line annotation in principle provide customizable toolsets which can be used as the basis for many different types of application of annotation, in many contexts. An example of a generic collaborative annotation tool is the Annotea system under development by the W3C (W3C 2002).

Annotea: (http://www.w3.org/2001/Annotea/) is a project that forms part of the Semantic Web work of the W3C (W3C 2002). It uses an approach similar to that of the Berkman Center Open Law Annotation system (above) in that it enables users to create annotations (comments, notes, explanations, or other types of external remarks) and stores them on a third-party server (annotation server). It is based on open W3C standards such as RDF (Resource Description Framework) and XPointer (W3C 2002b) to help ensure that it is interoperable and as flexible as possible. W3C work is also taking place on graphical annotation using SVG (Scalable Vector Graphics), providing a means of annotating identifiable segments of visual data.

D3E: (http://d3e.sourceforge.net/) provides the underlying architecture and authoring tool set for the JIME journal (see above). In principle, this forms the basis for generic text-based annotation systems, which can be customized to meet the needs of a diverse range of projects.

Summary and Conclusions

The potential to add value to digital resources by using various approaches to ‘annotation’ is illustrated by the large number and diversity of approaches to, and implementations of, such systems on the Web, with yet greater potential when networked data is linked to the ‘real’ world via electronic beacons or GPS systems.

This paper proposes a set of characteristics of annotation that we feel forms part of a useful definition of ‘annotation’ in the context of Web-based resources. We also note that the concept of annotation is necessarily ‘fuzzy’ in a manner similar to that of metadata e.g. what is and is not metadata (as opposed to data or meta-metadata) is a matter of perspective and context.

The paper also provides a review of some illustrative examples of Websites that utilize annotation systems, personal information tools that provide annotation functionality and two generic Web-based annotation tools. We hope that these illustrate the wide range of potential applications of annotation and annotation support in both personal and collaborative contexts.

Acknowledgements

The authors would like to thank Brian McBride (Hewlett-Packard Labs), Libby Miller and Dave Beckett (Institute for Learning and Research Technology, University of Bristol) and Andy Dingley (Codesmiths, Bristol) for their help in exploring the ideas on which this paper is based.

The ARKive-ERA project is funded by Hewlett-Packard Labs, Bristol.

References

ARKive-ERA Project (2002) ARKive-ERA Homepage, available on-line: http://www.ilrt.bristol.ac.uk/projects/project?search=ARKive-ERA

Australian Libraries Gateway (ALG). (2002) Australian digitisation projects. Australian Libraries Gateway (ALG), National Library of Australia. Available: http://www.nla.gov.au/libraries/digitisation/projects.html.

Gilliland-Swetland, Anne J. (1998) "Setting the Stage: Defining Metadata" in Introduction to Metadata: Pathways to Digital Information, Murtha Baca, ed. Los Angeles: Getty Information Institute, Available on-line: http://www.getty.edu/research/institute/standards/intrometadata/2_articles/index.html

Judd, Karen. (2001) Copyediting: A Practical Guide, Crisp Publications

Koper, R. (2001) Modeling units of study from a pedagogical perspective the pedagogical meta-model behind EML. Educational Technology Expertise Centre Open University. Available: http://eml.ou.nl/introduction/docs/ped-metamodel.pdf.

National Science Foundation (NSF). (2002) Digital Libraries Initiative Phase 2. National Science Foundation (NSF). Available: http://www.dli2.nsf.gov/

New Opportunities Fund (NOF). (2002) NOF-Digitise. New Opportunities Fund. Available: http://www.nof-digitise.org/

Pradhan, S. (2002) Semantic Location. Internet and Mobile Systems Laboratory, Hewlett-Packard Laboratories. Available: http://www.cooltown.com/dev/wpapers/semantic/semantic.asp.

Schreiber, A. T., Dubbeldam, B., Wielemaker, J. and Wielinga, B. (2001) Ontology-Based Photo Annotation, IEEE Intelligent Systems, May/June 2001: 2-10.

Shabajee, P., Miller, L. and Dingley, A. (2002) In Museums and the Web 2002: Selected Papers from an International Conference (Eds, Bearman, D. and Trant, J.) Archives & Museums Informatics, Boston, USA. Available: http://www.archimuse.com/mw2002/papers/shabajee/shabajee.html

Slashdot (2002) Slashdot Homepage. Available: http://slashdot.org/

Spohrer, J. (1998) What Comes After the WWW? Learning Communities Group, ATG, (c)Apple Computer, Inc. Available: http://www.worldboard.org/pub/spohrer/wbconcept/default.html.

Spohrer, J. (1999) Information in Place, IBM Systems Journal, 38(4): 602-628. Available: http://www.research.ibm.com/journal/sj/384/spohrer.html.

Sumner, Tamara., Buckingham, Simon Shum., Wright, Michael., Bonnardel, Nathalie., and Chevalier, Aline. (2000) Redesigning the Peer Review Process: A Developmental Theory-in-Action. In Proc. COOP'2000: Fourth International Conference on the Design of Cooperative Systems, (Sophia Antipolis, France: 23-26 May, 2000). Available: http://kmi/kmi-abstracts/kmi-tr-96-abstract.html

Venters, C. C. and Cooper, M. (2000) A Review of Content-Based Image Retrieval Systems. 01/07/00. JTAP (Joint Technology Applications Programme), JISC. Available: http://www.jtap.ac.uk/reports/htm/jtap-054.html.

Vredenburg, K. (2001) User-Centered Design: An Integrated Approach, Prentice Hall.

W3C (2002) Annotea Project Homepage, http://www.w3.org/2001/Annotea/

W3C. (2001) Semantic Web. Available: http://www.w3.org/2001/sw/.

W3C. (2002a) Naming and Addressing: URIs, URLs, ..., Available: http://www.w3.org/Addressing/

W3C. (2002b) W3C XML Pointer, XML Base and XML Linking. Available: http://www.w3.org/XML/Linking.

World ORT. (2001) Learning about the Holocaust Through Art Homepage. World ORT. Available: http://art.holocaust-education.net/ .