AnnoLab - User Manual
Analysis pipelines

Table 3.2. Valid mapAs values

Value      Description                                         <relation>  <feature>
segments   This contains the stand-off anchors                 yes         no
ignore     Don't do anything with this                         yes         yes
error      Trigger an error if this is present                 yes         yes
reference  Map this as a reference                             yes         no
feature    Map this as a feature                               yes         yes
default    Automatically determine whether something should    yes         yes
           be mapped as reference or feature
dominance  In a tree layer this indicates the relation that    yes         no
           makes up the tree

3. Generalised Annotation Markup

Generalised Annotation Markup (GAM) is a set of XML tags and attributes that can be used to extend XML formats so they can be used in a multi-layer environment such as AnnoLab. The format has been inspired by existing XML annotation formats such as the CD3 format used by Systemic Coder [1], the MAS-XML format used by GuiTAR [2], and the TEI XML [3] format, as well as HTML. All use different tag sets and encode different semantics in the XML document structure. However, they also have some similarities:

• they are used to annotate text;
• the complete text exists in the XML document;
• the text is not contained in attributes but in text nodes;
• iterating over all text nodes from the beginning to the end of the XML document yields the full text in the correct order.

Any XML format conforming to these four points may be called document-centric XML, as the text being marked up by the XML tags provides the dominant structure.

Document-centric XML formats can easily be converted for use within AnnoLab by adding stand-off anchors, which allow the XML annotations and the underlying text to exist independently of each other. The idea is to leave as much of the original XML format as possible untouched and to keep the conversion process from or to AnnoLab as simple as possible.
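The fourth property of document-centric XML can be checked mechanically. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document and the function name full_text are assumptions made for illustration and are not part of AnnoLab or of any of the formats named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document-centric XML: all text lives in text nodes,
# none of it in attributes.
SAMPLE = "<doc><p>Hello <b>multi-layer</b> world.</p><p> Second paragraph.</p></doc>"


def full_text(xml_string: str) -> str:
    """Concatenate all text nodes in document order."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree depth-first and yields each element's
    # text and tail strings in the order they appear in the document.
    return "".join(root.itertext())


if __name__ == "__main__":
    print(full_text(SAMPLE))
```

If the concatenated string equals the text the document is meant to annotate, the format satisfies the ordering property and is a candidate for conversion with stand-off anchors.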
During the conversion process two changes are applied:

• the text nodes are replaced by gam:seg tags representing segments that anchor an annotation layer to a signal;
• they are wrapped in a gam:layer tag that carries attributes such as gam:id and name that are necessary to address and handle a layer within the framework.

All GAM tags traditionally reside in the XML namespace http://www.linglit.tu-darmstadt.de/PACE/GAM and use the namespace prefix gam.

GAM is used in the context of AnnoLab data stores and when exporting data from AnnoLab. Depending on the context, different elements are defined.

[1] http://www.wagsoft.com/Coder/
[2] http://cswww.essex.ac.uk/Research/nle/GuiTAR/
[3] http://www.tei-c.org
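The two conversion changes described in this section can be sketched as follows. This is an illustration only and not AnnoLab's actual converter: the function name convert, the sample input, and in particular the start and end attributes on gam:seg are assumptions, since the attributes that gam:seg carries are not given here. Wrapping the original root element inside gam:layer is likewise one possible reading of the second change.

```python
import xml.etree.ElementTree as ET

# GAM namespace as given in the manual text.
GAM_NS = "http://www.linglit.tu-darmstadt.de/PACE/GAM"
ET.register_namespace("gam", GAM_NS)

# Hypothetical document-centric input.
SAMPLE = "<doc><p>Hello <b>big</b> world</p></doc>"


def convert(xml_string, layer_id, layer_name):
    """Return (layer element, signal text) for a document-centric XML string."""
    source = ET.fromstring(xml_string)
    signal_parts = []
    offset = 0

    def add_segment(parent, text):
        # Replace a text node by a gam:seg element pointing into the signal.
        # The start/end attribute names are ASSUMED for this sketch.
        nonlocal offset
        seg = ET.SubElement(parent, f"{{{GAM_NS}}}seg")
        seg.set("start", str(offset))
        seg.set("end", str(offset + len(text)))
        signal_parts.append(text)
        offset += len(text)

    def copy(elem, parent):
        # Copy the element, leaving tag names and attributes untouched.
        clone = ET.SubElement(parent, elem.tag, dict(elem.attrib))
        if elem.text:
            add_segment(clone, elem.text)
        for child in elem:
            copy(child, clone)
            if child.tail:
                add_segment(clone, child.tail)

    # Wrap everything in a gam:layer tag carrying gam:id and name.
    layer = ET.Element(f"{{{GAM_NS}}}layer")
    layer.set(f"{{{GAM_NS}}}id", layer_id)
    layer.set("name", layer_name)
    copy(source, layer)
    return layer, "".join(signal_parts)


if __name__ == "__main__":
    layer, signal = convert(SAMPLE, "layer1", "demo")
    print(signal)
    print(ET.tostring(layer, encoding="unicode"))
```

After conversion the layer contains no text nodes of its own: the text lives in the separately returned signal, and each gam:seg only records where its segment sits in that signal.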