AnnoLab - User Manual

Analysis pipelines
Table 3.2. Valid mapAs values

Value      Description                                                <relation>  <feature>
---------  ---------------------------------------------------------  ----------  ---------
segments   This contains the stand-off anchors                        yes         no
ignore     Don't do anything with this                                yes         yes
error      Trigger an error if this is present                        yes         yes
reference  Map this as a reference                                    yes         no
feature    Map this as a feature                                      yes         yes
default    Automatically determine whether something should be        yes         yes
           mapped as reference or feature
dominance  In a tree layer this indicates the relation that makes     yes         no
           up the tree

3. Generalised Annotation Markup
Generalised Annotation Markup (GAM) is a set of XML tags and attributes that can be used to extend XML
formats so they can be used in a multi-layer environment such as AnnoLab. The format has been inspired by
existing XML annotation formats such as the CD3 format used by Systemic Coder [1], the MAS-XML format
used by GuiTAR [2], the TEI XML format [3], as well as HTML. All use different tag-sets and encode different
semantics in the XML document structure. However, they also have some similarities:
• they are used to annotate text;
• the complete text exists in the XML document;
• the text is not contained in attributes but in text nodes;
• iterating over all text nodes from the beginning to the end of the XML document yields the full text in
the correct order.
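
The last point can be checked mechanically. The following is a minimal sketch in Python, assuming a made-up document-centric fragment; the tag and attribute names (text, s, w, pos) are purely illustrative and are not taken from CD3, MAS-XML or TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical document-centric fragment; tag names are illustrative only.
DOC = "<text><s>The <w pos='NN'>cat</w> sleeps.</s> <s>It purrs.</s></text>"


def full_text(xml_string: str) -> str:
    """Concatenate all text nodes in document order.

    Attribute values (such as pos='NN') are not text nodes and are skipped.
    """
    root = ET.fromstring(xml_string)
    return "".join(root.itertext())


print(full_text(DOC))  # prints: The cat sleeps. It purrs.
```

If the string returned by such a traversal is the complete text in reading order, the format satisfies the points listed above.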
Any XML format conforming to these four points may be called document-centric XML, as the text being
marked up by the XML tags provides the dominant structure. Document-centric XML formats can easily
be converted for use within AnnoLab by adding stand-off anchors, which allow the XML annotations and the
underlying text to exist independently of each other. The idea is to leave as much of the original XML
format untouched as possible and to keep the conversion to and from AnnoLab as simple as possible. During
the conversion process two changes are applied:
• the text nodes are replaced by gam:seg tags representing segments that anchor an annotation layer to a
signal;
• the XML annotations are wrapped in a gam:layer tag that carries attributes such as gam:id and name,
which are necessary to address and handle a layer within the framework.
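
The two changes can be sketched as follows. This is an illustration under stated assumptions, not AnnoLab's actual converter: the function to_gam is hypothetical, and the start and end attributes on gam:seg are invented here to show the idea of a stand-off anchor (the manual text above does not specify which attributes gam:seg carries). Only the gam:layer tag, its gam:id and name attributes, the gam:seg tag and the namespace come from the text.

```python
import xml.etree.ElementTree as ET

GAM_NS = "http://www.linglit.tu-darmstadt.de/PACE/GAM"
ET.register_namespace("gam", GAM_NS)


def to_gam(root, layer_id, name):
    """Return (gam:layer element, signal text) for a document-centric tree.

    Hypothetical helper: "start" and "end" are assumed attribute names
    holding character offsets into the extracted signal.
    """
    pieces = []  # text nodes in document order; joined, they form the signal
    pos = 0      # running character offset into the signal

    def add_seg(parent, text):
        nonlocal pos
        if text:
            attrs = {"start": str(pos), "end": str(pos + len(text))}
            parent.append(ET.Element(f"{{{GAM_NS}}}seg", attrs))
            pieces.append(text)
            pos += len(text)

    def convert(elem):
        # Copy tag and attributes unchanged; only text nodes are replaced.
        copy = ET.Element(elem.tag, dict(elem.attrib))
        add_seg(copy, elem.text)
        for child in elem:
            copy.append(convert(child))
            add_seg(copy, child.tail)
        return copy

    layer = ET.Element(
        f"{{{GAM_NS}}}layer",
        {f"{{{GAM_NS}}}id": layer_id, "name": name},
    )
    layer.append(convert(root))
    return layer, "".join(pieces)


# Illustrative input; the tag names are made up.
DOC = "<s>The <w>cat</w> sleeps.</s>"
layer, signal = to_gam(ET.fromstring(DOC), "layer1", "syntax")
print(signal)
print(ET.tostring(layer, encoding="unicode"))
```

After the conversion the layer contains no text nodes of its own, and the original tags and attributes are left untouched, which is what keeps a conversion back to the source format simple.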
All GAM tags traditionally reside in the XML namespace http://www.linglit.tu-darmstadt.de/PACE/GAM
and use the namespace prefix gam.
GAM is used in the context of AnnoLab data stores and when exporting data from AnnoLab. Depending on
the context, different elements are defined.
[1] http://www.wagsoft.com/Coder/
[2] http://cswww.essex.ac.uk/Research/nle/GuiTAR/
[3] http://www.tei-c.org