A framework for processing and presenting parallel text corpora

2.4 · XTE - A new standoff markup scheme
had to be done in the DTD case (compare with listing 2.15). Instead, this extensibility feature
is provided by the XML Schema language itself. On the other hand, the XML Schema language
also allows the creator of an encoding to use the final attribute on a type to specify which
element types should not be further refined by derivation.
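The effect of the final attribute can be illustrated with a minimal sketch. The type names lockedType and noExtensionType and their content are hypothetical and not part of the XTE Schema; they only show the two common uses of the attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type, not part of XTE: final="#all" forbids any
       derivation from this type, by extension as well as by restriction. -->
  <xsd:complexType name="lockedType" final="#all">
    <xsd:attribute name="id" type="xsd:string"/>
  </xsd:complexType>

  <!-- Hypothetical type, not part of XTE: final="extension" forbids only
       derivation by extension; derivation by restriction stays possible. -->
  <xsd:complexType name="noExtensionType" final="extension">
    <xsd:sequence>
      <xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

A schema that tried to derive a new type from lockedType with xsd:extension or xsd:restriction would be rejected by a conforming schema processor.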
Finally, the customized XTE XML Schema created in listing 2.24 can be used to validate a document instance by including the attributes shown in the following listing in the
root element of the document:
Listing 2.25: An example XML file which uses the XML Schema defined in listing 2.24
<?xml version="1.0" encoding="UTF-8"?>
<XTE xmlns="http://www.language-explorer.org/XTE"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="resources/div1pages.xsd">
  ...
</XTE>
The XTE XML Schema realized with derivation
Besides realizing the XTE Schema extensibility with substitution groups,
the same results can also be achieved with the XML Schema derivation mechanism. This mechanism was already used in the previous section to make elements defined
in a partial encoding customizable by other users. In the case of the base XTE XML Schema,
derivation is applied to the body element. The type of the body element has to be defined
as follows:
Listing 2.26: The definition of the body type for the XTE Schema realized with derivation
<xsd:complexType name="body">
  <xsd:attribute name="encodingName" use="required"/>
  <xsd:attribute name="type" use="required">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="default"/>
        <xsd:enumeration value="auxiliary"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
  <xsd:attribute name="view" use="required"/>
</xsd:complexType>
The only change with respect to the old definition of the body type (see listing 2.20) is that
body now contains no other elements. By default, only a few attributes are defined
for this element. In document instances, however, the plain body element type will not be
used. Elements which have a type derived from body will be used instead. The sentence- and
page-wise encoding already presented in listing 2.23 would have to be defined as follows
to work with the new schema:
Listing 2.27: Definition of the page-wise encoding for the XTE Schema realized with derivation
...
<!-- derive a new body type from the abstract 'body' type in XTE.xsd which
     uses the 'pages' encoding schema -->
Dissertation der Fak. f. Informations- u. Kognitionswissenschaften, Univ. Tübingen - 2004