A framework for processing and presenting parallel text corpora
Appendix A · Constants

Predefined character blocks in Unicode 3.0

LowSurrogates            CJKCompatibilityIdeographs     ArabicPresentationForms-A
CJKCompatibilityForms    ArabicPresentationForms-B      HalfwidthandFullwidthForms
PrivateUse               AlphabeticPresentationForms    CombiningHalfMarks
SmallFormVariants        Specials                       Specials

Table A.1: LanguageExplorer supports the character block names defined in Unicode 3.0 when constructing certain regular expressions (see section 5.4.6 on page 130). Note that these names omit the spaces that the Unicode standard uses as word separators (e.g. the block “Basic Latin” is written “BasicLatin”).

The character categories defined in Unicode 3.0

Category   Explanation

Characters
L          Letter.
Lu         Uppercase letter.
Ll         Lowercase letter.
Lt         Title case letter.
Lm         Modifier letter.
Lo         Any other letter.

Numbers
N          Number.
Nd         Decimal digit.
Nl         Letter number.
No         Any other number.

Symbols
S          A symbol.
Sm         A mathematical symbol.
Sc         A currency symbol.
Sk         A modifier symbol.
So         Any other symbol.

Punctuation marks
P          A punctuation mark.
Pc         A connector.
Pd         A dash.
Ps         An opening punctuation mark.
Pe         A closing punctuation mark.
Pi         An initial quote.
Pf         A final quote.
Po         Any other punctuation mark.

Separators
Z          A separator.
Zs         A space separator.
Zl         A line separator.
Zp         A paragraph separator.

Combining marks
M          A combining mark.
Mn         A nonspacing mark.
Mc         A spacing combining mark.
Me         An enclosing mark.

Other characters
C          Any other characters.

(to be continued on the next page)

Dissertation, Faculty of Information and Cognitive Sciences, University of Tübingen, 2004
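The two-letter category codes listed in this appendix can be explored interactively. The following is a minimal sketch, not LanguageExplorer's own API: it assumes Python's standard `unicodedata` module, which follows a newer Unicode version than 3.0, although the category codes shown here are the same. The names `MAJOR_CLASSES`, `describe` and `filter_by_category` are hypothetical helpers introduced only for this illustration.

```python
import unicodedata

# Major classes keyed by the first letter of the two-letter category code,
# following the grouping used in the category table of this appendix.
MAJOR_CLASSES = {
    "L": "letter",
    "N": "number",
    "S": "symbol",
    "P": "punctuation mark",
    "Z": "separator",
    "M": "combining mark",
    "C": "other",
}


def describe(ch):
    """Return (two-letter category, major class) for a single character."""
    category = unicodedata.category(ch)
    return category, MAJOR_CLASSES[category[0]]


def filter_by_category(text, prefix):
    """Keep only the characters whose category starts with `prefix`.

    A one-letter prefix such as "L" selects a whole major class,
    a two-letter prefix such as "Lu" selects a single category.
    """
    return "".join(
        ch for ch in text if unicodedata.category(ch).startswith(prefix)
    )


if __name__ == "__main__":
    for ch in ["A", "a", "5", "$", "-", " "]:
        print(repr(ch), describe(ch))
    print(filter_by_category("Hello, World 42!", "Lu"))
```

Matching on a one-letter prefix mirrors the way a major class such as L stands for all of its subcategories (Lu, Ll, Lt, Lm, Lo) in the table.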