FLCL - User Manual
00000000 to 001FFFFF hex
If a code point is not defined in the substitution table (USRTAB and/or SYSTAB), the appropriate substitution character (SUBCHR) is used. If SUBCHR is not set, no substitution is performed. Depending on the MODE, character conversion stops with
an error (STOP) or ignores the character (IGNORE).
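The fallback order described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `convert`, its parameters, and the `in_target` predicate are hypothetical names, not part of FLCL, and only the two MODE values mentioned in the text are modelled.

```python
def convert(text, in_target, table, subchr=None, mode="STOP"):
    """Convert text to a list of target code points (hypothetical helper).

    in_target: predicate telling whether a code point exists in the target set
    table:     dict mapping a code point to its list of replacement code points
    subchr:    substitution code point, or None if SUBCHR is not set
    mode:      "STOP" (raise an error) or "IGNORE" (drop the character)
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if in_target(cp):
            out.append(cp)                 # representable as is
        elif cp in table:
            out.extend(table[cp])          # defined in the substitution table
        elif subchr is not None:
            out.append(subchr)             # not defined: use SUBCHR
        elif mode == "IGNORE":
            continue                       # no SUBCHR: ignore the character
        else:
            raise ValueError(f"no substitution for U+{cp:04X}")  # STOP
    return out
```

For example, with an empty table and an ASCII target, `convert("a\u20ac", lambda c: c < 0x80, {}, subchr=0x3F)` yields the code points for "a?".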
Text before and after the parentheses is treated as a comment.
Between "(" and "=" or "/" or ")" you can define hex digits until the first non-hex digit. All non-hex digits up to the next separator
are interpreted as comment. Leading whitespace is ignored.
REPLACE GERMAN SZ (00DF=0073/0073) WITH ss
THIS IS AN EXAMPLE FOR COMMENTS IN CODE POINT LIST:
REPLACE EURO MARK (20AC= 45#E / 55#U / 52#R / 4F#O) WITH EURO
REPLACE BOM MARK (EFFF=) WITH NOTHING
Leading zeros are allowed. If you don’t supply a hex value, then 0x00 is used.
REPLACE GERMAN SZ (00DF=/) WITH 0x00 0x00
REPLACE EURO MARK (20AC=00000045/00055/000052/000004F) WITH EURO
ATTENTION: Do not use parentheses "()" or the operators "=" and "/" in your comments.
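The parsing rules above can be illustrated with a small Python sketch. It is not the FLCL parser; `parse_hex_field` and `parse_entry` are hypothetical names, and treating an empty right-hand side such as (EFFF=) as an empty replacement is an interpretation of the examples above.

```python
import re

HEX_PREFIX = re.compile(r"[0-9A-Fa-f]*")

def parse_hex_field(field):
    """Read the leading hex digits of one field.

    Leading whitespace is ignored; everything from the first non-hex
    character up to the next separator is treated as a comment.
    """
    digits = HEX_PREFIX.match(field.lstrip()).group(0)
    return int(digits, 16) if digits else 0   # missing value -> 0x00

def parse_entry(line):
    """Return (source_code_point, [replacement_code_points]) for one line."""
    start = line.index("(")
    end = line.index(")", start)
    body = line[start + 1:end]                # text outside "()" is a comment
    source, _, target = body.partition("=")
    if target.strip() == "":                  # "(EFFF=)": replace with nothing
        return parse_hex_field(source), []
    return parse_hex_field(source), [parse_hex_field(f) for f in target.split("/")]
```

Applied to the lines above, the entry with "#E"-style comments and the entry with leading zeros both parse to 20AC with the replacement 45, 55, 52, 4F, and (00DF=/) parses to two 0x00 values.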
To describe your own subsets, you can use a user table without a system table. The USRTAB can also be used to override or add
transliterations when a system substitution table (for example SYSTAB=ICONV) is used. Transliteration works recursively:
if one of the substitution code points is not in the target set, its own substitution is used instead, and so on.
REPLACE GERMAN OE (0000D6=004F/0045) WITH O E
REPLACE GERMAN SZ (0000DF=0073/0073) WITH s s
REPLACE EURO MARK (0020AC=00D6/00DF/52/4F) WITH OE SZ R O
If you have a EURO sign in your text and convert it to Latin1, the resulting byte string will be D6DF524F. On the other hand, if
you convert it to ASCII, the byte string will be 4F457373524F.
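The recursive behaviour in this example can be reproduced with a short Python sketch. The names `TABLE` and `transliterate` are hypothetical, the target sets are approximated by simple range checks (Latin1 as code points up to FF, ASCII as code points below 80), and no loop protection is included here.

```python
# The three definitions from the example above.
TABLE = {
    0x00D6: [0x004F, 0x0045],                  # GERMAN OE -> O E
    0x00DF: [0x0073, 0x0073],                  # GERMAN SZ -> s s
    0x20AC: [0x00D6, 0x00DF, 0x0052, 0x004F],  # EURO -> OE SZ R O
}

def transliterate(cp, in_target, table):
    """Expand cp until every resulting code point is in the target set."""
    if in_target(cp):
        return [cp]
    out = []
    for replacement in table[cp]:   # KeyError if cp has no definition
        out.extend(transliterate(replacement, in_target, table))
    return out

latin1 = bytes(transliterate(0x20AC, lambda c: c <= 0xFF, TABLE))
ascii_ = bytes(transliterate(0x20AC, lambda c: c < 0x80, TABLE))
print(latin1.hex().upper())  # D6DF524F
print(ascii_.hex().upper())  # 4F457373524F
```

For Latin1, D6 and DF are already in the target set, so the expansion stops after one step. For ASCII, both are expanded once more.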
The mapping described above is always applied, and it is applied recursively, including to the result of a transliteration. For example, if
you define a mapping (*30=39), the target data will not contain any zero character (0x30) in the resulting text.
Replacing code points recursively can easily cause infinite loops. To prevent this, the number of replacements
and the length of a single replacement are limited to a maximum of 64. Note that this can still result in a large expansion of the data.
Therefore, be careful when defining recursive substitutions.
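A guard of this kind could look like the following Python sketch. It assumes that the limit of 64 counts substitution steps per input code point, which is one possible reading of the text. The names `expand` and `SubstitutionLimitError` are hypothetical.

```python
MAX_REPLACEMENTS = 64         # assumed: substitution steps per input code point
MAX_REPLACEMENT_LENGTH = 64   # maximum code points in one replacement

class SubstitutionLimitError(Exception):
    """Raised when a substitution exceeds one of the limits."""

def expand(cp, in_target, table):
    """Expand cp iteratively, stopping runaway (cyclic) definitions."""
    pending = [cp]            # code points still to be checked, left to right
    out = []
    steps = 0
    while pending:
        current = pending.pop(0)
        if in_target(current):
            out.append(current)
            continue
        replacement = table[current]   # KeyError if there is no definition
        if len(replacement) > MAX_REPLACEMENT_LENGTH:
            raise SubstitutionLimitError("replacement too long")
        steps += 1
        if steps > MAX_REPLACEMENTS:
            raise SubstitutionLimitError("too many replacements")
        pending = list(replacement) + pending
    return out
```

A cyclic pair such as 0100 replaced by 0101 and 0101 replaced by 0100 is stopped after 64 steps instead of looping forever. Even within the limits, a single code point can still grow into a long sequence, which is the expansion the text warns about.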
A sample user table for the ICONV system table to change the transliteration of German umlauts to AE, OE or UE is located in
the SAMPLE directory under CCUTDEXL. Another sample user table called CCUTNPAS defines the string.latin subset (XOEF)
which is mainly used for statutory reporting.
The order of definition is the order in which the definitions are processed. For example, if you define a mapping or a transliteration for
code point X and later make this code point invalid, then the code point is invalid. On the other hand, you can first deactivate
all code points, then activate your subset and define your transliterations (best fit) or mappings.
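The effect of definition order can be modelled with a small Python sketch in which a later definition for a code point overrides an earlier one. The tuple representation and the name `apply_definitions` are hypothetical and do not reflect the actual user table syntax.

```python
def apply_definitions(definitions):
    """Process (kind, code_points, replacement) tuples in order.

    kind is "valid", "invalid" or "replace". code_points=None means the
    definition applies to all code points. Returns a lookup function.
    """
    default = ("valid", None)
    table = {}
    for kind, code_points, replacement in definitions:
        if code_points is None:
            table.clear()                      # overrides everything so far
            default = (kind, replacement)
        else:
            for cp in code_points:
                table[cp] = (kind, replacement)
    return lambda cp: table.get(cp, default)

# A transliteration followed by an invalidation: the code point ends up invalid.
lookup_a = apply_definitions([
    ("replace", [0xDF], [0x73, 0x73]),
    ("invalid", [0xDF], None),
])

# Deactivate everything, activate a subset, then define a transliteration.
lookup_b = apply_definitions([
    ("invalid", None, None),
    ("valid", range(0x20, 0x7F), None),
    ("replace", [0xDF], [0x73, 0x73]),
])
```

Here `lookup_a(0xDF)` reports the code point as invalid, while `lookup_b` accepts the printable ASCII range, transliterates DF, and rejects everything else.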
For example, to define a best-fit mapping without expansion, a user table can be used to delete all combined characters for single-byte
code pages. For the ICONV system transliteration table, a sample user table is available under the name CCUTBFM1.
Some other subset user table definitions were added. For example:
• currently valid SEPA characters (<128) for money transfers (CCUTSEPA)
• characters valid in ISO-8859-15, CP1252 and IBM1142 (CCUTDELA)
• characters valid in ISO-8859-15, CP1252, IBM1142 and XOEF (CCUTDLAX)