Tilde's wrapper system for CollTerm

Contract no. 248347
Russian (and maybe some other languages), the length limit is around 5000 bytes. This is
because in Greek and Russian a character takes 2 bytes, while in Chinese a character takes 3
bytes; these per-character sizes are those of the UTF-8 encoding.
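The byte figures above can be checked directly. The following minimal Java sketch (the class and method names are illustrative, not part of the toolkit) encodes one sample character from each script as UTF-8, prints its size, and shows roughly how many such characters fit into a 5000-byte budget.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Sizes {
    // Number of bytes a string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // One sample character per script: Latin, Greek (Omega), Cyrillic (Ya), Chinese (zhong).
        String[] samples = {"a", "\u03A9", "\u042F", "\u4E2D"};
        for (String s : samples) {
            System.out.println(s + " -> " + utf8Bytes(s) + " byte(s)");
        }
        // A 5000-byte budget therefore holds about 2500 Russian or about 1666 Chinese characters.
        System.out.println(5000 / utf8Bytes("\u042F") + " Russian chars, "
                + 5000 / utf8Bytes("\u4E2D") + " Chinese chars");
    }
}
```

The Latin letter takes 1 byte, the Greek and Cyrillic letters 2 bytes and the Chinese character 3 bytes, which is why a byte-based limit admits far fewer characters for these languages than for English.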
Therefore, in our toolkit we set the length limit of a translation request to 5000 characters for
the Google translation API and to 5000 bytes for the Microsoft translation API.
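The two limits are measured in different units, so a request can fit one engine and not the other. This is a minimal sketch of such a check; the constant and method names are hypothetical, and counting Google "characters" as Unicode code points is an assumption, since the toolkit's exact counting rule is not stated.

```java
import java.nio.charset.StandardCharsets;

public class RequestLimits {
    // Limits as set in the toolkit: Google is counted in characters, Microsoft in bytes.
    static final int GOOGLE_MAX_CHARS = 5000;
    static final int MICROSOFT_MAX_BYTES = 5000;

    // Characters counted as Unicode code points (assumption; String.length() counts UTF-16 units).
    static boolean fitsGoogle(String text) {
        return text.codePointCount(0, text.length()) <= GOOGLE_MAX_CHARS;
    }

    // Bytes counted in UTF-8, matching the per-character sizes given above.
    static boolean fitsMicrosoft(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length <= MICROSOFT_MAX_BYTES;
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D".repeat(2000); // 2000 Chinese characters = 6000 UTF-8 bytes
        System.out.println("Google: " + fitsGoogle(chinese));       // true
        System.out.println("Microsoft: " + fitsMicrosoft(chinese)); // false
    }
}
```

A 2000-character Chinese text is well within the 5000-character Google limit but, at 6000 bytes, exceeds the 5000-byte Microsoft limit.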
Also, since we prepare the text line by line, we could send each line as a separate text string
for translation. However, for a large collection of documents this requires too many calls and
can quickly reach the translation access limit (both the Google and Microsoft APIs report
errors or throw exceptions if too many translation requests arrive from the same IP address
within a relatively short time). Therefore, to reduce the number of translation requests, it is
better to send longer strings for translation (i.e. close to the limit of around 5000 characters).
This works because, as long as each input string stays within the length limit, the translation
access limits of both the Google and Microsoft APIs are based on the number of translation
requests, not on the overall length of the input.
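Packing lines into longer strings can be done greedily: keep appending lines to the current chunk until the next line would push it over the limit. The sketch below is a minimal illustration of that idea, not the toolkit's code; the class name, the `pack` method and the use of `\n` as the separator are assumptions. The size function is passed in, so the same code serves the character-based Google limit and the byte-based Microsoft limit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class LineBatcher {
    // Greedily packs consecutive lines into newline-joined chunks whose measured size
    // stays within `limit`. A single line that is already over the limit is emitted
    // on its own and would still need splitting before being sent.
    static List<String> pack(List<String> lines, int limit, ToIntFunction<String> size) {
        List<String> chunks = new ArrayList<>();
        String current = null; // null means "no line in the current chunk yet"
        for (String line : lines) {
            if (current == null) {
                current = line;
                continue;
            }
            String candidate = current + "\n" + line;
            if (size.applyAsInt(candidate) > limit) {
                chunks.add(current); // close the chunk before it overflows
                current = line;      // start a new chunk with this line
            } else {
                current = candidate;
            }
        }
        if (current != null) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("first line", "second line", "third line");
        // Character budget of 25: "first line\nsecond line" is 22 characters, adding the third is 33.
        ToIntFunction<String> chars = s -> s.codePointCount(0, s.length());
        System.out.println(pack(lines, 25, chars).size()); // 2
        // Byte budget (Microsoft-style) measures UTF-8 length instead.
        ToIntFunction<String> bytes = s -> s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(pack(lines, 5000, bytes).size()); // 1
    }
}
```

With a realistic budget of 5000, thousands of short lines collapse into a handful of requests, which is exactly what keeps the request count under the access limit.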
The toolkit supports two different manners of translation: for each translation call, you can
send either a single text string or a string array. Technically, the calls are:
Manner 1:
    String result = Translate.execute(String text, SourceLanguage, TargetLanguage)
Manner 2:
    String[] result = Translate.execute(String[] text, SourceLanguage, TargetLanguage)
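To make the difference between the two manners concrete, here is a minimal, runnable Java sketch. `Engine`, `StubEngine`, `manner1` and `manner2` are hypothetical names; the stub only upper-cases text so the example runs without network access, whereas a real wrapper would delegate to `Translate.execute`. Splitting the Manner 1 result on newlines assumes the engine preserves line breaks, which is an assumption rather than something stated here.

```java
import java.util.Arrays;
import java.util.Locale;

public class MannerSketch {
    // Hypothetical engine interface mirroring the two call shapes above.
    interface Engine {
        String translate(String text);      // Manner 1: one string per call
        String[] translate(String[] texts); // Manner 2: one string array per call
    }

    // Stub engine for illustration only: "translates" by upper-casing and counts calls.
    static class StubEngine implements Engine {
        int calls = 0;

        public String translate(String text) {
            calls++;
            return text.toUpperCase(Locale.ROOT);
        }

        public String[] translate(String[] texts) {
            calls++;
            return Arrays.stream(texts).map(s -> s.toUpperCase(Locale.ROOT)).toArray(String[]::new);
        }
    }

    // Manner 1: merge lines into one string, translate, then split the result back into lines.
    static String[] manner1(Engine engine, String[] lines) {
        return engine.translate(String.join("\n", lines)).split("\n", -1);
    }

    // Manner 2: send the lines as an array; the result comes back as an array of the same length.
    static String[] manner2(Engine engine, String[] lines) {
        return engine.translate(lines);
    }

    public static void main(String[] args) {
        String[] lines = {"one", "two", "three"};
        StubEngine engine = new StubEngine();
        System.out.println(Arrays.toString(manner1(engine, lines))); // [ONE, TWO, THREE]
        System.out.println(Arrays.toString(manner2(engine, lines))); // [ONE, TWO, THREE]
        System.out.println("calls: " + engine.calls);                 // calls: 2
    }
}
```

Either manner turns the three lines into a single request, whereas sending each line separately would cost three. Manner 2 presumably also keeps the output aligned with the input lines without depending on line breaks surviving translation.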
Command line usage:
    java -jar Translation.jar SourcePath TargetPath option=1|2 SourceLanguage TargetLanguage
Translation.jar: GoogleTranslate.jar or BingTranslate.jar
Options:
option=1: merge several lines into a long string for translation (Manner 1);
option=2: store lines as a string array for translation (Manner 2);
SourceLanguage: one of the languages supported by the specific engine (see below);
TargetLanguage: one of the languages supported by the specific engine (see below);
SourcePath: the path to the text collection to be translated (absolute path to the input
data directory);
TargetPath: the destination directory in which to store the translations (a directory path; the
directory must already exist, otherwise the API throws an exception).
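The argument list above can be read as in the following minimal Java sketch. `TranslationArgs` is a hypothetical class, not the toolkit's own code; it enforces only what the description states (five positional arguments in the documented order, `option=1` or `option=2`, an absolute SourcePath, an existing TargetPath directory) and passes the language arguments through unchecked, since the supported languages are engine-specific.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranslationArgs {
    static final String USAGE =
        "java -jar Translation.jar SourcePath TargetPath option=1|2 SourceLanguage TargetLanguage";

    final Path sourcePath;
    final Path targetPath;
    final int option;            // 1 = Manner 1 (merged string), 2 = Manner 2 (string array)
    final String sourceLanguage; // passed through; validity depends on the engine
    final String targetLanguage;

    // Parses the five positional arguments in the documented order.
    TranslationArgs(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException("usage: " + USAGE);
        }
        if (!args[2].equals("option=1") && !args[2].equals("option=2")) {
            throw new IllegalArgumentException("expected option=1 or option=2, got " + args[2]);
        }
        sourcePath = Paths.get(args[0]);
        targetPath = Paths.get(args[1]);
        option = args[2].charAt(args[2].length() - 1) - '0';
        sourceLanguage = args[3];
        targetLanguage = args[4];
    }

    // Checks the documented path requirements before any translation call is made.
    void checkPaths() {
        if (!sourcePath.isAbsolute() || !Files.isDirectory(sourcePath)) {
            throw new IllegalArgumentException("SourcePath must be an absolute directory path: " + sourcePath);
        }
        if (!Files.isDirectory(targetPath)) {
            throw new IllegalArgumentException("TargetPath must be an existing directory: " + targetPath);
        }
    }

    public static void main(String[] args) {
        TranslationArgs parsed = new TranslationArgs(args);
        parsed.checkPaths();
        System.out.println("option=" + parsed.option + ", "
            + parsed.sourceLanguage + " -> " + parsed.targetLanguage);
    }
}
```

Checking TargetPath up front reports a missing output directory immediately, instead of surfacing it as an exception after translation requests have already been spent.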
D2.6 V3.0
Page 151 of 164