Tilde's wrapper system for CollTerm
Contract no. 248347

Russian (and maybe some other languages), the length limit is around 5000 bytes. This is because, under the UTF-8 encoding, a Greek or Russian character takes 2 bytes while a Chinese character takes 3 bytes. Therefore, in our toolkit we set the length limit of a translation request to 5000 characters for the Google translation API and 5000 bytes for the Microsoft translation API.

Also, since we prepare the text line by line, we could send each line as a separate text string for translation. However, for a large collection of documents this would require too many calls and can quickly reach the translation access limit (both the Google and Microsoft APIs report errors or exceptions if too many translation requests arrive from the same IP address within a relatively short time). Therefore, to reduce the number of translation requests, it is better to send longer strings for translation (i.e. around 5000 characters). This works because, as long as the input string does not exceed the length limit, the translation access limit of both the Google and Microsoft APIs is based on the number of translation requests, not on the overall length of the input.

The toolkit supports two different manners of translation: for each translation call, you can send either a single text string or a string array.
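The batching idea described above (merging prepared lines into requests that stay under a limit measured either in characters or in UTF-8 bytes) can be sketched as follows. This is only a minimal illustration and not the toolkit's actual code: the class name RequestBatcher, its methods, and the choice of a newline as the separator between merged lines are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the toolkit: groups lines into requests
// that stay within a length limit.
class RequestBatcher {
    // How request length is measured: characters (as described for the
    // Google API) or UTF-8 bytes (as described for the Microsoft API).
    enum Unit { CHARACTERS, UTF8_BYTES }

    // Length of a string in the chosen unit. Characters are counted as
    // Unicode code points; whether an engine counts them the same way is
    // an assumption.
    static int length(String s, Unit unit) {
        return unit == Unit.CHARACTERS
                ? s.codePointCount(0, s.length())
                : s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Merges consecutive lines (joined by '\n') into as few requests as
    // possible without exceeding the limit. A single line that is longer
    // than the limit is emitted as its own oversized request; splitting
    // such a line is left to the caller.
    static List<String> batch(List<String> lines, int limit, Unit unit) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentLength = 0;
        int linesInBatch = 0;
        for (String line : lines) {
            int lineLength = length(line, unit);
            // The separator costs one character, which is also one UTF-8 byte.
            if (linesInBatch > 0 && currentLength + 1 + lineLength > limit) {
                batches.add(current.toString());
                current.setLength(0);
                currentLength = 0;
                linesInBatch = 0;
            }
            if (linesInBatch > 0) {
                current.append('\n');
                currentLength += 1;
            }
            current.append(line);
            currentLength += lineLength;
            linesInBatch++;
        }
        if (linesInBatch > 0) {
            batches.add(current.toString());
        }
        return batches;
    }
}
```

With such a helper, a call like RequestBatcher.batch(lines, 5000, RequestBatcher.Unit.UTF8_BYTES) would fit roughly half as many Russian characters into one request as the same call with Unit.CHARACTERS, which mirrors the 2-bytes-per-character observation above.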
Technically, the calls are:

Manner 1: String result = Translate.execute(String text, SourceLanguage, TargetLanguage)
Manner 2: String[] result = Translate.execute(String[] text, SourceLanguage, TargetLanguage)

Command line usage:

java -jar Translation.jar SourcePath TargetPath option=1|2 SourceLanguage TargetLanguage

Translation.jar: GoogleTranslate.jar or BingTranslate.jar

Options:
option=1: merge several lines into a long string for translation (Manner 1);
option=2: store lines as a string array for translation (Manner 2);
SourceLanguage: one of the languages supported by the specific engine (see below);
TargetLanguage: one of the languages supported by the specific engine (see below);
SourcePath: the path to the text collection to be translated (absolute path to the input data directory);
TargetPath: the destination directory where translations are stored (a directory path; the directory has to exist, otherwise the API will return an exception).

D2.6 V3.0 Page 151 of 164
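To make the difference between the two manners concrete, the sketch below dispatches on the option=1|2 argument. It is an illustration under stated assumptions, not the toolkit's implementation: the Engine interface is a hypothetical stand-in that only mirrors the two Translate.execute signatures listed above, StubEngine is a placeholder that upper-cases text instead of calling a translation service, and Manner 1 assumes the engine preserves the line breaks used to merge the lines.

```java
import java.util.Locale;

// Hypothetical sketch of the two translation manners; no real engine is called.
class TranslationManners {
    // Stand-in mirroring the two Translate.execute overloads named in the
    // text. Languages are plain strings here purely for illustration.
    interface Engine {
        String execute(String text, String source, String target);
        String[] execute(String[] texts, String source, String target);
    }

    // Placeholder engine: upper-cases the input so the example is runnable
    // without network access.
    static class StubEngine implements Engine {
        public String execute(String text, String source, String target) {
            return text.toUpperCase(Locale.ROOT);
        }
        public String[] execute(String[] texts, String source, String target) {
            String[] out = new String[texts.length];
            for (int i = 0; i < texts.length; i++) {
                out[i] = execute(texts[i], source, target);
            }
            return out;
        }
    }

    // Parses the command-line argument "option=1" or "option=2".
    static int parseOption(String arg) {
        if ("option=1".equals(arg)) return 1;
        if ("option=2".equals(arg)) return 2;
        throw new IllegalArgumentException("expected option=1 or option=2, got: " + arg);
    }

    // Manner 1 merges the lines into one string, sends a single request and
    // splits the result back into lines (assumes '\n' survives translation).
    // Manner 2 passes the lines as a string array.
    static String[] translate(Engine engine, String[] lines, int option,
                              String source, String target) {
        if (option == 1) {
            String merged = String.join("\n", lines);
            return engine.execute(merged, source, target).split("\n", -1);
        }
        return engine.execute(lines, source, target);
    }
}
```

In this sketch both manners return one output line per input line, so the caller can write the results to TargetPath in the same way regardless of the option chosen.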