
NiuTrans Open Source Statistical Machine Translation System

Command
$ cd NiuTrans/bin/
$ mkdir -p ../work/lex/
$ ./NiuTrans.PhraseExtractor --LEX \
-src ../sample-data/sample-submission-version/TM-training-set/chinese.txt \
-tgt ../sample-data/sample-submission-version/TM-training-set/english.txt \
-aln ../sample-data/sample-submission-version/TM-training-set/Alignment.txt \
-out ../work/lex/lex
where
--LEX, which indicates that the program (NiuTrans.PhraseExtractor) extracts lexical translations.
-src, which specifies the source sentences of the bilingual training corpus.
-tgt, which specifies the target sentences of the bilingual training corpus.
-aln, which specifies the word alignments between the source and target sentences.
-out, which specifies the prefix of the output files (i.e., the lexical translation files).
There are also some optional parameters:
-temp, which specifies the directory that holds the temporary sorting files generated during processing.
-stem, which specifies whether stemming is used; e.g., if -stem is specified, all words are stemmed.
Output: two files, "lex.s2d.sorted" and "lex.d2s.sorted", are generated in "/NiuTrans/work/lex/".
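A quick way to confirm that the extraction step produced both files is sketched below. The helper name check_lex_outputs is hypothetical (it is not part of NiuTrans); it only assumes that the two output files are named by appending ".s2d.sorted" and ".d2s.sorted" to the -out prefix, as stated above.

```shell
# Hypothetical helper (not part of NiuTrans): succeed only if both
# lexical translation files exist and are non-empty for a given -out prefix.
check_lex_outputs() {
  prefix=$1
  for suffix in s2d.sorted d2s.sorted; do
    # -s is true when the file exists and has a size greater than zero
    if [ ! -s "$prefix.$suffix" ]; then
      echo "missing or empty: $prefix.$suffix" >&2
      return 1
    fi
  done
  echo "ok: $prefix.s2d.sorted $prefix.d2s.sorted"
}
```

When run from NiuTrans/bin/, calling check_lex_outputs ../work/lex/lex would match the -out prefix used in the command above.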
Output (/NiuTrans/work/lex/)
- lex.s2d.sorted    "source → target" lexical translation file
- lex.d2s.sorted    "target → source" lexical translation file

3.2.3 Generating Phrase Translation Table
The next step is to generate the phrase translation table, which is then used in the decoding step.
Basically, the phrase table is a collection of phrase-pairs with associated scores (or features). In NiuTrans,
all the phrase-pairs are sorted in alphabetical order, which lets the system efficiently load and organize
the phrase table in an internal data structure. Each entry of the phrase table is made up of several fields. To
illustrate their meaning, Figure 3.7 shows a sample table.
In this example, each line is separated into five fields using " ||| ". Their meanings are:
• The first field is the source side of the phrase-pair.
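The " ||| " field layout can be inspected from the command line. The following is a minimal sketch using awk. The function names count_fields and source_side are hypothetical, and the sample entry is a made-up placeholder, not real NiuTrans output. Only the separator and the role of the first field are taken from the text above.

```shell
# Hypothetical helpers (not part of NiuTrans) for inspecting one phrase-table line.

# Print the number of " ||| "-separated fields in the line.
count_fields() {
  printf '%s\n' "$1" | awk -F ' [|][|][|] ' '{ print NF }'
}

# Print the first field, i.e., the source side of the phrase-pair.
source_side() {
  printf '%s\n' "$1" | awk -F ' [|][|][|] ' '{ print $1 }'
}

# Placeholder entry: the field contents are invented for illustration only.
line='src phrase ||| field2 ||| field3 ||| field4 ||| field5'
count_fields "$line"
source_side "$line"
```

Each "|" is written as the bracket expression [|] because awk treats a multi-character field separator as a regular expression, in which a bare "|" would mean alternation.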
