Contract no. 248347
3.1.4 Installation
The TildeNER system does not require installation. Simply copy the whole “TildeNER”
directory to the location from which you would like to run the named entity recognizer, and
execute the Perl workflow scripts as needed with a Perl interpreter (for example, Strawberry
Perl on Windows). The scripts can be run from the command line (Command Prompt or
PowerShell on Windows) or from any programming language that supports shell execution.
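Running a workflow script from another program can be sketched as follows. This is a minimal Python illustration, not part of TildeNER: the script name and property file name are hypothetical placeholders, and passing the property file as the only argument is an assumption that should be checked against the comments in the actual workflow script.

```python
import subprocess


def build_command(script_name, property_file, perl="perl"):
    """Return the argument list for running a Perl workflow script.

    script_name and property_file are placeholders chosen by the caller;
    the single-argument calling convention is an assumption.
    """
    return [perl, script_name, property_file]


def run_workflow(tildener_root, script_name, property_file):
    """Run the script with the TildeNER root as the working directory.

    Both script_name and property_file are interpreted relative to
    tildener_root, because the child process starts in that directory.
    """
    cmd = build_command(script_name, property_file)
    # check=True raises CalledProcessError if Perl exits with a non-zero code.
    return subprocess.run(cmd, cwd=tildener_root, check=True)
```

Starting the process in the TildeNER root directory keeps relative paths inside the property file valid without editing them.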
The user must create a property file in order to execute the training or tagging scripts.
Sample property files are located in the “Sample_Data” subdirectory of the “TildeNER”
directory. The user must also alter the “gazette” parameter in the property files so that
the system can find the gazetteer data files on the user’s system (the sample paths
are relative and work only if the working directory is the TildeNER root
directory). The gazetteer files used for training the Latvian and Lithuanian NER models
are located in the “LV_Gazetteer” and “LT_Gazetteer” subdirectories of the “Sample_Data”
directory. Use the “/” directory separator in gazetteer file paths on Windows as well.
The Stanford NER classifier does not process the Windows directory separator
“\” correctly; therefore, the Linux variant should be used instead.
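The separator rule above can be applied automatically before a property file is written. The following Python sketch is an illustration only: it assumes the property file uses one `key = value` entry per line, and the gazetteer file name in the usage example is hypothetical.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Convert a single Windows-style path to the "/"-separated form."""
    return PureWindowsPath(path).as_posix()


def fix_gazette_line(line):
    """Replace "\\" with "/" in a 'gazette = ...' line.

    Lines for other parameters are returned unchanged. The 'key = value'
    layout is an assumption about the property file format.
    """
    key, sep, value = line.partition("=")
    if sep and key.strip() == "gazette":
        return key + sep + value.replace("\\", "/")
    return line
```

For example, `to_forward_slashes(r"Sample_Data\LV_Gazetteer\names.txt")` yields `Sample_Data/LV_Gazetteer/names.txt`.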
The Latvian and Lithuanian NER models and gazetteer data are located in the
“Sample_Data” directory. More details on the provided sample data can be found in section
3.1.5.6.
Dependency installation on a Linux OS:
For installation of Perl refer to http://www.perl.org/get.html.
For installation of Java refer to http://openjdk.java.net/install/.
Dependency installation on Windows OS:
For installation of Perl refer to http://strawberryperl.com/.
For installation of Java refer to http://www.java.com/en/download/help/windows_manual_download.xml.
For installation of .NET Framework 4.0 Redistributable refer to http://www.microsoft.com/download/en/details.aspx?id=17718.
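Before running any workflow, it can be useful to confirm that the Perl and Java executables are reachable. The Python sketch below is an optional convenience and not part of TildeNER; it only checks that the executables are on the PATH and does not verify versions or the .NET Framework.

```python
import shutil


def find_dependencies(names=("perl", "java")):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_dependencies().items():
        print(f"{name}: {path or 'not found on PATH'}")
```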
3.1.5 Execution instructions
The TildeNER system consists of multiple workflows (external execution scripts), which
implement the general use case scenarios of the TildeNER system. Each workflow makes
use of internal execution scripts (see 3.1.5.3), which provide partial workflow
functionality, and modules (see 3.1.5.4), which contain utility and functionality methods used
by the scripts. All Perl modules and scripts are well documented in the code; therefore, if any
additional questions arise, the user should refer to the comments within the code.
The system has two general use cases: bootstrapping a new NER model (see Figure 6)
and tagging a plaintext document (see Figure 7).
D2.6 V3.0