Contract no. 248347

3.1.4 Installation

The TildeNER system does not require installation. Copy the whole “TildeNER” directory to the directory from which you would like to run the named entity recognizer, and execute the Perl workflow scripts as needed with a Perl interpreter (for example, Strawberry Perl on Windows) from the command line (Command Prompt or PowerShell on Windows) or from any programming language that supports shell execution.

To execute the training or tagging scripts, the user has to create a property file. Sample property files are located in the “Sample_Data” subdirectory of the “TildeNER” directory. The user also has to alter the “gazette” parameter in the property files so that the system can find the gazetteer data files on the user’s system (the sample paths are relative and work only if the system’s working directory is the TildeNER root directory). The gazetteer files used for training the Latvian and Lithuanian NER models are located in the “LV_Gazetteer” and “LT_Gazetteer” subdirectories of the “Sample_Data” directory.

Use the “/” directory separator in gazetteer file paths on Windows as well. The Stanford NER classifier does not process the Windows directory separator “\” correctly; therefore, the Linux variant should be used instead.

The Latvian and Lithuanian NER models and gazetteer data are located within the “Sample_Data” directory. More details on the provided sample data can be found in section 3.1.5.6.

Dependency installation on a Linux OS:
For installation of Perl, refer to http://www.perl.org/get.html.
For installation of Java, refer to http://openjdk.java.net/install/.

Dependency installation on Windows OS:
For installation of Perl, refer to http://strawberryperl.com/.
For installation of Java, refer to http://www.java.com/en/download/help/windows_manual_download.xml.
For installation of .NET Framework 4.0 Redistributable, refer to http://www.microsoft.com/download/en/details.aspx?id=17718.

3.1.5 Execution instructions

The TildeNER system consists of multiple workflows (external execution scripts), which cover the general use case scenarios of the TildeNER system. Each workflow makes use of internal execution scripts (see 3.1.5.3), which offer partial workflow functionality, and modules (see 3.1.5.4), which contain utility and functionality methods used by the scripts. All Perl modules and scripts have well-documented code; therefore, if any additional questions arise, the user should refer to the comments within the code.

The system has two general use cases: the bootstrapping of a new NER model (see Figure 6) and the tagging of a plaintext document (see Figure 7).

D2.6 V3.0
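
The advice in section 3.1.4 to use “/” rather than “\” in gazetteer file paths can be applied to an existing property file with a small helper. The following Python sketch is only an illustration and not part of TildeNER. It assumes that the property file stores the parameter as a Java-style "gazette = value" (or "gazette: value") line, and the file names in the sample are made up. It rewrites backslashes only in the value of the “gazette” parameter and leaves every other line untouched.

```python
import re

# Matches a "gazette" entry written as "gazette = value" or "gazette: value".
# This key/value layout is an assumption based on Java-style property files.
_GAZETTE_LINE = re.compile(r"^(\s*gazette\s*[=:]\s*)(.*)$")


def normalize_gazette_paths(properties_text: str) -> str:
    """Return the property text with backslashes replaced by "/" in gazette values only."""
    fixed_lines = []
    for line in properties_text.splitlines(keepends=True):
        # Separate the line ending so that it is preserved exactly.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        match = _GAZETTE_LINE.match(body)
        if match:
            prefix, value = match.groups()
            body = prefix + value.replace("\\", "/")
        fixed_lines.append(body + ending)
    return "".join(fixed_lines)


if __name__ == "__main__":
    # Hypothetical property file content; the file names are made up.
    sample = (
        "gazette = Sample_Data\\LV_Gazetteer\\names.txt\n"
        "other = kept\\as\\is\n"
    )
    print(normalize_gazette_paths(sample), end="")
```

The helper works on text rather than on a file path, so it can be tried on a copy of a sample property file before the result is written back. Relative paths stay relative, so the working directory still has to be the TildeNER root directory unless the paths are made absolute by hand.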