TreeTagger

The TreeTagger is a language-independent probabilistic part-of-speech tagger. Models exist for the following languages:

German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Portuguese, Galician, Chinese, Swahili, Slovak, Latin, Estonian and Old French

TreeTagger for Windows

Download the Windows version of the TreeTagger from:

http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-windows-3.2.zip

Unpack the zip-archive to the directory:

C:\TreeTagger

Next, download the parameter files for the languages you want to tag from the list of model files on the TreeTagger homepage. Make sure you select the appropriate files for your operating system; for Windows these are listed under:

Parameter files for PC (Linux, Windows, and Mac-Intel)

Unpack these into the TreeTagger sub-directory:

C:\TreeTagger\lib

You are now set to tag.

Navigate to the directory C:\TreeTagger\bin

To tag an English text, type tag-english.bat followed by the name of the file you want to tag. The tagged result is printed to your screen.

TreeTagger for MacOS X and UNIX-like OSes

Download the tagger package, the tagging scripts and the installation script from the TreeTagger page. Put them into a directory /TreeTagger.

Do not unpack any of the files. Navigate to this directory and execute the installation script by typing

bash install-tagger.sh

To save you from having to always type the entire TreeTagger path, add the TreeTagger sub-directories

TreeTaggerDirectory/bin

and

TreeTaggerDirectory/cmd

to the PATH variable of the operating system. To do this, type:

PATH=$PATH:/PathToTreeTaggerDirectory/bin:/PathToTreeTaggerDirectory/cmd

export PATH

To check that this has actually happened, type:

echo $PATH
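
A PATH set this way lasts only for the current shell session. The following is a minimal sketch of how the two directories could be added without duplicating them when the lines are run more than once, for example from a shell start-up file. The variable TREETAGGER_HOME, its default location and the helper add_to_path are assumptions made for illustration and are not part of the TreeTagger distribution.

```shell
# Hypothetical install location; change it to the directory
# in which you ran install-tagger.sh.
TREETAGGER_HOME="${TREETAGGER_HOME:-$HOME/TreeTagger}"

# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;      # otherwise append it
    esac
}

add_to_path "$TREETAGGER_HOME/bin"
add_to_path "$TREETAGGER_HOME/cmd"
export PATH
```

Running echo $PATH afterwards should show both directories exactly once.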

Tagging with the TreeTagger

The basic call to the TreeTagger looks like this:

tree-tagger -options- <parameter file> <input file> <output file>

To test whether the TreeTagger actually tags your text, type:

tree-tagger -token German.par input.txt output.txt
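
When tree-tagger is called directly like this, it reads its input file as one token per line, as far as the TreeTagger documentation describes it. The sketch below shows one crude way to bring a plain text into that shape by splitting on whitespace only. The function name one_token_per_line is made up for this example; punctuation attached to a word stays attached, so this is not a substitute for a proper tokenizer.

```shell
# Crude whitespace tokenizer (an illustration, not part of TreeTagger):
# reads text on stdin and writes one token per line on stdout.
one_token_per_line() {
    # Turn every run of whitespace into a single newline,
    # then drop any empty line left at the start.
    tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Example: four words and a separate full stop become five lines.
printf 'Das ist ein Test .\n' | one_token_per_line
```

Redirecting the output of the function to input.txt gives a file that can be passed to the tree-tagger call above.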