Differences

This shows you the differences between two versions of the page.

linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger [2019/01/26 22:43]
sabinebartsch [The Stanford POS Tagger]
linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger [2019/05/15 11:18]
sabinebartsch [2 Installation and requirements]
Line 1:
 ====== The Stanford POS Tagger ======
 
-==== author: Sabine Bartsch, TU Darmstadt ====
+==== author: Sabine Bartsch, Technische Universität Darmstadt ====
 
-Tutorial based on input from the Stanford NLP website.
+Tutorial builds on software and input from the [[https://nlp.stanford.edu/software/tagger.html|Stanford PoS Tagger website]].
 
 Related tutorial: [[linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger_python|Stanford PoS Tagger: tagging from Python]]
Line 17:
 ===== 2 Installation and requirements =====
 
-Requirements: The Stanford PoS Tagger requires Java. Since Java is widely used in corpus and computational linguistics and many programmes in this field require it, it is advisable to install the full JDK (Java Development Kit), which also includes the JRE (Java Runtime Environment). Please consult the following page to download software that is a system prerequisite for many corpus and computational linguistic applications: [[https://www.oracle.com/technetwork/java/javase/downloads/index.html|Oracle Java]].
+Requirements: The Stanford PoS Tagger requires Java. Since Java is widely used in corpus and computational linguistics and many programmes in this field require it, it is advisable to install the full JDK (Java Development Kit), which also includes the JRE (Java Runtime Environment). Please consult the following page to download software that is a system prerequisite for many corpus and computational linguistic applications: [[linguisticsweb:tutorials:linguistics_tutorials:basics:environment:java|OpenJDK]].
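To check which Java installation will actually run the tagger, a small class like the one below can be compiled and run. It is a minimal sketch that uses only the standard library: it prints the runtime version and installation directory, and reports whether a Java compiler is available through ''ToolProvider.getSystemJavaCompiler()'', which is normally the case for a full JDK but not for a bare JRE. The class name ''JavaCheck'' is hypothetical.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper class: reports which Java installation is running it.
class JavaCheck {
    // Version of the Java runtime executing this code, e.g. "17.0.2".
    static String version() {
        return System.getProperty("java.version");
    }

    // Installation directory of that runtime.
    static String home() {
        return System.getProperty("java.home");
    }

    // True if the running installation provides a Java compiler, which is
    // normally the case for a full JDK but not for a bare JRE.
    static boolean hasCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + version());
        System.out.println("Java home:    " + home());
        System.out.println("Compiler (JDK) available: " + hasCompiler());
    }
}
```

On the command line, ''java -version'' and ''javac -version'' report much the same information; if ''javac'' is not found, either only a runtime is installed or the JDK is not on the PATH.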
  
 The Stanford PoS Tagger does not require much of an installation. The following steps get you started in no time at all.
Line 45:
 | **-model**  | path to the tagger model file; different models are available, but one has to be specified, e.g. models/english-left3words-distsim.tagger |
 | **-textFile**  | for plain text input files  |
-| -xmlInput  | Example value: <body>. The value specified here determines the XML element whose contents are tagged.  |
+| **-xmlInput**  | Example value: <body>. The value specified here determines the XML element whose contents are tagged.  |
-| **-outputFormat**  | xml, tsv, slashTags, -tagSeparator \# |
+| **-outputFormat**  | xml, tsv, slashTags, -tagSeparator \#|
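The options in the table are combined on a single command line, with ''edu.stanford.nlp.tagger.maxent.MaxentTagger'' as the main class. The following Java sketch assembles such a command as an argument list for ''java.lang.ProcessBuilder''. The jar name, model path, input file name and heap size are assumptions to be adapted to your own download, and the class ''TaggerCommand'' is hypothetical. It only builds and prints the command, so it runs without the tagger being installed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the command line for a tagger run
// from the options described in the table.
class TaggerCommand {
    static List<String> build(String jarPath, String modelPath, String textFile,
                              String outputFormat, String tagSeparator) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx300m");   // JVM heap size; assumed value, raise it for larger models
        cmd.add("-cp");
        cmd.add(jarPath);      // location of the tagger jar (assumed name)
        cmd.add("edu.stanford.nlp.tagger.maxent.MaxentTagger"); // main class
        cmd.add("-model");
        cmd.add(modelPath);    // path to a .tagger model file
        cmd.add("-textFile");
        cmd.add(textFile);     // plain text input
        if (outputFormat != null) {       // optional: xml, tsv or slashTags
            cmd.add("-outputFormat");
            cmd.add(outputFormat);
        }
        if (tagSeparator != null) {       // optional: word/tag separator
            cmd.add("-tagSeparator");
            cmd.add(tagSeparator);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("stanford-postagger.jar",
                "models/english-left3words-distsim.tagger",
                "sample.txt", "slashTags", "#");
        System.out.println(String.join(" ", cmd));
        // To launch the tagger for real (jar and model must exist on disk):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Because ''ProcessBuilder'' passes each argument directly to the program without going through a shell, the separator is given here as a plain ''#''. The backslash in ''\#'' is only needed when typing the command into a Unix shell, where an unescaped ''#'' at the start of a word begins a comment.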
  
  
Line 123:
 Please note that the tagger uses different tag sets for different languages, as there is no universal tag set that fits all linguistic phenomena in all languages. Make sure you find out which tag set is used by the model for a specific language and what the tags mean.
 
-  * English: the Penn Treebank site. There is a simple listing on the [[http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.html|AMALGAM project page]]
+  * English: [[https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html|Penn Treebank tag set]]
-  * Chinese: [[http://www.cis.upenn.edu/~chinese/|the Penn Chinese Treebank]]
+  * Chinese: [[https://verbs.colorado.edu/chinese/posguide.3rd.ch.pdf|Penn Chinese Treebank]]
-  * German: [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html|Stuttgart-Tübingen Tag Set (STTS)]]
+  * German: [[http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf|Stuttgart-Tübingen Tag Set (STTS)]]
   * French: [[http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php|the French Treebank]]
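Once the tagger has run, the tags still have to be interpreted with the documentation of the relevant tag set. The sketch below reads one line of slashTags output, splits each token at the last occurrence of the word/tag separator, and looks the tag up in a small, hand-picked subset of the Penn Treebank tags used by the English models. The class name, the example sentence and the choice of tags are illustrative assumptions; the separator is assumed to be the default ''_'', but whatever was chosen with -tagSeparator can be passed in instead. For German (STTS), Chinese or French models, the lookup table would have to be replaced by the corresponding tag set. ''Map.of'' requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses slashTags output and explains Penn Treebank tags.
class PennTagLookup {
    // A small, illustrative subset of the Penn Treebank tag set
    // (used by the English models); not the full list.
    static final Map<String, String> PENN = Map.of(
            "DT", "determiner",
            "JJ", "adjective",
            "IN", "preposition or subordinating conjunction",
            "NN", "noun, singular or mass",
            "NNS", "noun, plural",
            "VBZ", "verb, 3rd person singular present");

    // Splits one line of slashTags output ("word<sep>TAG word<sep>TAG ...")
    // into [word, tag] pairs, using the LAST occurrence of the separator
    // so that words containing the separator character stay intact.
    static List<String[]> parse(String line, String sep) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            int i = token.lastIndexOf(sep);
            if (i <= 0) continue; // skip empty tokens and tokens without a tag
            pairs.add(new String[] { token.substring(0, i),
                                     token.substring(i + sep.length()) });
        }
        return pairs;
    }

    // Looks up a tag in the subset above.
    static String describe(String tag) {
        return PENN.getOrDefault(tag, "not in this subset");
    }

    public static void main(String[] args) {
        for (String[] p : parse("The_DT old_JJ dog_NN barks_VBZ", "_")) {
            System.out.println(p[0] + "\t" + p[1] + "\t" + describe(p[1]));
        }
    }
}
```

Splitting at the last separator matters when a word itself contains the separator character, e.g. ''and/or/CC'' with ''/'' as separator.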