Differences

This shows you the differences between two versions of the page.

linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger_python [2019/03/07 18:19]
sabinebartsch [Running the local Stanford PoS Tagger on a directory of files]
linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger_python [2019/03/07 18:32]
sabinebartsch
Line 21: Line 21:
 from nltk import StanfordTagger
 
-text_tok = nltk.word_tokenize("Just a small snippet of text to test the tagger.")
+text_tok = nltk.word_tokenize("Just a small snippet of text.")
 
 # print(text_tok)
Line 70: Line 70:
 In the code itself, you have to point Python to the location of your Java installation:
 
-''
+<sxh bash; gutter: false>
 java_path = "C:/Program Files/Java/jdk1.8.0_192/bin/java.exe"
-''
-
-''
 os.environ["JAVAHOME"] = java_path
-''
+</sxh>
 
 You also have to explicitly state the paths to the Stanford PoS Tagger .jar file and the Stanford PoS Tagger model to be used for tagging:
 
-''
+<sxh bash; gutter: false>
 jar = "C:/Users/Public/utility/stanford-postagger-full-2018-10-16/stanford-postagger.jar"
-''
-
-''
 model = "C:/Users/Public/utility/stanford-postagger-full-2018-10-16/models/english-bidirectional-distsim.tagger"
-''
+</sxh>
 
 Note that these paths vary according to your system configuration. You will need to check your own file system for the exact locations of these files, although Java is likely to be installed somewhere in ''C:\Program Files\'' or ''C:\Program Files (x86)'' in a Windows system.
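Because these three paths differ from system to system, it can help to check them before starting the tagger. The following is a minimal sketch, not part of the tutorial's own program: the helper names ''configure_tagger_paths'' and ''build_tagger'' are hypothetical, and the use of NLTK's ''StanfordPOSTagger'' class is an assumption, since the snippet above only shows the import of ''StanfordTagger''.

```python
import os
from pathlib import Path


def configure_tagger_paths(java_path, jar, model):
    """Return the names of the configured files that do not exist.

    JAVAHOME is only set when all three files are found.
    """
    paths = {"java_path": java_path, "jar": jar, "model": model}
    missing = [name for name, p in paths.items() if not Path(p).is_file()]
    if not missing:
        # Same environment variable the tutorial sets by hand
        os.environ["JAVAHOME"] = str(java_path)
    return missing


def build_tagger(jar, model):
    """Create a tagger object from the checked paths."""
    # Assumption: NLTK is installed and provides StanfordPOSTagger.
    from nltk.tag import StanfordPOSTagger

    return StanfordPOSTagger(str(model), path_to_jar=str(jar))
```

If ''configure_tagger_paths(java_path, jar, model)'' returns an empty list, all three files were found and ''build_tagger(jar, model)'' can be called; otherwise the returned names tell you which path to correct.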
Line 182: Line 176:
 As we will be writing output of the two subprocesses of tokenization and tagging to files in your file system, you have to create these output directories in your file system and again write down or copy the locations for further use. In this example these directories are called:
 
-''data_path''
-
-''tokenized_data_path''
-
-''tagged_data_path''
+<sxh bash; gutter:false>
+data_path
+tokenized_data_path
+tagged_data_path
+</sxh>
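The directories can also be created from Python instead of by hand. This is a minimal sketch under the assumption that the three names above are used as directory names inside one common folder; the function ''ensure_dirs'' and its ''base'' argument are hypothetical and do not appear in the tutorial.

```python
from pathlib import Path

# Directory names taken from the tutorial text
DIR_NAMES = ("data_path", "tokenized_data_path", "tagged_data_path")


def ensure_dirs(base, names=DIR_NAMES):
    """Create each named directory under base if it is missing.

    Returns a dict mapping each name to its full path.
    """
    created = {}
    for name in names:
        path = Path(base) / name
        # exist_ok=True makes repeated calls harmless
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

Calling ''ensure_dirs("C:/Users/Public/corpus")'' (an example location only) returns the three full paths, which can then be copied into the program.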
  
 Once you have installed the Stanford PoS Tagger, collected and adjusted all of this information in the file below and created the respective directories, you are set to run the following Python program: