[tutorial status: work in progress: extension - 04.2022]
This small example illustrates how the Stanford Named Entity Recognizer (NER) can be driven from Python 3:
    # Stanford NER 3.9.2 stand-alone version
    # classifier: english.muc.7class.distsim.crf.ser.gz

    import os
    from nltk.tokenize import word_tokenize
    from nltk.tag.stanford import StanfordNERTagger

    # Tell NLTK where the local Java installation lives
    java_path = "C:/Program Files/Java/jdk1.8.0_192/bin/java.exe"
    os.environ['JAVAHOME'] = java_path

    # Paths to the serialized classifier and the Stanford NER jar
    model = "C:/Users/Public/utility/stanford-ner-2018-10-16/classifiers/english.muc.7class.distsim.crf.ser.gz"
    jar = "C:/Users/Public/utility/stanford-ner-2018-10-16/stanford-ner-3.9.2.jar"
    ner_tagger = StanfordNERTagger(model, jar, encoding="utf-8")

    # Read and tokenize the sample text, then tag each token
    text = open("C:/Users/Public/projects/python101-2018/data/sample-text.txt", encoding="utf-8").read()
    words = word_tokenize(text)
    classified_words = ner_tagger.tag(words)

    print(classified_words)
    for x, y in classified_words:
        print(x + "_" + y)
Note that the final for-loop converts the original list of tuples (classified_words) into a vertical list, printing each token together with its NER label, one token_label pair per line.
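Since the tagger labels every token individually, multi-word names such as "New York" come back as separate tuples sharing the same label. A common follow-up step is to merge consecutive tokens with identical labels into whole entities. The sketch below illustrates one way to do this with itertools.groupby; the classified_words list is a hypothetical sample in the format returned by StanfordNERTagger.tag(), where "O" marks tokens outside any entity.

```python
from itertools import groupby

# Hypothetical sample output in the (token, label) format produced
# by StanfordNERTagger.tag(); "O" = token outside any named entity.
classified_words = [
    ("Barack", "PERSON"), ("Obama", "PERSON"), ("visited", "O"),
    ("New", "LOCATION"), ("York", "LOCATION"), ("in", "O"),
    ("2015", "DATE"),
]

# Merge runs of consecutive tokens that share the same NER label,
# dropping the "O" runs, to recover multi-word entities.
entities = [
    (" ".join(token for token, _ in group), label)
    for label, group in groupby(classified_words, key=lambda pair: pair[1])
    if label != "O"
]

print(entities)
# [('Barack Obama', 'PERSON'), ('New York', 'LOCATION'), ('2015', 'DATE')]
```

Note that this simple grouping cannot separate two different entities of the same type that happen to be adjacent in the text; for that, a chunking scheme with begin/inside markers (BIO tagging) would be needed.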