Stanford NER from Python

author: Sabine Bartsch, e-mail: mail@linguisticsweb.org

[tutorial status: work in progress: extension - 04.2022]

This small example illustrates how the Stanford Named Entity Recognizer (NER) can be driven from Python 3:

# Stanford NER 3.9.2 stand-alone version
# classifier: english.muc.7class.distsim.crf.ser.gz

import os

from nltk.tag.stanford import StanfordNERTagger
from nltk.tokenize import word_tokenize

java_path = "C:/Program Files/Java/jdk1.8.0_192/bin/java.exe"
os.environ['JAVAHOME'] = java_path

model = "C:/Users/Public/utility/stanford-ner-2018-10-16/classifiers/english.muc.7class.distsim.crf.ser.gz"
jar = "C:/Users/Public/utility/stanford-ner-2018-10-16/stanford-ner-3.9.2.jar"

# wrap the Stanford NER jar and the chosen classifier in an NLTK tagger object
ner_tagger = StanfordNERTagger(model, jar, encoding="utf-8")

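# read the sample text (assumed to be UTF-8 encoded); the with-block closes the file afterwards
with open("C:/Users/Public/projects/python101-2018/data/sample-text.txt", encoding="utf-8") as infile:
    text = infile.read()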

words = word_tokenize(text)
classified_words = ner_tagger.tag(words)
print(classified_words)

for token, label in classified_words:
    print(token + "_" + label)

Note that the last two lines of code (the for-loop) illustrate a way of converting the original list of tuples (classified_words) into a vertical list, printing one token together with its NER label per line.
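
The conversion described above can also be tried out without Java or the Stanford NER jar. The following is a minimal sketch, not part of the original tutorial: sample_tagged is a made-up list that only imitates the shape of classified_words, and the helper names to_vertical and group_entities are hypothetical. It assumes that tokens outside any entity carry the label "O", as in Stanford NER output.

```python
from itertools import groupby

# Hypothetical tagger output; it only imitates the (token, label) tuples
# that ner_tagger.tag(words) returns.
sample_tagged = [
    ("Angela", "PERSON"),
    ("Merkel", "PERSON"),
    ("visited", "O"),
    ("Paris", "LOCATION"),
    ("in", "O"),
    ("2019", "DATE"),
    (".", "O"),
]


def to_vertical(tagged):
    """Return the tagged tokens as one 'token_LABEL' string per line."""
    return "\n".join(f"{token}_{label}" for token, label in tagged)


def group_entities(tagged, outside_label="O"):
    """Merge adjacent tokens that share a label into (phrase, label) pairs."""
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != outside_label:
            phrase = " ".join(token for token, _ in run)
            entities.append((phrase, label))
    return entities


print(to_vertical(sample_tagged))
print(group_entities(sample_tagged))
```

Grouping by adjacent identical labels is a simplification: two different entities of the same type that directly follow each other would be merged into one phrase.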