Automatic annotation

Many annotation tasks in natural language processing are carried out automatically. The automatic processes fall into the following main categories, of which list matching and pattern matching are closely related:

List matching is employed, for example, in annotation tasks that rest on the automatic identification and annotation of a known set of items.
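List matching as described above can be sketched in a few lines of Python. This is a minimal illustration, not a particular tool's method: the item list, the function name annotate_known_items and the COUNTRY tag are assumptions chosen for the example.

```python
import re

# Hypothetical list of known items (a tiny gazetteer of country names).
KNOWN_ITEMS = ["Germany", "United States", "France"]


def annotate_known_items(text, items=KNOWN_ITEMS, tag="COUNTRY"):
    """Wrap every occurrence of a listed item in an XML-style tag."""
    # Try longer entries first so that multi-word items are not
    # pre-empted by shorter entries that overlap with them.
    alternatives = sorted(items, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(item) for item in alternatives) + r")\b"
    )
    return pattern.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", text)


print(annotate_known_items("She moved from France to the United States."))
```

The word boundaries keep the matcher from annotating a listed item that only occurs inside a longer word.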

Pattern matching can be employed to identify sets of items that share a known form. An example task is the identification and annotation of all lexical items that end in a particular suffix or contain some other particular morpheme.
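The suffix example can be sketched with a regular expression. The function name annotate_suffix, the default suffix "ness" and the SUFFIX tag are assumptions for illustration. Note that the pattern matches on spelling alone and does not check that the ending really is a morpheme.

```python
import re


def annotate_suffix(text, suffix="ness", tag="SUFFIX"):
    """Tag every word that ends in the given suffix (purely orthographic)."""
    # \w+ requires at least one character before the suffix, so the
    # bare string "ness" on its own is not annotated.
    pattern = re.compile(r"\b\w+" + re.escape(suffix) + r"\b")
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


print(annotate_suffix("Her kindness and happiness were obvious."))
```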

Rule-based annotation is employed in natural language processing tasks such as part-of-speech tagging. It allows annotation rules to be fine-tuned, but the rules are language-specific. A well-known example of a rule-based part-of-speech tagger is the Brill Tagger by Eric Brill.
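A toy sketch may help to show what a rule-based tagger does: each word first receives a default tag from a lexicon, and contextual rules then correct tags in specific environments. The lexicon, the single rule and the function name rule_based_tag are hand-written assumptions for this example. They are not the Brill Tagger's actual rules or code.

```python
# Hypothetical lexicon: one default tag per known word.
LEXICON = {
    "the": "DT", "we": "PRP", "it": "PRP",
    "can": "MD", "open": "VB", "rusted": "VBD",
}

# Contextual rules as (from_tag, to_tag, previous_tag):
# change from_tag to to_tag when the preceding tag is previous_tag.
RULES = [
    ("MD", "NN", "DT"),  # a "modal" directly after a determiner is a noun
]


def rule_based_tag(tokens):
    """Assign default tags, then apply each contextual rule in order."""
    # Unknown words fall back to NN, a common default for open-class words.
    tags = [LEXICON.get(token.lower(), "NN") for token in tokens]
    for from_tag, to_tag, previous_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == previous_tag:
                tags[i] = to_tag
    return list(zip(tokens, tags))


print(rule_based_tag(["The", "can", "rusted"]))
print(rule_based_tag(["We", "can", "open", "it"]))
```

Both the lexicon and the rule encode knowledge about English, which illustrates why such rule sets do not transfer directly to other languages.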

Statistical or probabilistic annotation entails the automatic annotation of linguistic data on the basis of models trained on annotated data. Many of the part-of-speech taggers widely used today are probabilistic, as are many syntactic parsers. Examples are the Stanford POS Tagger and the Stanford Parser developed by the Stanford NLP Group.
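The idea of training a model on annotated data can be illustrated with a very simple baseline: a tagger that assigns each word the tag it most often carries in the training data. This is a deliberately minimal sketch and far simpler than the models used in the Stanford tools. The training sentences and the names train_unigram_tagger and tag are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training corpus of (word, tag) pairs.
TRAINING = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dog", "NN"), ("food", "NN"), ("smells", "VBZ")],
]


def train_unigram_tagger(sentences):
    """Learn the most frequent tag per word and an overall default tag."""
    tag_counts_per_word = defaultdict(Counter)
    overall_tag_counts = Counter()
    for sentence in sentences:
        for word, tag_label in sentence:
            tag_counts_per_word[word.lower()][tag_label] += 1
            overall_tag_counts[tag_label] += 1
    model = {
        word: counts.most_common(1)[0][0]
        for word, counts in tag_counts_per_word.items()
    }
    # Words never seen in training receive the most frequent tag overall.
    default_tag = overall_tag_counts.most_common(1)[0][0]
    return model, default_tag


def tag(tokens, model, default_tag):
    """Annotate tokens with the tags learned from the training data."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


MODEL, DEFAULT_TAG = train_unigram_tagger(TRAINING)
print(tag(["The", "cat", "barks"], MODEL, DEFAULT_TAG))
```

Unlike the rule-based sketch, nothing here is written by hand for a specific language: changing the training data changes the tagger's behaviour.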

Automatic annotation and tools

Stanford NLP Tools

Driving Stanford NLP tools from Python

Stanford Part of Speech Tagger: tagging from Python
Stanford NER: tagging from Python
Stanford CoreNLP Tools: running the annotation process from Python
