This page offers access to corpus resources for students and members of the university. Because many corpora are subject to specific copyright constraints, registration with the course tutors in corpus and computational linguistics is required to gain access. Two web interfaces are available for corpus access in my own courses (internal access only).
name | type | language(s) | source |
---|---|---|---|
Stanford Log-linear Part-Of-Speech Tagger | pos tagger (stand-alone version with integrated tokenizer) | Arabic, Chinese, English, French, German, Spanish | Stanford NLP Group |
Stanford Parser | parser: phrase structure, dependency (stand-alone version) | English, German … | Stanford NLP Group |
Stanford CoreNLP Tools | tokenizer, pos, PS parser, dependency parser, NER, entity linking and more | Arabic, Chinese, English, French, German, Spanish | Stanford NLP Group |
TreeTagger | pos, lemmatization, chunking | Bulgarian, Catalan, Chinese, Coptic, Czech, Danish, Dutch, English, Estonian, Finnish, German, Greek, Hungarian, Italian, Korean, Latin, Middle High German, Mongolian, Norwegian, Persian, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swahili, Swedish | Helmut Schmid, LMU |
GATE (General Architecture for Text Engineering) | annotation and analysis pipeline | English, German, various | University of Sheffield |
Språkbanken's text analysis tool | annotation and analysis | Swedish | Språkbanken Text, University of Gothenburg |