resources: archives, corpora, dictionaries, lexica

This page offers students and members of the university access to corpus resources. Because many corpora are subject to specific copyright restrictions, access requires registration with the course tutors in corpus and computational linguistics. Two web interfaces are available for corpus access in my own courses (internal access only).

Corpora, databases, tag sets and other language resources

Miscellaneous annotation and analysis tools

name | type | language(s) | source
Stanford Log-linear Part-Of-Speech Tagger | POS tagger (stand-alone version with integrated tokenizer) | Arabic, Chinese, English, French, German, Spanish | Stanford NLP Group
Stanford Parser | parser: phrase structure, dependency (stand-alone version) | English, German … | Stanford NLP Group
Stanford CoreNLP Tools | tokenizer, POS tagger, phrase-structure parser, dependency parser, NER, entity linking and more | Arabic, Chinese, English, French, German, Spanish | Stanford NLP Group
TreeTagger | POS tagging, lemmatization, chunking | Bulgarian, Catalan, Chinese, Coptic, Czech, Danish, Dutch, English, Estonian, Finnish, German, Greek, Hungarian, Italian, Korean, Latin, Middle High German, Mongolian, Norwegian, Persian, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swahili, Swedish | Helmut Schmid, LMU
GATE (General Architecture for Text Engineering) | annotation and analysis pipeline | English, German, various | University of Sheffield
Språkbanken's text analysis tool | annotation and analysis | Swedish | Språkbanken Text, University of Gothenburg

Corpus databases and web-interfaces