resources: archives, corpora, dictionaries, lexica

This page gives students and members of the university access to corpus resources. Because many corpora are subject to specific copyright constraints, access requires registration with the course tutors in corpus and computational linguistics. Two web interfaces are available for corpus access in my own courses (internal access only).

Corpora, databases, tag sets and other language resources

click here

Miscellaneous annotation and analysis tools

name	type	language(s)	source
Stanford Log-linear Part-Of-Speech Tagger	pos tagger (stand-alone version with integrated tokenizer)	Arabic, Chinese, English, French, German, Spanish	Stanford NLP Group
Stanford Parser	parser: phrase structure, dependency (stand-alone version)	English, German …	Stanford NLP Group
Stanford CoreNLP Tools	tokenizer, pos, PS parser, dependency parser, NER, entity linking and more	Arabic, Chinese, English, French, German, Spanish	Stanford NLP Group
TreeTagger	pos, lemmatization, chunking	Bulgarian, Catalan, Chinese, Coptic, Czech, Danish, Dutch, English, Estonian, Finnish, German, Greek, Hungarian, Italian, Korean, Latin, Middle High German, Mongolian, Norwegian, Persian, Portuguese, Romanian, Russian, Slovakian, Slovenian, Spanish, Swahili, Swedish	Helmut Schmid, LMU
GATE (General Architecture for Text Engineering)	annotation and analysis pipeline	English, German, various	University of Sheffield
Språkbanken's text analysis tool	annotation and analysis	Swedish	Språkbanken Text, University of Gothenburg
