corpus title | size | time | source | language |
---|---|---|---|---|
British National Corpus (BNC) | 100 million tokens | mid 1970s - early 1990s | Oxford | British English |
The Brown Corpus | 1 million tokens | 1961 | ICAME | American English |
The Lancaster/Oslo-Bergen Corpus (LOB) | 1 million tokens | 1961 | ICAME | British English |
International Corpus of English (ICE) | xxxxxx | varieties of world Englishes | International Corpus of English (ICE) at Zurich, CH | world English |
Mark Davies' English Corpora | xxxxxx | diverse set of corpora | Mark Davies | American English, British English, international English |
Text corpora in the DWDS | various | various | https://www.dwds.de/r | German |
DWDS Kernkorpus | | 1900-1999 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern | German |
DWDS Kernkorpus 21 | | 2000-2010 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21 | German |
Hamburg Dependency Treebank | | articles from the German news site heise.de, published between 1996 and 2001 | http://hdl.handle.net/11022/0000-0000-7FC7-2 | German |
IDS-Corpora | | | http://www.ids-mannheim.de/kt/corpora.html | German |
LIMAS-Korpus | 1 mio words, 500 texts / fragments | 1970s | http://www.korpora.org/Limas/ | German |
Arabic News Texts Corpus (AntCorpus) | | | https://antcorpus.github.io/ | Arabic |
Wortschatz Leipzig | various sample sizes | | https://wortschatz.uni-leipzig.de/de/download | various (Arabic, English, French, German, Russian, and others) |
SpråkbankenText | | | https://spraakbanken.gu.se/en/resources | Swedish |