This shows you the differences between two versions of the page.
linguisticsweb:resources:corpora [2019/07/13 09:33] |
linguisticsweb:resources:corpora [2023/04/06 12:37] sabinebartsch [Penn TreeBank tag set] |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Corpora and other language resources ====== | ||
+ | |||
+ | ===== Tag sets ===== | ||
+ | |||
+ | * Penn TreeBank tag set | ||
+ | * CLAWS 5 tag set | ||
+ | * CLAWS 7 tag set | ||
+ | ===== Corpora ===== | ||
+ | |||
+ | ^corpus title^size^time^source^language^ | ||
+ | |British National Corpus (BNC)|100 million tokens|mid 1970s - early 1990s|Oxford|British English| | ||
+ | |The Brown Corpus|1 million tokens|1961|ICAME|American English| | ||
+ | |The Lancaster/ | ||
+ | |[[http:// | ||
+ | |Mark Davies' | ||
+ | |Text corpora in the DWDS| various | various | https:// | ||
+ | |DWDS Kernkorpus| | ||
+ | |DWDS Kernkorpus 21| |2000-2010 | ||
+ | |Hamburg Dependency Treebank| | ||
+ | |IDS-Corpora| | ||
+ | |LIMAS-Korpus|1 million words, 500 texts / fragments|1970s|http:// | ||
+ | |Arabic News Texts Corpus (AntCorpus)| | | https:// | ||
+ | |Wortschatz Leipzig|various sample sizes|Arabic, | ||
+ | |SpråkbankenText| | |https:// | ||
+ | |||