linguisticsweb:resources:corpora [2020/12/01 15:43] sabinebartsch |
linguisticsweb:resources:corpora [2023/04/06 12:48] sabinebartsch [Tag sets] |
====== Corpora and other language resources ====== | ====== Corpora and other language resources ====== |
| |
| ===== Tag sets ===== |
| |
| * [[linguisticsweb:resources:corpora:tagsets|Penn TreeBank tag set]] |
| * CLAWS 5 tag set |
| * CLAWS 7 tag set |
===== Corpora ===== | ===== Corpora ===== |
| |
|[[http://ice-corpora.net/ice/index.html|International Corpus of English (ICE)]]|xxxxxx|varieties of world Englishes|[[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE) at Zuerich, CH]]|world English| | |[[http://ice-corpora.net/ice/index.html|International Corpus of English (ICE)]]|xxxxxx|varieties of world Englishes|[[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE) at Zuerich, CH]]|world English| |
|Mark Davies' English Corpora|xxxxxx|diverse set of corpora|Mark Davies|American English, British English, international English| | |Mark Davies' English Corpora|xxxxxx|diverse set of corpora|Mark Davies|American English, British English, international English| |
| |Text corpora in the DWDS|various|various|https://www.dwds.de/r|German| |
|DWDS Kernkorpus| |1900-1999 |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern|German| | |DWDS Kernkorpus| |1900-1999 |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern|German| |
|DWDS Kernkorpus 21| |2000-2010 |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21|German| | |DWDS Kernkorpus 21| |2000-2010 |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21|German| |
|Deutscher Wortschatz Project|35 million sentences, 500 million words| |http://wortschatz.uni-leipzig.de/|German| | |
|Hamburg Dependency Treebank| |German news site heise.de, articles published between 1996 and 2001|http://hdl.handle.net/11022/0000-0000-7FC7-2|German| | |Hamburg Dependency Treebank| |German news site heise.de, articles published between 1996 and 2001|http://hdl.handle.net/11022/0000-0000-7FC7-2|German| |
|IDS-Corpora| | |http://www.ids-mannheim.de/kt/corpora.html|German| | |IDS-Corpora| | |http://www.ids-mannheim.de/kt/corpora.html|German| |
|LIMAS-Korpus|1 million words, 500 texts / fragments|1970s|http://www.korpora.org/Limas/|German| | |LIMAS-Korpus|1 million words, 500 texts / fragments|1970s|http://www.korpora.org/Limas/|German| |
| |Arabic News Texts Corpus (AntCorpus)| | | https://antcorpus.github.io/|Arabic| |
| |Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian, and others|https://wortschatz.uni-leipzig.de/de/download|various| |
| |SpråkbankenText| | |https://spraakbanken.gu.se/en/resources|Swedish| |
| |
| |