Differences

This shows you the differences between two versions of the page.

linguisticsweb:resources:corpora [2020/12/01 15:43] sabinebartsch
linguisticsweb:resources:corpora [2023/04/06 15:05] (current) sabinebartsch [Tag sets]
Line 1: Line 1:
 ====== Corpora and other language resources ======
  
 +===== Tag sets =====
 +
 +  * [[linguisticsweb:resources:tagsets:penntb|Penn TreeBank tag set]]
 +  * CLAWS 5 tag set
 +  * CLAWS 7 tag set
 ===== Corpora =====
  
Line 9: Line 14:
 |[[http://ice-corpora.net/ice/index.html|International Corpus of English (ICE)]]|xxxxxx|varieties of world Englishes|[[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE) at Zurich, CH]]|world English|
 |Mark Davies' English Corpora|xxxxxx|diverse set of corpora|Mark Davies|American English, British English, international English|
 +|Text corpora in the DWDS|  various |various| https://www.dwds.de/ |German|
 |DWDS Kernkorpus|  |1900-1999  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern|German|
 |DWDS Kernkorpus 21|  |2000-2010  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21|German|
-|Deutscher Wortschatz Project|35 million sentences, 500 million words|  |http://wortschatz.uni-leipzig.de/|German|
 |Hamburg Dependency Treebank|  |German news site heise.de, articles published between 1996 and 2001|http://hdl.handle.net/11022/0000-0000-7FC7-2|German|
 |IDS-Corpora|  |  |http://www.ids-mannheim.de/kt/corpora.html|German|
 |LIMAS-Korpus|1 million words, 500 texts/fragments|1970s|http://www.korpora.org/Limas/|German|
 +|Arabic News Texts Corpus (AntCorpus)| | | https://antcorpus.github.io/|Arabic|
 +|Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian, misc.|https://wortschatz.uni-leipzig.de/de/download|various|
 +|Språkbanken Text| | |https://spraakbanken.gu.se/en/resources|Swedish|