Differences

This shows you the differences between two versions of the page.

linguisticsweb:resources:corpora [2020/12/01 15:29] sabinebartsch
linguisticsweb:resources:corpora [2023/04/06 12:48] sabinebartsch [Tag sets]
Line 1: Line 1:
 ====== Corpora and other language resources ======
  
 +===== Tag sets =====
 +
 +  * [[linguisticsweb:resources:corpora:tagsets|Penn TreeBank tag set]]
 +  * CLAWS 5 tag set
 +  * CLAWS 7 tag set
 ===== Corpora =====
  
Line 7: Line 12:
 |The Brown Corpus|1 million tokens|1961|ICAME|American English|
 |The Lancaster/Oslo-Bergen Corpus (LOB)|1 million tokens|1961|ICAME|British English|
-|[[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE)]]|xxxxxx|varieties of world Englishes|xxxxxxxx|world English|
 +|[[http://ice-corpora.net/ice/index.html|International Corpus of English (ICE)]]|xxxxxx|varieties of world Englishes|[[https://www.ice-corpora.uzh.ch/en.html|International Corpus of English (ICE) at Zuerich, CH]]|world English|
 +|Mark Davies' English Corpora|xxxxxx|diverse set of corpora|Mark Davies|American English, British English, international English| 
 +|Text corpora in the DWDS|  various |various| https://www.dwds.de/ |German|
 |DWDS Kernkorpus|  |1900-1999  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern|German|
 |DWDS Kernkorpus 21|  |2000-2010  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21|German|
-|Deutscher Wortschatz Project|35 million sentences, 500 million words|  |http://wortschatz.uni-leipzig.de/|German|
 |Hamburg Dependency Treebank|  |German news site heise.de, articles published between 1996 and 2001|http://hdl.handle.net/11022/0000-0000-7FC7-2|German|
 |IDS-Corpora|  |  |http://www.ids-mannheim.de/kt/corpora.html|German|
 |LIMAS-Korpus|1 million words, 500 texts / fragments|1970s|http://www.korpora.org/Limas/|German|
 +|Arabic News Texts Corpus (AntCorpus)| | | https://antcorpus.github.io/|Arabic|
 +|Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian misc. |https://wortschatz.uni-leipzig.de/de/download|various|
 +|SpråkbankenText| | |https://spraakbanken.gu.se/en/resources|Swedish|