
This shows you the differences between two versions of the page.

linguisticsweb:resources:corpora [2021/06/06 12:44]
sabinebartsch [Corpora]
linguisticsweb:resources:corpora [2023/04/06 12:30]
sabinebartsch [Tag sets]
Line 1: Line 1:
 ====== Corpora and other language resources ======
 +===== Tag sets =====
 +^pos tag^description^example^
 +|CC|coordinating conjunction|and, or|
 +|CD|cardinal number|3, third|
 +|DT|determiner|the, this|
 +|EX|existential there|there is|
 +|FW|foreign word|tabula|
 +|IN|preposition, subordinating conjunction|in, of, like|
 +|IN/that|that as subordinator|that|
 +|JJ|adjective|blue, happy|
 +|JJR|adjective, comparative|bluer, happier|
 +|JJS|adjective, superlative|bluest, happiest|
 +|LS|list marker|1)|
 +|MD|modal|could, will|
 +|NN|noun, singular or mass|house|
 +|NNS|noun plural|houses|
 +|NP|proper noun, singular|Carrie|
 +|NPS|proper noun, plural|Americans|
 +|PDT|predeterminer|both as in "both the girls"|
 +|POS|possessive ending|person’s|
 +|PP|personal pronoun|I, she, it|
 +|PPZ|possessive pronoun|my, his, your|
 +|RB|adverb|however, usually, naturally, here, good|
 +|RBR|adverb, comparative|better|
 +|RBS|adverb, superlative|best|
 +|RP|particle|up as in "give up"|
 +|SENT|Sentence-break punctuation|. ! ?|
 +|SYM|Symbol|/ [ = *|
 +|TO|infinitive ‘to’|to play|
 +|VB|verb be, base form|be|
 +|VBD|verb be, past tense|was, were|
 +|VBG|verb be, gerund/present participle|being|
 +|VBN|verb be, past participle|been|
 +|VBP|verb be, sing. present, non-3rd|am, are|
 +|VBZ|verb be, 3rd person sing. present|is|
 +|VH|verb have, base form|have|
 +|VHD|verb have, past tense|had|
 +|VHG|verb have, gerund/present participle|having|
 +|VHN|verb have, past participle|had|
 +|VHP|verb have, sing. present, non-3rd|have|
 +|VHZ|verb have, 3rd person sing. present|has|
 +|VV|verb, base form|take|
 +|VVD|verb, past tense|took|
 +|VVG|verb, gerund/present participle|taking|
 +|VVN|verb, past participle|taken|
 +|VVP|verb, sing. present, non-3rd|take|
 +|VVZ|verb, 3rd person sing. present|takes|
 +|WDT|wh-determiner|which, who|
 +|WP|wh-pronoun|who, what|
 +|WP$|possessive wh-pronoun|whose|
 +|WRB|wh-adverb|where, when|
 +|“|Quotation marks|‘ “|
 +|``|Opening quotation marks|‘ “|
 +|(|Opening brackets|( {|
 +|)|Closing brackets|) }|
 +|:|Punctuation|– ; : — …|
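
The verb tags in the table above follow a regular pattern: the second letter separates //be// (B), //have// (H) and all other verbs (V), and an optional third letter marks the form. A minimal Python sketch of that decomposition follows. The function name ''describe_verb_tag'' and both lookup dictionaries are hypothetical and introduced only for illustration; they are not part of any tagger's API.

```python
# Illustrative only: these names are assumptions, not an existing library API.
# The mappings are read directly off the verb rows of the tag set table.
VERB_LEMMA_CLASS = {"B": "be", "H": "have", "V": "other verb"}
VERB_FORM = {
    "": "base form",
    "D": "past tense",
    "G": "gerund/present participle",
    "N": "past participle",
    "P": "singular present, non-3rd person",
    "Z": "3rd person singular present",
}


def describe_verb_tag(tag):
    """Return (lemma class, form) for a verb tag from the table, else None."""
    # Verb tags are two or three characters long and start with "V".
    if len(tag) not in (2, 3) or tag[0] != "V":
        return None
    lemma_class = VERB_LEMMA_CLASS.get(tag[1])
    form = VERB_FORM.get(tag[2:])  # empty string for two-letter tags
    if lemma_class is None or form is None:
        return None
    return lemma_class, form
```

Calling ''describe_verb_tag("VHZ")'' yields the pair ''("have", "3rd person singular present")'', while a non-verb tag such as ''NN'' yields ''None''. This kind of lookup can help when counting all forms of one verb class in tagged corpus output.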
 ===== Corpora =====
Line 12: Line 74:
 |DWDS Kernkorpus|  |1900-1999  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern|German|
 |DWDS Kernkorpus 21|  |2000-2010  |Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21|German|
-|Deutscher Wortschatz Project|35 mio. sentences, 500 mio. words|  |http://wortschatz.uni-leipzig.de/|German| 
 |Hamburg Dependency Treebank|  |German news site heise.de, articles published between 1996 and 2001|http://hdl.handle.net/11022/0000-0000-7FC7-2|German|
 |IDS-Corpora|  |  |http://www.ids-mannheim.de/kt/corpora.html|German|
 |LIMAS-Korpus|1 mio words, 500 texts / fragments|1970s|http://www.korpora.org/Limas/|German|
 |Arabic News Texts Corpus (AntCorpus)| | | https://antcorpus.github.io/|Arabic|
-|Wortschatz Leipzig| | |https://wortschatz.uni-leipzig.de/de/download|various|
 +|Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian misc. |https://wortschatz.uni-leipzig.de/de/download|various|
 +|SpråkbankenText| | |https://spraakbanken.gu.se/en/resources|Swedish|