This shows you the differences between two versions of the page.
linguisticsweb:resources:corpora [2021/07/16 15:22] sabinebartsch [Corpora] |
linguisticsweb:resources:corpora [2023/04/06 12:25] sabinebartsch |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Corpora and other language resources ====== | ====== Corpora and other language resources ====== | ||
+ | |||
+ | ===== Tag sets ===== | ||
+ | |||
+ | ^pos tag^description^example^ | ||
+ | |CC|coordinating conjunction|and| | ||
+ | |CD|cardinal number|1, third| | ||
+ | |DT|determiner|the| | ||
+ | |EX|existential there|there is| | ||
+ | |FW|foreign word|les| | ||
+ | |IN|preposition, | ||
+ | |IN/ | ||
+ | |JJ|adjective|green| | ||
+ | |JJR|adjective, | ||
+ | |JJS|adjective, | ||
+ | |LS|list marker|1)| | ||
+ | |MD|modal|could, | ||
+ | |NN|noun, singular or mass|table| | ||
+ | |NNS|noun plural|tables| | ||
+ | |NP|proper noun, singular|John| | ||
+ | |NPS|proper noun, plural|Vikings| | ||
+ | |PDT|predeterminer|both the boys| | ||
+ | |POS|possessive ending|friend’s| | ||
+ | |PP|personal pronoun|I, he, it| | ||
+ | |PPZ|possessive pronoun|my, his| | ||
+ | |RB|adverb|however, | ||
+ | |RBR|adverb, | ||
+ | |RBS|adverb, | ||
+ | |RP|particle|give up| | ||
+ | |SENT|Sentence-break punctuation|. ! ?| | ||
+ | |SYM|Symbol|/ | ||
+ | |TO|infinitive ‘to’|to go| | ||
+ | |UH|interjection|uhhuhhuhh| | ||
+ | |VB|verb be, base form|be| | ||
+ | |VBD|verb be, past tense|was, were| | ||
+ | |VBG|verb be, gerund/ | ||
+ | |VBN|verb be, past participle|been| | ||
+ | |VBP|verb be, sing. present, non-3rd person|am, are| | ||
+ | |VBZ|verb be, 3rd person sing. present|is| | ||
+ | |VH|verb have, base form|have| | ||
+ | |VHD|verb have, past tense|had| | ||
+ | |VHG|verb have, gerund/ | ||
+ | |VHN|verb have, past participle|had| | ||
+ | |VHP|verb have, sing. present, non-3rd person|have| | ||
+ | |VHZ|verb have, 3rd person sing. present|has| | ||
+ | |VV|verb, base form|take| | ||
+ | |VVD|verb, past tense|took| | ||
+ | |VVG|verb, gerund/ | ||
+ | |VVN|verb, past participle|taken| | ||
+ | |VVP|verb, sing. present, non-3rd person|take| | ||
+ | |VVZ|verb, 3rd person sing. present|takes| | ||
+ | |WDT|wh-determiner|which| | ||
+ | |WP|wh-pronoun|who, | ||
+ | |WP$|possessive wh-pronoun|whose| | ||
+ | |WRB|wh-adverb|where, | ||
+ | |#|#|#| | ||
+ | |$|$|$| | ||
+ | |“|Quotation marks|‘ “| | ||
+ | |``|Opening quotation marks|‘ “| | ||
+ | |(|Opening brackets|( {| | ||
+ | |)|Closing brackets|) }| | ||
+ | |,|Comma|,| | ||
+ | |: | ||
+ | |||
+ | |||
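The tag set above uses separate verb tags for //be// (VB…), //have// (VH…) and all other verbs (VV…). As a minimal sketch of how that distinction might be used, the following Python snippet counts verb classes in a list of (token, tag) pairs. The names ''VERB_CLASSES'', ''verb_class'' and ''count_verb_classes'' are hypothetical, and the sample data is made up for illustration. It is not output from any particular tagger.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Two-letter verb tag prefixes from the table above (assumed to be exhaustive for verbs).
VERB_CLASSES = {"VB": "be", "VH": "have", "VV": "other verb"}


def verb_class(tag: str) -> Optional[str]:
    """Return the verb class for a verb tag, or None for any non-verb tag."""
    return VERB_CLASSES.get(tag[:2])


def count_verb_classes(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each verb class occurs in (token, tag) pairs."""
    counts: Counter = Counter()
    for _token, tag in tagged:
        cls = verb_class(tag)
        if cls is not None:
            counts[cls] += 1
    return counts


# Hand-tagged illustrative sample, not real tagger output.
sample = [
    ("John", "NP"), ("has", "VHZ"), ("taken", "VVN"), ("the", "DT"),
    ("table", "NN"), (".", "SENT"),
    ("It", "PP"), ("is", "VBZ"), ("green", "JJ"), (".", "SENT"),
]

if __name__ == "__main__":
    print(count_verb_classes(sample))
```

Matching on the first two letters keeps the lookup independent of the inflection suffix (D, G, N, P, Z), so every form of a verb falls into the same class.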
===== Corpora ===== | ===== Corpora ===== | ||
Line 12: | Line 76: | ||
|DWDS Kernkorpus| | |DWDS Kernkorpus| | ||
|DWDS Kernkorpus 21| |2000-2010 | |DWDS Kernkorpus 21| |2000-2010 | ||
- | |Deutscher Wortschatz Project|35 mio. sentences, 500 mio. words| | ||
|Hamburg Dependency Treebank| | |Hamburg Dependency Treebank| | ||
|IDS-Corpora| | |IDS-Corpora| | ||
|LIMAS-Korpus|1 mio. words, 500 texts / fragments|1970s|http:// | |LIMAS-Korpus|1 mio. words, 500 texts / fragments|1970s|http:// | ||
|Arabic News Texts Corpus (AntCorpus)| | | https:// | |Arabic News Texts Corpus (AntCorpus)| | | https:// | ||
- | |Wortschatz Leipzig| | |https:// | + | |Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian misc. |https:// |
- | |SpråkbankenText|||https:// | + | |SpråkbankenText| | |https:// |