This shows you the differences between two versions of the page.
linguisticsweb:resources:corpora [2021/07/16 15:23] sabinebartsch [Corpora] |
linguisticsweb:resources:corpora [2023/04/06 12:36] sabinebartsch [Tag sets] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Corpora and other language resources ====== | ====== Corpora and other language resources ====== | ||
+ | |||
+ | ===== Tag sets ===== | ||
+ | |||
+ | ==== Penn Treebank tag set ==== | ||
+ | |||
+ | Reference: | ||
+ | Mitchell P. Marcus, Mary Ann Marcinkiewicz, | ||
+ | |||
+ | ^pos tag^description^example^ | ||
+ | |CC|coordinating conjunction|and, | ||
+ | |CD|cardinal number|3, third| | ||
+ | |DT|determiner|the, | ||
+ | |EX|existential there|there is| | ||
+ | |FW|foreign word|tabula| | ||
+ | |IN|preposition, | ||
+ | |IN/ | ||
+ | |JJ|adjective|blue, | ||
+ | |JJR|adjective, | ||
+ | |JJS|adjective, | ||
+ | |LS|list marker|1)| | ||
+ | |MD|modal|could, | ||
+ | |NN|noun, singular or mass|house| | ||
+ | |NNS|noun plural|houses| | ||
+ | |NP|proper noun, singular|Carrie| | ||
+ | |NPS|proper noun, plural|Americans| | ||
+ | |PDT|predeterminer|both as in "both the girls" | ||
+ | |POS|possessive ending|person’s| | ||
+ | |PP|personal pronoun|I, she, it| | ||
+ | |PPZ|possessive pronoun|my, his, your| | ||
+ | |RB|adverb|however, | ||
+ | |RBR|adverb, | ||
+ | |RBS|adverb, | ||
+ | |RP|particle|up as in "give up"| | ||
+ | |SENT|Sentence-break punctuation|. ! ?| | ||
+ | |SYM|Symbol|/ | ||
+ | |TO|infinitive ‘to’|to play| | ||
+ | |UH|interjection|aha| | ||
+ | |VB|verb be, base form|be| | ||
+ | |VBD|verb be, past tense|was, were| | ||
+ | |VBG|verb be, gerund/ | ||
+ | |VBN|verb be, past participle|been| | ||
+ | |VBP|verb be, sing. present, non-3rd|am, are| | ||
+ | |VBZ|verb be, 3rd person sing. present|is| | ||
+ | |VH|verb have, base form|have| | ||
+ | |VHD|verb have, past tense|had| | ||
+ | |VHG|verb have, gerund/ | ||
+ | |VHN|verb have, past participle|had| | ||
+ | |VHP|verb have, sing. present, non-3rd|have| | ||
+ | |VHZ|verb have, 3rd person sing. present|has| | ||
+ | |VV|verb, base form|take| | ||
+ | |VVD|verb, past tense|took| | ||
+ | |VVG|verb, gerund/ | ||
+ | |VVN|verb, past participle|taken| | ||
+ | |VVP|verb, sing. present, non-3rd|take| | ||
+ | |VVZ|verb, 3rd person sing. present|takes| | ||
+ | |WDT|wh-determiner|which, | ||
+ | |WP|wh-pronoun|who, | ||
+ | |WP$|possessive wh-pronoun|whose| | ||
+ | |WRB|wh-adverb|where, | ||
+ | |#|#|#| | ||
+ | |$|$|$| | ||
+ | |“|Quotation marks|‘ “| | ||
+ | |``|Opening quotation marks|‘ “| | ||
+ | |(|Opening brackets|( {| | ||
+ | |)|Closing brackets|) }| | ||
+ | |,|Comma|,| | ||
+ | |: | ||
===== Corpora ===== | ===== Corpora ===== | ||
Line 12: | Line 79: | ||
|DWDS Kernkorpus| | |DWDS Kernkorpus| | ||
|DWDS Kernkorpus 21| |2000-2010 | |DWDS Kernkorpus 21| |2000-2010 | ||
- | |Deutscher Wortschatz Project|35 mio. sentences, 500 mio. words| | ||
|Hamburg Dependency Treebank| | |Hamburg Dependency Treebank| | ||
|IDS-Corpora| | |IDS-Corpora| | ||
|LIMAS-Korpus|1 mio words, 500 texts / fragments|1970s|http:// | |LIMAS-Korpus|1 mio words, 500 texts / fragments|1970s|http:// | ||
|Arabic News Texts Corpus (AntCorpus)| | | https:// | |Arabic News Texts Corpus (AntCorpus)| | | https:// | ||
- | |Wortschatz Leipzig| | |https:// | + | |Wortschatz Leipzig|various sample sizes|Arabic, English, French, German, Russian misc. |https:// |
|SpråkbankenText| | |https:// | |SpråkbankenText| | |https:// | ||