linguisticsweb:resources:corpora [2019/07/13 09:34] |
linguisticsweb:resources:corpora [2023/04/06 12:36] sabinebartsch [Tag sets] |
+ | ====== Corpora and other language resources ====== | ||
+ | |||
+ | ===== Tag sets ===== | ||
+ | |||
+ | ==== Penn Treebank tag set ==== | ||
+ | |||
+ | Reference: | ||
+ | Mitchell P. Marcus, Mary Ann Marcinkiewicz, | ||
+ | |||
+ | ^pos tag^description^example^ | ||
+ | |CC|coordinating conjunction|and, | ||
+ | |CD|cardinal number|3, three| | ||
+ | |DT|determiner|the, | ||
+ | |EX|existential there|there is| | ||
+ | |FW|foreign word|tabula| | ||
+ | |IN|preposition, | ||
+ | |IN/ | ||
+ | |JJ|adjective|blue, | ||
+ | |JJR|adjective, | ||
+ | |JJS|adjective, | ||
+ | |LS|list marker|1)| | ||
+ | |MD|modal|could, | ||
+ | |NN|noun, singular or mass|house| | ||
+ | |NNS|noun plural|houses| | ||
+ | |NP|proper noun, singular|Carrie| | ||
+ | |NPS|proper noun, plural|Americans| | ||
+ | |PDT|predeterminer|both as in "both the girls"| | ||
+ | |POS|possessive ending|person’s| | ||
+ | |PP|personal pronoun|I, she, it| | ||
+ | |PPZ|possessive pronoun|my, his, your| | ||
+ | |RB|adverb|however, | ||
+ | |RBR|adverb, | ||
+ | |RBS|adverb, | ||
+ | |RP|particle|up as in "give up"| | ||
+ | |SENT|Sentence-break punctuation|. ! ?| | ||
+ | |SYM|Symbol|/ | ||
+ | |TO|infinitive ‘to’|to play| | ||
+ | |UH|interjection|aha| | ||
+ | |VB|verb be, base form|be| | ||
+ | |VBD|verb be, past tense|was, were| | ||
+ | |VBG|verb be, gerund/ | ||
+ | |VBN|verb be, past participle|been| | ||
+ | |VBP|verb be, sing. present, non-3rd person|am, are| | ||
+ | |VBZ|verb be, 3rd person sing. present|is| | ||
+ | |VH|verb have, base form|have| | ||
+ | |VHD|verb have, past tense|had| | ||
+ | |VHG|verb have, gerund/ | ||
+ | |VHN|verb have, past participle|had| | ||
+ | |VHP|verb have, sing. present, non-3rd person|have| | ||
+ | |VHZ|verb have, 3rd person sing. present|has| | ||
+ | |VV|verb, base form|take| | ||
+ | |VVD|verb, past tense|took| | ||
+ | |VVG|verb, gerund/ | ||
+ | |VVN|verb, past participle|taken| | ||
+ | |VVP|verb, sing. present, non-3rd person|take| | ||
+ | |VVZ|verb, 3rd person sing. present|takes| | ||
+ | |WDT|wh-determiner|which, | ||
+ | |WP|wh-pronoun|who, | ||
+ | |WP$|possessive wh-pronoun|whose| | ||
+ | |WRB|wh-adverb|where, | ||
+ | |#|#|#| | ||
+ | |$|$|$| | ||
+ | |“|Quotation marks|‘ “| | ||
+ | |``|Opening quotation marks|‘ “| | ||
+ | |(|Opening brackets|( {| | ||
+ | |)|Closing brackets|) }| | ||
+ | |,|Comma|,| | ||
+ | |: | ||
+ | |||
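The tag names in the table above share prefixes by word class (VB*, VH* and VV* for verbs, NN* and NP* for nouns, JJ* for adjectives, RB* for adverbs), so tagged text can be collapsed into coarse classes for quick counts. The following is a minimal Python sketch, assuming input with one tab-separated token/tag pair per line; the `COARSE_PREFIXES` mapping, the function names and the hand-tagged sample are illustrative assumptions, not part of any tagger's API.

```python
from collections import Counter

# Hypothetical mapping from two-letter tag prefixes (taken from the table
# above) to coarse word classes. Not an official grouping.
COARSE_PREFIXES = [
    ("VB", "verb"), ("VH", "verb"), ("VV", "verb"),
    ("NN", "noun"), ("NP", "noun"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
]

def coarse_class(tag):
    """Return a coarse word class for a tag, or 'other' if no prefix matches."""
    for prefix, label in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return label
    return "other"

def count_classes(tagged_text):
    """Count coarse classes in text with one 'token<TAB>tag' pair per line."""
    counts = Counter()
    for line in tagged_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        token, tag = line.split("\t")[:2]
        counts[coarse_class(tag)] += 1
    return counts

# Made-up sample sentence, tagged by hand with tags from the table.
SAMPLE = "She\tPP\nhas\tVHZ\ntaken\tVVN\nthe\tDT\nblue\tJJ\nhouses\tNNS\n.\tSENT"

if __name__ == "__main__":
    print(count_classes(SAMPLE))
```

Tags that match no prefix (determiners, pronouns, punctuation and so on) fall into the `other` class; extend the mapping if finer distinctions are needed.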
+ | ===== Corpora ===== | ||
+ | |||
+ | ^corpus title^size^time^source^language^ | ||
+ | |British National Corpus (BNC)|100 million tokens|mid 1970s - early 1990s|Oxford|British English| | ||
+ | |The Brown Corpus|1 million tokens|1961|ICAME|American English| | ||
+ | |The Lancaster/ | ||
+ | |[[http:// | ||
+ | |Mark Davies' | ||
+ | |Text corpora in the DWDS|various|various| https:// | ||
+ | |DWDS Kernkorpus| | ||
+ | |DWDS Kernkorpus 21| |2000-2010 | ||
+ | |Hamburg Dependency Treebank| | ||
+ | |IDS-Corpora| | ||
+ | |LIMAS-Korpus|1 million words, 500 texts/fragments|1970s|http:// | ||
+ | |Arabic News Texts Corpus (AntCorpus)| | | https:// | ||
+ | |Wortschatz Leipzig|various sample sizes|Arabic, | ||
+ | |SpråkbankenText| | |https:// | ||
+ | |||