POS tag | description | example |
---|---|---|
CC | coordinating conjunction | and, or |
CD | cardinal number | 3, third |
DT | determiner | the, this |
EX | existential there | there is |
FW | foreign word | tabula |
IN | preposition, subordinating conjunction | in, of, like |
IN/that | that as subordinator | that |
JJ | adjective | blue, happy |
JJR | adjective, comparative | bluer, happier |
JJS | adjective, superlative | bluest, happiest |
LS | list marker | 1) |
MD | modal | could, will |
NN | noun, singular or mass | house |
NNS | noun, plural | houses |
NP | proper noun, singular | Carrie |
NPS | proper noun, plural | Americans |
PDT | predeterminer | both as in “both the girls” |
POS | possessive ending | person’s |
PP | personal pronoun | I, she, it |
PPZ | possessive pronoun | my, his, your |
RB | adverb | however, usually, naturally, here, good |
RBR | adverb, comparative | better |
RBS | adverb, superlative | best |
RP | particle | up as in “give up” |
SENT | sentence-break punctuation | . ! ? |
SYM | symbol | / [ = * |
TO | infinitive ‘to’ | to play |
UH | interjection | aha |
VB | verb be, base form | be |
VBD | verb be, past tense | was, were |
VBG | verb be, gerund/present participle | being |
VBN | verb be, past participle | been |
VBP | verb be, present, non-3rd person singular | am, are |
VBZ | verb be, 3rd person sing. present | is |
VH | verb have, base form | have |
VHD | verb have, past tense | had |
VHG | verb have, gerund/present participle | having |
VHN | verb have, past participle | had |
VHP | verb have, present, non-3rd person singular | have |
VHZ | verb have, 3rd person sing. present | has |
VV | verb, base form | take |
VVD | verb, past tense | took |
VVG | verb, gerund/present participle | taking |
VVN | verb, past participle | taken |
VVP | verb, present, non-3rd person singular | take |
VVZ | verb, 3rd person sing. present | takes |
WDT | wh-determiner | which |
WP | wh-pronoun | who, what |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | where, when |
# | # | # |
$ | $ | $ |
'' | closing quotation marks | ’ ” |
`` | opening quotation marks | ‘ “ |
( | opening brackets | ( { |
) | closing brackets | ) } |
, | comma | , |
: | other punctuation | – ; : — … |
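
The verb tags in the table follow a regular pattern: the stems VB, VH and VV separate forms of *be*, *have* and all other verbs, and the suffixes D, G, N, P and Z mark the same inflections (past, gerund/present participle, past participle, non-3rd person present, 3rd person singular present) in each group. The following is a minimal Python sketch of using that pattern to collapse tags into coarse word classes. It assumes tagger output with one token per line and word, tag and lemma separated by tabs; that layout, the class labels, and the names `Token`, `coarse_class` and `parse_tagger_output` are all assumptions chosen for illustration, not part of any tagger's API.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    lemma: str


# Hypothetical grouping: verb tags share a two-letter stem.
VERB_STEMS = {"VB": "verb (be)", "VH": "verb (have)", "VV": "verb (lexical)"}
# Suffixes used in the table: base form, D, G, N, P, Z.
VERB_SUFFIXES = {"", "D", "G", "N", "P", "Z"}

# Hypothetical exact-match groupings for some other tags from the table.
OTHER_CLASSES = {
    "NN": "noun", "NNS": "noun",
    "NP": "proper noun", "NPS": "proper noun",
    "JJ": "adjective", "JJR": "adjective", "JJS": "adjective",
    "RB": "adverb", "RBR": "adverb", "RBS": "adverb",
    "PP": "pronoun", "PPZ": "pronoun",
    "MD": "modal",
    "SENT": "punctuation",
}


def coarse_class(tag: str) -> str:
    """Map a fine-grained tag to a coarse class; unlisted tags become 'other'."""
    stem, suffix = tag[:2], tag[2:]
    if stem in VERB_STEMS and suffix in VERB_SUFFIXES:
        return VERB_STEMS[stem]
    return OTHER_CLASSES.get(tag, "other")


def parse_tagger_output(text: str) -> Iterator[Token]:
    """Parse assumed 'word<TAB>tag<TAB>lemma' lines, skipping blank lines.

    A line without exactly three tab-separated fields raises ValueError.
    """
    for line in text.splitlines():
        if not line.strip():
            continue
        word, tag, lemma = line.split("\t")
        yield Token(word, tag, lemma)


if __name__ == "__main__":
    sample = "She\tPP\tshe\nhas\tVHZ\thave\ntaken\tVVN\ttake\nit\tPP\tit\n.\tSENT\t.\n"
    for tok in parse_tagger_output(sample):
        print(tok.word, tok.tag, coarse_class(tok.tag))
    # She PP pronoun
    # has VHZ verb (have)
    # taken VVN verb (lexical)
    # it PP pronoun
    # . SENT punctuation
```

Checking the verb suffix explicitly, rather than matching on the stem alone, keeps any tag outside the documented pattern from being silently counted as a verb.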

corpus | size | time period | source | language |
---|---|---|---|---|
British National Corpus (BNC) | 100 million tokens | mid-1970s to early 1990s | Oxford | British English |
The Brown Corpus | 1 million tokens | 1961 | ICAME | American English |
The Lancaster/Oslo-Bergen Corpus (LOB) | 1 million tokens | 1961 | ICAME | British English |
International Corpus of English (ICE) | xxxxxx | varieties of world Englishes | International Corpus of English (ICE) at Zuerich, CH | world English |
Mark Davies' English Corpora | xxxxxx | diverse set of corpora | Mark Davies | American English, British English, international English |
Text corpora in the DWDS | various | various | https://www.dwds.de/r | German |
DWDS Kernkorpus | | 1900-1999 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/kern | German |
DWDS Kernkorpus 21 | | 2000-2010 | Berlin-Brandenburgische Akademie der Wissenschaften: https://www.dwds.de/d/korpora/korpus21 | German |
Hamburg Dependency Treebank | | 1996-2001 (articles from the German news site heise.de) | http://hdl.handle.net/11022/0000-0000-7FC7-2 | German |
IDS-Corpora | | | http://www.ids-mannheim.de/kt/corpora.html | German |
LIMAS-Korpus | 1 million words, 500 texts/fragments | 1970s | http://www.korpora.org/Limas/ | German |
Arabic News Texts Corpus (AntCorpus) | | | https://antcorpus.github.io/ | Arabic |
Wortschatz Leipzig | various sample sizes | | https://wortschatz.uni-leipzig.de/de/download | various (including Arabic, English, French, German, Russian) |
Språkbanken Text | | | https://spraakbanken.gu.se/en/resources | Swedish |