corpus

Latin word for "body"; in principle, any collection of more than one text. In the context of modern linguistics, a principled collection of natural language material. Can contain samples or whole texts, e.g. sentences, conversations, or books. Often enriched with additional information such as annotations (POS tags, parsing, etc.). The primary purpose of a corpus is not to preserve and display text, as an archive or a library does; a corpus is a collection compiled with a particular linguistic purpose in mind. Corpora can be employed for research in all fields of linguistics, e.g. lexicology, sociolinguistics, lexicography, discourse studies, or applied linguistics.
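As a rough illustration of what "enriched with annotations" can mean in practice, the sketch below represents a tiny POS-tagged corpus in plain Python. The two sentences, the tag names (DET, NOUN, VERB), and the helper functions `tag_frequencies` and `tokens_with_tag` are all hypothetical, chosen only for this example; real corpora are far larger and usually stored in dedicated formats.

```python
from collections import Counter

# Hypothetical toy corpus: two sentence samples, each token paired with a POS tag.
# The tag labels are an assumption made for illustration only.
corpus = [
    [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("A", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
]

def tag_frequencies(sentences):
    """Count how often each POS tag occurs across all sentences."""
    return Counter(tag for sentence in sentences for _, tag in sentence)

def tokens_with_tag(sentences, wanted):
    """Return all tokens carrying the given tag, in corpus order."""
    return [tok for sentence in sentences for tok, tag in sentence if tag == wanted]

freqs = tag_frequencies(corpus)
nouns = tokens_with_tag(corpus, "NOUN")
print(freqs)
print(nouns)
```

Queries of this kind (frequency counts, retrieval by annotation) are what distinguish working with a corpus from simply reading the texts it contains.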

Typical characteristics:

Types of corpora: