Corpora are intended to be representative of a specified population or genre. They are needed for large-scale, systematic contrasts of, for example, language varieties, genres, and modalities (e.g., American vs. British English, informative vs. imaginative prose, or spoken vs. written language). Other research, such as lexicography, requires enormous amounts of data, even if drawn from fewer genres, in order to detect words and collocations that occur only rarely. These are recognized corpora used in research; this list does not attempt to include all text collections. See also parallel corpora (aligned texts in two or more languages).





