ACL SIGLEX Resource Links

Special Interest Group on the Lexicon of the Association for Computational Linguistics

Parallel Corpora

(9/29/96) Running text in two or more languages.
  • Parallel corpora available from the Lancaster University Centre for Computer Corpus Research on Language (UCREL):
  • Sample Turkish and English texts, automatically aligned at the sentence level by Kursat Ince using Gale and Church's align program. Send any corrections and suggestions to Kemal Oflazer.
  • PEDANT, the parallel texts in Göteborg. PEDANT consists of texts in several languages and aims to offer a wide range of text types and language pairs, so that researchers can build sub-corpora suited to their own purposes. It was developed by Pernilla Danielsson and Daniel Ridings. Searches, which return something akin to a parallel concordance, can be run in Swedish, English, French, and German.
  • Regeringsförklaringen is the yearly declaration of the Swedish government and is issued in Swedish, English, French, German, and Spanish. The documents have been converted to TEI-conformant SGML, and the sentences in the different language versions have been aligned with Gale and Church's align program. The result is a searchable parallel corpus. Contact Erik Tjong Kim Sang for further details.
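Two of the corpora above were sentence-aligned with Gale and Church's align program. For readers unfamiliar with the method, the following is a minimal sketch of length-based alignment in that spirit; it is not the align program itself. It assumes sentence lengths are measured in characters, uses the match-category probabilities and the constants c = 1 and s^2 = 6.8 reported by Gale and Church (giving each direction of an asymmetric match the combined figure, as some reimplementations do), and scores length differences with a variant that divides by the mean segment length so empty segments are handled. The names `match_cost` and `align` are hypothetical.

```python
import math

# Match categories (source sentences, target sentences) with prior probabilities.
# Assumption: values follow those reported by Gale and Church, with the combined
# 1-0/0-1 and 2-1/1-2 figures assigned to each direction.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # c: expected target characters per source character
VARIANCE = 6.8    # s^2: variance of that ratio per character


def match_cost(len_src: int, len_tgt: int, kind: tuple[int, int]) -> float:
    """Negative log-probability of aligning segments with these character lengths."""
    if len_src == 0 and len_tgt == 0:
        z = 0.0
    else:
        # Variant: normalize by the mean length so a zero-length side is allowed.
        mean = (len_src + len_tgt / MEAN_RATIO) / 2
        z = (MEAN_RATIO * len_src - len_tgt) / math.sqrt(VARIANCE * mean)
    # Two-tailed probability of a deviation at least this large under a normal model.
    p_delta = max(math.erfc(abs(z) / math.sqrt(2)), 1e-300)  # floor avoids log(0)
    return -math.log(p_delta * PRIORS[kind])


def align(src_lens: list[int], tgt_lens: list[int]) -> list[tuple[list[int], list[int]]]:
    """Dynamic-programming alignment; returns pairs of source and target sentence indices."""
    n, m = len(src_lens), len(tgt_lens)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back: list[list[tuple[int, int] | None]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PRIORS:
                if i < di or j < dj or (i == 0 and j == 0):
                    continue
                prev = cost[i - di][j - dj]
                if prev == math.inf:
                    continue
                total = prev + match_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), (di, dj)
                )
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs
```

In use, one would pass `[len(s) for s in source_sentences]` and the same for the target side. Relying only on lengths, with no lexical information, is what makes the approach language-independent and cheap, at the cost of occasional errors where sentence lengths diverge sharply.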