ACL SIGLEX Resource Links
Special Interest Group on the Lexicon of the Association for Computational Linguistics
Databases of transcripts from people acquiring language, either as children or as a second language.
Child Language Data
- Polytechnic of Wales Corpus (POW): "100,000 words of transcribed British English child-language data sampled from 6 - 12 year olds. collected between 1978-84. The corpus is balanced for sex, age, socio-economic status and strong second language influence." Recommended contacts are Robin Fawcett, the original compiler and analyzer of the POW Corpus; Tim O'Donoghue; Clive Souter, who wrote the manual; and Eric Atwell. Available from the Oxford Text Archive (contact: Alan Morrison).
- Child Language Data Exchange System (CHILDES): "A collection of utterances of children of different age groups. The total size of the database is approximately 150 megabytes. The corpora are divided into six major directories: English data, non-English data, story-telling or narrative data, data on language impairments, data from second language acquisition, and data not transcribed." (MacWhinney, Brian. The CHILDES Project. Lawrence Erlbaum Associates, 1995, 280 pp.) The CHILDES project has also developed a variety of computational tools that facilitate the sharing of transcript data, increase the reliability of transcriptions, and automate the process of data analysis.
- A series of taped focus group and interview sessions with UK adolescents is planned as part of an ESRC-funded research project. The sessions will involve adolescents in years 8, 10 and 12 (approximately 12-13, 14-15 and 16-17 years old) at a secondary school in Sussex, England. They are planned to start at the beginning of 1997, with some material available by the end of January 1997. The fieldwork continues until June 1997, so material will be available for transcription throughout the first half of 1997. Anyone interested in this corpus who can offer an anonymous transcription service in return should contact Neil Jacobs.
- Written Language collections (as recommended by Geoffrey Sampson): The Written Language of Nine and Ten-Year Old Children and The Written Language of Eleven and Twelve-Year Old Children, Nuffield Foreign Languages Teaching Materials Project, Reports and Occasional Papers Nos. 24 and 25, respectively. "Apart from a few pages of introduction including transcription conventions they consist entirely of writing by children faithfully reproduced, with crossings-out, misspellings, etc. all recorded. I estimate that the total amount of children's writing is about 100,000 words." Published in 1967; address: 5 Lyddon Terrace, The University, Leeds 2. The editors' names are not given. Although these volumes are not machine-readable and are probably not widely available, they are notable for concentrating on written language.
Second Language Acquisition Data
- ICE (the International Corpus of English) incorporates ICLE, the International Corpus of Learner English, which samples the English used by advanced learners from a range of mother-tongue backgrounds: French, Dutch, German, Spanish, Swedish, Finnish, Czech, Polish, Russian, Japanese, and Chinese. ICLE was compiled to fill a gap in the corpus resources available for learner English. Centralised at the University of Louvain, Belgium, it has been collected in collaboration with several universities. The corpus, which now stands at over 1 million words, is made up of argumentative essays written by university students of English from these mother-tongue backgrounds. Contact Sylviane Granger for further information.