Macquarie Library Pty Ltd has designated CL Research as an agent for licensing the machine-readable version of The Macquarie Dictionary (Big Mac) to the academic and commercial research community. CL Research has created a machine-tractable version of Big Mac in its DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. Using functionality to parse the dictionary definitions, DIMAP has further enhanced Big Mac through the addition of many semantic links, including hypernyms, synonyms, and other semantic relations, thus making Big Mac+DIMAP a semantic network of the English language. (Details on contents of Big Mac+DIMAP.)
The Macquarie Dictionary is a comprehensive dictionary, containing 110,000 headwords, with more than 30,000 subsidiary headwords (idiomatic phrases), and more than 200,000 definitions. Big Mac contains over 21,000 encyclopedic entries for people and places and over 18,000 illustrative examples. Entries contain extensive usage notes and etymologies. In addition to providing extensive coverage for standard English, Big Mac comprehensively covers Australian English (Strine), Aboriginal English, and key items from English in South-East Asia. Uniquely, most definitions in Big Mac are tied directly to subparagraphs in The Macquarie Thesaurus, a Roget-style thesaurus that provides a categorical grouping of words at a level just above the WordNet synset. Each definition is given a unique identifier, making Big Mac easily usable for word-sense disambiguation. Pronunciations are given in the International Phonetic Alphabet (as shown in the print dictionary) and in a more casual form (in the machine-readable dictionary). Spelling variants in the machine-readable dictionary include forms both with and without diacritic marks.
In making Big Mac machine-tractable, CL Research has converted the Big Mac fields into clearly identified fields suitable for use in NLP, based on CL Research's experience in creating lexicons for word-sense disambiguation, question-answering, and information extraction. E.g., variable multiword units are converted into regular expressions containing lexical and syntactic preferences. After conversion, Big Mac data are available for many forms of analysis using DIMAP functionality, most notably the parsing of definitions to populate the data with semantic relation links, making the dictionary into a vast semantic network (rooted in lexicographically sound data). Inside DIMAP, the data are available for further analysis (most notably, CL Research's digraph-based primitive finding and dictionary mapping routines). DIMAP also provides considerable flexibility in searching for definitional patterns (regular expression searches on headwords, definitions, hypernyms, features, and other semantic links), extracting subdictionaries, and comparing entries with an integrated WordNet. (CL Research has used The Macquarie Dictionary and The Macquarie Thesaurus as a key component in its participation in the TREC Question-Answering Track; see the papers listed there for further details.)
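As a rough illustration of how a variable multiword unit might be expressed as a regular expression, here is a minimal Python sketch. It does not reproduce DIMAP's actual format or routines: the parenthesized-slot notation, the function name mwu_to_regex, and the one-to-three-word slot width are all assumptions made for this example, and it models only the lexical side, not syntactic preferences.

```python
import re

def mwu_to_regex(phrase: str) -> re.Pattern:
    """Compile a phrase such as 'take (someone) for a ride' into a regex.

    Hypothetical notation: a token in parentheses marks a variable slot.
    """
    parts = []
    for token in phrase.split():
        if token.startswith("(") and token.endswith(")"):
            # Variable slot: one to three words (an arbitrary width for this sketch).
            parts.append(r"\w+(?:\s+\w+){0,2}")
        else:
            # Fixed word: match it literally.
            parts.append(re.escape(token))
    # Fixed words and slots are separated by whitespace; match case-insensitively.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = mwu_to_regex("take (someone) for a ride")
found = pattern.search("They will take the new owner for a ride")
missing = pattern.search("They will take for a ride")
```

Here found is a match object, because the slot absorbs "the new owner", while missing is None, because the slot requires at least one word. A fuller treatment would presumably constrain the slot by part of speech rather than by word count.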
Big Mac+DIMAP provides an unparalleled resource for research into the nature and application of the lexicon. Moreover, the CL Research collaboration with Macquarie lexicographers is an ongoing process of extracting and mining the data. CL Research is available for customizing Big Mac to meet your requirements.
An academic research license for a two-year period is free for The Macquarie Dictionary and The Macquarie Thesaurus, with a servicing fee of $200 per year to CL Research (which includes DIMAP and the DIMAP versions of Big Mac). A commercial research license for two years is $5,000 for The Macquarie Dictionary, $5,000 for The Macquarie Thesaurus, or $7,500 for both, and $7,000 for Big Mac+DIMAP. The license agreement, also available as a Microsoft Word document, specifies the terms of the license.
Contact Ken Litkowski, CL Research, 9208 Gue Road, Damascus, MD 20872 (301-482-0237) for further details on licensing arrangements.
Maintained by Ken Litkowski
Copyright © 2002 CL Research