
The primary mission of CL Research is to investigate the structure of dictionaries (computational lexicons) and their role in natural language processing applications.
The structure of computational lexicons is investigated using DIMAP (DIctionary MAintenance Program). This Windows program provides a generalized structure for creating entries with multiple senses. Unlike entries in ordinary dictionaries, DIMAP entries can explicitly represent superordinate and instance links, feature attributes and values, and generalized semantic relations to other entries and their senses. In addition to a range of maintenance functions, DIMAP supports the following types of computation within the lexicon:
- parsing definitions to identify superordinates and other types of semantic relations;
- analyzing the definitional hierarchies established by superordinates to identify definitional cycles (digraph analysis); and
- mapping between entries in different dictionaries.
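The digraph analysis of definitional hierarchies can be illustrated with a minimal Python sketch. This is not DIMAP's implementation: it assumes a hypothetical mapping from each headword to the superordinates (genus terms) parsed from its definitions, and it uses Tarjan's strongly connected components algorithm to group words that define one another in a cycle. The function name `definitional_cycles` and the toy data are illustrative only.

```python
from typing import Dict, List, Set


def definitional_cycles(hypernyms: Dict[str, List[str]]) -> List[List[str]]:
    """Return groups of words whose definitions form a cycle.

    `hypernyms` (hypothetical input format) maps each headword to the
    superordinates found in its definitions. Any strongly connected
    component with two or more words, or a word listed as its own
    superordinate, is reported as a definitional cycle.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    cycles: List[List[str]] = []
    counter = 0

    def visit(word: str) -> None:
        nonlocal counter
        index[word] = lowlink[word] = counter
        counter += 1
        stack.append(word)
        on_stack.add(word)
        for sup in hypernyms.get(word, []):
            if sup not in index:
                visit(sup)
                lowlink[word] = min(lowlink[word], lowlink[sup])
            elif sup in on_stack:
                lowlink[word] = min(lowlink[word], index[sup])
        if lowlink[word] == index[word]:
            # `word` is the root of a strongly connected component: pop it.
            component: List[str] = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == word:
                    break
            if len(component) > 1 or word in hypernyms.get(word, []):
                cycles.append(sorted(component))

    # Include words that occur only as superordinates, never as headwords.
    nodes: Set[str] = set(hypernyms)
    for sups in hypernyms.values():
        nodes.update(sups)
    for word in sorted(nodes):
        if word not in index:
            visit(word)
    return cycles


# Toy hierarchy (invented for illustration): "thing" and "object"
# are each defined in terms of the other.
toy = {
    "dog": ["animal"],
    "animal": ["organism"],
    "organism": ["being"],
    "being": ["entity"],
    "entity": ["thing"],
    "thing": ["object"],
    "object": ["thing"],
}
print(definitional_cycles(toy))  # → [['object', 'thing']]
```

The recursive traversal keeps the sketch short; for a full dictionary with very deep hierarchies, an iterative version would avoid Python's recursion limit.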
DIMAP dictionaries have been created for several publicly-available lexicons, as well as electronic versions of published dictionaries, including
- an alphabetic version of the publicly-available WordNet 3.0 (see Electronic Dictionaries),
- an alphabetic version of the publicly-available UMLS Specialist Lexicon (January 2010) (see Electronic Dictionaries),
- an alphabetic version of FrameNet, incorporating all lexical items, frame characterizations, and frame relations (see Electronic Dictionaries),
- a FrameNet frame element dictionary, used to create a frame element taxonomy identifying hypernymic links between frame elements and the number of frames in which these frame elements appear (see Electronic Dictionaries and an online version that allows exploration of this taxonomy),
- a dictionary of all English prepositions (courtesy of Oxford University Press), further developed and analyzed in The Preposition Project, with an online version, broken down into preposition classes with digraphs showing derivational relationships,
- the Oxford Dictionary of English (1st and 2nd editions), and
- the Macquarie Dictionary (3rd and 4th editions), with links from definitions to the Macquarie Thesaurus (known as the Dictaurus).
The electronic versions of the Oxford and Macquarie lexical resources are
not publicly available, but may be licensed through CL Research for research
purposes.
The results of our research in examining the role of computational lexicons are incorporated in the Knowledge Management System (KMS), which is a unified platform for
- parsing and analyzing text (from most formats, including Word, PDF, XML,
and web pages),
- answering free-form natural language questions,
- summarizing one or more documents, either generically or with a topic focus,
- extracting information,
- exploring document contents, and
- dynamically creating ontological representations of document contents.
KMS is accompanied by several supporting programs that feed into KMS or
provide specialized analysis functions; parts of these programs are
incorporated directly into KMS. These include:
- a Text Parser, which performs background parsing and processing of large numbers of texts into XML representations for use by KMS,
- an XML Analyzer, which provides more specialized XML tools for examining XML documents of any type, and
- a context analysis tool, Minnesota Contextual Content Analysis (MCCA), used for statistical characterization and analysis of texts, including texts with multiple speakers such as transcripts of focus groups or plays.
Fully-functional demonstration versions of these supporting programs are available (see Demos).
CL Research has also developed a Windows-based utility, FrameNet Explorer (FNE), for examining the FrameNet database (see Demos). In The Preposition Project, CL Research is using FNE to support an in-depth, comprehensive, publicly-available characterization of the behavior of English prepositions (their semantic roles and the properties of their complements and attachment points).
Ken Litkowski of CL Research is also the webmaster for the Association for Computational Linguistics Special Interest Group on the Lexicon and a guest editor for a special issue of Computational Linguistics on semantic role labeling.
CL Research also provides consulting assistance on
- lexicon development, with a particular expertise in the bioinformatics domain (using the UMLS Metathesaurus and the UMLS Specialist Lexicon),
- research on the creation of ontologically-oriented lexicons from standard lexicons and dictionaries, and
- advanced text processing and natural language applications, with a focus on extracting textual content from documents, particularly using KMS.
This document is maintained by Ken Litkowski.
Copyright © 2008 CL Research