Electronic dictionaries of the future

Current electronic dictionaries are little more than transcriptions of paper dictionaries. To be sure, they contain much more information than the print versions. But they are not really designed to support natural language processing. The major needs for the future are: (1) a set of instances illustrating each sense of an entry; (2) sufficient structured information to permit disambiguation to reach each sense; and (3) a representation of the meaning of each sense for use in NLP.

Usage Examples: Dictionaries have always contained examples of the usage of most senses. For the longest time, these examples were frequently contrived and did not reflect actual use. Increasingly, these examples come from actual texts, i.e., from corpora. Most dictionary software today facilitates the association of samples from corpora with individual senses. The use of such examples for disambiguation began with Lesk and is considered the method of choice for establishing disambiguation baselines. However, this is still not sufficient. Recently, Oxford University Press has assembled a sentence dictionary, currently attempting to provide 20 examples for each sense. While this is more useful in providing lexical samples to be used in testing disambiguation techniques, e.g., 200 sentences for an entry containing 10 senses, it is still not sufficient for the kinds of statistical algorithms frequently used for disambiguation. I don’t know exactly what that number should be, but I believe more will be necessary.
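Lesk's idea is simply to pick the sense whose definition and usage examples share the most words with the context of the word being disambiguated. Here is a minimal sketch of that baseline; the tiny `senses` entry for "bank" is invented for illustration, not taken from any actual dictionary:

```python
# Simplified Lesk: choose the sense whose gloss and examples
# overlap most with the words surrounding the target word.

def lesk(context, senses):
    """Return the sense id whose gloss/examples best overlap the context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense_id, texts in senses.items():
        # The "signature" of a sense: all words in its gloss and examples.
        signature = set(" ".join(texts).lower().split())
        overlap = len(context_words & signature)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# A toy two-sense entry (gloss first, then a corpus example).
senses = {
    "bank.n.1": ["a financial institution that accepts deposits",
                 "he cashed a check at the bank"],
    "bank.n.2": ["sloping land beside a body of water",
                 "they sat on the bank of the river and watched the water"],
}

print(lesk("she sat by the river and watched the current", senses))  # bank.n.2
```

The more corpus examples a sense carries, the richer its signature becomes, which is exactly why 20 sentences per sense, while welcome, is still thin for statistical methods.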

Disambiguation mechanisms: With increases in the amount of information in electronic dictionaries, more and more is available for use in disambiguation. This information can include subcategorization patterns, subject fields, and other grammatical information. The latest addition is construction patterns, which is a glorified term for subcategorization patterns. However, this kind of information has not been fully synchronized with NLP needs. At present, a user of an electronic dictionary has to develop all sorts of routines to extract and make use of this information. More can be done in this area.
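To make the point concrete, here is a hedged sketch of the kind of routine a user currently has to write: filtering a verb's senses by a stored subcategorization pattern and subject field before any statistical method is applied. The entry structure, sense numbering, and pattern labels are all invented for illustration:

```python
# Hypothetical structured entry for the verb "run", with a
# subcategorization pattern and a subject field per sense.
ENTRY = {
    "run.v.1": {"subcat": "intransitive", "field": "motion"},
    "run.v.2": {"subcat": "transitive",   "field": "management"},
    "run.v.3": {"subcat": "intransitive", "field": "liquids"},
}

def candidate_senses(entry, observed_subcat, subject_field=None):
    """Keep only senses whose stored pattern matches the observed usage."""
    return [sid for sid, info in entry.items()
            if info["subcat"] == observed_subcat
            and (subject_field is None or info["field"] == subject_field)]

# "She runs a bakery": the verb takes a direct object (transitive),
# so only the management sense survives the filter.
print(candidate_senses(ENTRY, "transitive"))  # ['run.v.2']
```

The complaint in the paragraph above is precisely that this glue code has to be rebuilt for every dictionary, because the structured information is not delivered in a form NLP systems can consume directly.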

Representation mechanisms: Each sense needs a representation of its meaning that can be plugged into a representation of a sentence where it’s used. Clearly, the definition is one such representation, and definitions have generally been designed so that they can be substituted for a use of the word. A lot of effort has gone into various representational formalisms. Much of this has been done in the development of ontologies, creating a logic-style representation. This seems too strong and doesn’t provide the wiggle room for the haziness that constitutes a meaning.

I like FrameNet and frame semantics. However, these frames are not really set up to represent the meaning of a sense in a pluggable form. Consider the meaning of a preposition. Within frame semantics, we can somewhat safely state that each preposition sense corresponds to a frame element, with the preposition object acting as a filler for the slot that the frame element provides. When we start moving to longer pieces of text, including more elaborate preposition definitions, we can’t follow this simple plug-and-play approach. The FrameNet folks have attempted to analyze texts into semantic dependency graphs, and this approach seems to hold together. But when we consider complex definitions, we need some sort of compositional approach. To accomplish this, we need to perform some filling of frames along the way. A good example is found in the definitions of spatial prepositions. FrameNet has two relevant frame elements, Direction and Distance. Many of the definitions fill in values for these frame elements, which can then be dragged along in building representations of a sentence. We find the same phenomenon in definitions of nouns, verbs, adjectives, and adverbs. Thus, we need to develop semantic dependency graphs for each sense, filling in slots as necessary, to provide a full representation that can then be used in representing larger units of text.
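The slot-filling idea above can be sketched in a few lines. This is a minimal illustration, not FrameNet's actual data model: the frame name, element names, and values below are assumptions chosen to mirror the spatial-preposition example, where the definition pre-fills one frame element and the preposition object fills another:

```python
# A sense's representation as a frame instance: some slots are fixed
# by the definition itself, others are filled from the sentence.

class FrameInstance:
    def __init__(self, frame, prefilled=None):
        self.frame = frame
        self.slots = dict(prefilled or {})  # values supplied by the definition

    def fill(self, element, value):
        """Fill a remaining slot from the surrounding text."""
        self.slots[element] = value
        return self

# Hypothetical sense of "above": the definition pre-fills Direction;
# the preposition object ("the door") fills the Landmark slot.
above = FrameInstance("Locative_relation", prefilled={"Direction": "vertical-up"})
above.fill("Landmark", "the door")  # from "above the door"

print(above.slots)  # {'Direction': 'vertical-up', 'Landmark': 'the door'}
```

The pre-filled values are what get "dragged along" when this instance is composed into the semantic dependency graph of a larger unit of text.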
