A thesaurus contains synonyms and "broader than" and "narrower than" terms. With DIMAP, you or CL Research can parse
a dictionary, or a set of dictionary definitions, to identify how different entries relate to one another. The amount of effort depends, of
course, on the size of the dictionary. As a guide, processing of Webster's 2nd International Dictionary containing 120,000
headwords and 270,000 definitions took approximately 40 hours, much of which was background processing.
Familiarization may require additional time.
To create the thesaurus yourself, you will need to put your dictionary entries into the format used to upload them into
DIMAP. The file format is described in the help file provided with the experimental DMP3A. If you are unable to
create the entries directly, CL Research will provide the C source code for a program that can be run against a marked-up
ASCII file. Alternatively, CL Research will modify the program to meet your format for $200.
Once the data are in the proper format for uploading into DIMAP dictionaries, the experimental DMP3A can be used
with a couple of menu selections to create the dictionaries. Parsing the definitions and creating the thesaurus require only
a few more menu and dialog selections.
After DIMAP dictionaries are created, they will then be suitable for more extended thesaural and semantic relations as
DMP3A is developed further.
If you require further assistance, CL Research can customize DMP3A to meet your needs. Please inquire.
Ontology Development
An ontology is an organization of concepts in relation to one another, most specifically a categorization of entities and actions. A
full ontology may deal with all knowledge, but it is possible to construct an ontology for a single field of study.
The main organizing principle of an ontology is the ISA backbone ("a horse is an animal"). A richer ontology contains
additional relations between concepts. These relations may include the thesaural relations of synonyms and antonyms, but
typically would include a breakdown of the general "related-to" thesaural relation into many semantic relations. At a
minimum, the semantic relations would include "part" relations that identify conceptual entities which are construed as
parts, constituents, or substances making up another entity. A more elaborate system would identify semantic relations
such as "agent", "instrument", "purpose", "location", "result", "cause", "manner", and "entailment". There is no general
agreement on the set of semantic relations and it is possible that a set may be somewhat arbitrary and depend on a user's
needs.
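The structure described above can be sketched as a small data model. The following Python is only an illustration, not DIMAP's internal representation: the class name Ontology, its methods, and the sample concepts are hypothetical. It also makes two simplifying assumptions, that each concept has a single ISA parent and that relations such as "part" are inherited down the ISA backbone.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: an ISA backbone plus labeled semantic relations.

    Hypothetical sketch; not how DIMAP stores its dictionaries.
    """

    def __init__(self):
        self.isa = {}                      # concept -> its single ISA parent
        self.relations = defaultdict(set)  # (concept, label) -> target concepts

    def add_isa(self, child, parent):
        self.isa[child] = parent

    def add_relation(self, source, label, target):
        self.relations[(source, label)].add(target)

    def ancestors(self, concept):
        """Walk up the ISA backbone, nearest ancestor first."""
        chain = []
        seen = {concept}
        while concept in self.isa:
            concept = self.isa[concept]
            if concept in seen:  # guard against an accidental ISA cycle
                break
            seen.add(concept)
            chain.append(concept)
        return chain

    def related(self, concept, label):
        """Targets of a relation, including those inherited via ISA (an assumption)."""
        found = set()
        for c in [concept] + self.ancestors(concept):
            found |= self.relations.get((c, label), set())
        return found


onto = Ontology()
onto.add_isa("horse", "animal")
onto.add_isa("animal", "entity")
onto.add_relation("horse", "part", "mane")
onto.add_relation("animal", "part", "body")

print(onto.ancestors("horse"))
print(sorted(onto.related("horse", "part")))
```

Keeping the relation label as an ordinary string reflects the point that the set of semantic relations is not fixed; a user can add "agent", "instrument", or any other label without changing the structure.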
The DMP3A experimental version has now been extended to enable a user to define and identify many semantic relations
from parsing of a word's definitions. These relations can be encoded as part of the dictionary used to parse the definitions
by specifying "defining patterns" associated with individual words. (For details, see the discussion of semantic relations in
the Dictionary Parsing Project.)
DMP3A can be used directly to add such relations to dictionary entries based on parsing definitions.
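To give a rough idea of what identifying relations from definitions involves, here is a minimal Python sketch. It is not DMP3A's method: DMP3A works from a parse of the definition and from defining patterns stored with individual words, whereas this sketch substitutes simple regular expressions over the raw definition text. The pattern list, the relation labels, and the function name extract_relations are all hypothetical.

```python
import re

# Hypothetical "defining patterns": a regular expression paired with the
# relation label it signals. These stand in for DMP3A's parser-based patterns.
DEFINING_PATTERNS = [
    (re.compile(r"^(?:a|an|the) part of (?:a|an|the) (\w+)"), "part-of"),
    (re.compile(r"\bused (?:for|to) (\w+)"), "purpose"),
    (re.compile(r"\bcaused by (?:(?:a|an|the) )?(\w+)"), "cause"),
    (re.compile(r"^(?:a|an|the) (?:kind|type|sort) of (\w+)"), "isa"),
]


def extract_relations(headword, definition):
    """Return (headword, relation, target) triples found in one definition."""
    text = definition.lower().strip()
    found = []
    for pattern, label in DEFINING_PATTERNS:
        for match in pattern.finditer(text):
            found.append((headword, label, match.group(1)))
    return found


samples = {
    "hoof": "The part of a horse that touches the ground",
    "saw": "a tool used for cutting wood",
    "flu": "a disease caused by a virus",
    "pony": "a kind of horse",
}
for word, definition in samples.items():
    print(word, extract_relations(word, definition))
```

Surface patterns like these miss most real definitions and pick up only the first word of a phrase, which is one reason a parser is the better basis for this task.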
Please inquire if you wish assistance in developing an ontology and set of relations specific to your needs.
Conceptual Organization
When hierarchical relations have been entered into DIMAP dictionaries, it is possible to create a conceptual organization
of a dictionary. This organization will identify the more basic and the more complex concepts within a field (perhaps an
entire general vocabulary, but preferably within a smaller sublanguage area).
Using the hierarchical relations, DIMAP can analyze the underlying dictionary graph to identify the primitive elements
and the ordering of concepts based on complexity. This can be accomplished through the menu selection for analyzing the
dictionary digraph.
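One way to picture this kind of analysis is the Python sketch below. It is not DIMAP's algorithm, which is not described here. It assumes the digraph has one edge from each concept to each of its broader terms, treats concepts with no broader term as primitives (level 0), and gives every other concept a level one higher than its most complex broader term. Concepts that are defined in terms of each other, directly or indirectly, cannot be ranked this way and are returned separately. The function name rank_by_complexity and the sample data are hypothetical.

```python
def rank_by_complexity(hypernyms):
    """Rank concepts by distance from the primitives.

    hypernyms maps each concept to the set of its broader concepts.
    Returns (levels, unresolved): levels maps a concept to its level
    (0 = primitive); unresolved holds concepts that lie on, or depend
    on, a definitional cycle.
    """
    concepts = set(hypernyms)
    for parents in hypernyms.values():
        concepts |= set(parents)

    levels = {}
    remaining = set(concepts)
    level = 0
    while remaining:
        # A concept is ready once all of its broader terms have a level.
        ready = {
            c for c in remaining
            if all(p in levels for p in hypernyms.get(c, ()))
        }
        if not ready:
            break  # everything left is tied up in a cycle
        for c in ready:
            levels[c] = level
        remaining -= ready
        level += 1
    return levels, remaining


sample = {
    "animal": {"entity"},
    "horse": {"animal"},
    "pony": {"horse"},
    "big": {"large"},    # "big" and "large" define each other
    "large": {"big"},
}
levels, unresolved = rank_by_complexity(sample)
for concept in sorted(levels, key=lambda c: (levels[c], c)):
    print(levels[concept], concept)
print("unresolved:", sorted(unresolved))
```

In a real dictionary, groups of mutually defined words such as "big" and "large" are themselves of interest, since they are candidates for the primitive elements of the vocabulary.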
If you require assistance in interpreting this digraph, CL Research can walk you through the results. Please
inquire.