Taxonomy Change Operations

I have been involved in the development of a frame element hierarchy or taxonomy, based on FrameNet’s frame-to-frame relations and frame element definitions. Since I know that this taxonomy is not perfect and can be improved, I need to consider the types of operations that might be involved in making changes. Although this may seem a trivial task, it requires considerable rigor. Many other systems (particularly ontologies) also involve some sort of hierarchical relationship, principally the ISA relationship. The operations I consider will cover these as well.

A taxonomy is a system of classification. (See the definitions returned by Google.) An ontology is very similar: “a rigorous and exhaustive organization of some knowledge domain that is usually hierarchical.” A taxonomy has a root, the top-level node under which all the other concepts are organized. A taxonomy is a tree with no cycles: every node other than the root has exactly one parent (its hypernym), so the full taxonomy forms a strict hierarchy whose bottommost nodes are called leaves.
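The tree properties just described can be checked mechanically. The following is a minimal sketch, assuming the taxonomy is stored as a Python dictionary that maps each node name to its hypernym, with `None` for the root. The function name `check_taxonomy` and the node names are hypothetical illustrations, not taken from the actual frame element taxonomy.

```python
from __future__ import annotations


def check_taxonomy(parent: dict[str, str | None]) -> tuple[str, set[str]]:
    """Verify that `parent` describes a tree and return (root, leaves).

    `parent` maps every node name to its hypernym; the root maps to None.
    Raises ValueError if there is not exactly one root, if a hypernym is
    missing from the map, or if following hypernyms ever revisits a node.
    """
    roots = [node for node, hyper in parent.items() if hyper is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")

    for node in parent:
        seen: set[str] = set()
        current: str | None = node
        # Walk up the ISA chain; in a tree this always ends at the root.
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected through {current!r}")
            if current not in parent:
                raise ValueError(f"hypernym {current!r} is not in the taxonomy")
            seen.add(current)
            current = parent[current]

    # Leaves are the nodes that never appear as anyone's hypernym.
    internal = {hyper for hyper in parent.values() if hyper is not None}
    leaves = {node for node in parent if node not in internal}
    return roots[0], leaves


# Hypothetical fragment of a frame element taxonomy.
fe = {"Entity": None, "Agent": "Entity", "Theme": "Entity", "Cook": "Agent"}
root, leaves = check_taxonomy(fe)   # root == "Entity", leaves == {"Theme", "Cook"}
```

Storing only the hypernym of each node makes the "one parent per node" property hold by construction, so the check reduces to finding a single root and ruling out cycles.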

Given a taxonomy, the following types of changes are envisioned:

  • adding a node: addition of a node may occur either as a new leaf or as an internal node
  • deleting a node: removing a node from the taxonomy, again, either a leaf node or an internal node
  • merging nodes: aggregating two or more nodes, possibly leading to the deletion of a node
  • moving a subtree: changing the hypernym for a node, so that the node and all its children are moved in the taxonomy to another location
  • splitting a node: creating subsets of the definitions of a given node, renaming the new subsets, and positioning the subsets at an appropriate place in the taxonomy

In adding a node as a leaf, there is usually no difficulty as long as we adhere to the principles under which the taxonomy is being maintained. When we add an internal node, at some intermediate level of the taxonomy, we must also check that the nodes that would become its children still adhere to these principles under the new node. (For the frame element taxonomy, the addition of nodes will be performed when there is a change in FrameNet.)
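A minimal sketch of the two kinds of addition, again assuming a dictionary that maps each node to its hypernym. The function names `add_leaf` and `add_internal` and all node names are hypothetical. The code only enforces structural consistency; whether the re-attached children satisfy the taxonomy's principles remains an editorial judgment.

```python
from __future__ import annotations


def add_leaf(parent: dict[str, str | None], node: str, hypernym: str) -> None:
    """Attach a new leaf `node` beneath an existing `hypernym`."""
    if node in parent:
        raise ValueError(f"{node!r} already exists")
    if hypernym not in parent:
        raise ValueError(f"unknown hypernym {hypernym!r}")
    parent[node] = hypernym


def add_internal(parent: dict[str, str | None], node: str, hypernym: str,
                 children: list[str]) -> None:
    """Insert `node` beneath `hypernym` and re-attach `children`
    (currently direct children of `hypernym`) beneath the new node.

    Whether those children fit the taxonomy's principles under the new
    node is an editorial decision this code cannot make.
    """
    # Validate everything before mutating, so a failed call changes nothing.
    for child in children:
        if parent.get(child) != hypernym:
            raise ValueError(f"{child!r} is not a direct child of {hypernym!r}")
    add_leaf(parent, node, hypernym)
    for child in children:
        parent[child] = node


fe = {"Entity": None, "Agent": "Entity", "Experiencer": "Entity", "Theme": "Entity"}
add_leaf(fe, "Baker", "Agent")                                  # new leaf
add_internal(fe, "Sentient", "Entity", ["Agent", "Experiencer"])  # new internal node
```

After these calls, `Agent` and `Experiencer` sit under `Sentient`, which sits under `Entity`, while `Theme` is untouched.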

Deleting a leaf node should also be straightforward, since we will not be affecting any other nodes in the taxonomy. Deleting an internal node, however, will have repercussions for its child nodes. A decision will have to be made on what to do with these, either deleting them as well or moving them to other places in the taxonomy. (For the frame element taxonomy, the deletion of nodes will be performed when there is a change in FrameNet.)
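The two options for an internal node's children (delete them as well, or move them elsewhere) can be sketched as follows, assuming the same node-to-hypernym dictionary. The function `delete_node`, its `cascade` and `new_hypernym` parameters, and the node names are hypothetical; by default the orphaned children are re-attached to the deleted node's own hypernym.

```python
from __future__ import annotations


def is_under(parent: dict[str, str | None], node: str, ancestor: str) -> bool:
    """True if `ancestor` is `node` itself or one of its hypernyms."""
    current: str | None = node
    while current is not None:
        if current == ancestor:
            return True
        current = parent[current]
    return False


def delete_node(parent: dict[str, str | None], node: str, cascade: bool = False,
                new_hypernym: str | None = None) -> None:
    """Remove `node` (KeyError if absent).

    cascade=True deletes the whole subtree rooted at `node`. Otherwise the
    children are re-attached to `new_hypernym`, defaulting to the hypernym
    of the deleted node.
    """
    if parent[node] is None:
        raise ValueError("cannot delete the root")
    if cascade:
        doomed = [n for n in parent if is_under(parent, n, node)]
        for n in doomed:
            del parent[n]
        return
    target = parent[node] if new_hypernym is None else new_hypernym
    # The new home for the children must exist and lie outside the deleted subtree.
    if target not in parent or is_under(parent, target, node):
        raise ValueError(f"cannot re-attach children to {target!r}")
    for child in [c for c, h in parent.items() if h == node]:
        parent[child] = target
    del parent[node]


fe = {"Entity": None, "Agent": "Entity", "Cook": "Agent", "Baker": "Agent", "Theme": "Entity"}
a = dict(fe)
delete_node(a, "Agent")                  # Cook and Baker move up to Entity
b = dict(fe)
delete_node(b, "Agent", cascade=True)    # Agent, Cook and Baker are all removed
```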

Merging nodes first involves assessing the target node, i.e., determining whether to keep an existing name or create a new one. Second, we have to determine what happens to all the children of each of the nodes being merged. In creating the frame element taxonomy, nodes were merged when there was some problem with the creation of the digraph image (e.g., a slash in the frame element name) or when the node names differed only in case. (For the frame element taxonomy, the merging of nodes will be performed when there is a compelling reason to do so; this reason will need to be stated explicitly.)
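Both decisions (the target name and the fate of the children) appear explicitly in the following minimal sketch, which assumes the node-to-hypernym dictionary used above. The function `merge_nodes` and the node names are hypothetical. Here the merged node keeps the target's hypernym if the target is one of the merged nodes, otherwise it takes the hypernym of the first node listed, and all children of every merged node end up beneath it. Merging a node with one of its own descendants is rejected, since it would create a cycle.

```python
from __future__ import annotations


def is_under(parent: dict[str, str | None], node: str, ancestor: str) -> bool:
    """True if `ancestor` is `node` itself or one of its hypernyms."""
    current: str | None = node
    while current is not None:
        if current == ancestor:
            return True
        current = parent[current]
    return False


def merge_nodes(parent: dict[str, str | None], nodes: list[str], into: str) -> None:
    """Merge `nodes` into a single node named `into`.

    `into` may be one of `nodes` (keeping its name and hypernym) or a new
    name, which then takes the hypernym of nodes[0]. The hypernyms of the
    other merged nodes are discarded.
    """
    for n in nodes:
        if parent[n] is None:
            raise ValueError("cannot merge the root")
    for a in nodes:
        for b in nodes:
            if a != b and is_under(parent, a, b):
                raise ValueError(f"{a!r} lies beneath {b!r}; merging would create a cycle")
    if into in parent and into not in nodes:
        raise ValueError(f"{into!r} already names another node")

    hypernym = parent[into] if into in nodes else parent[nodes[0]]
    merged = set(nodes)
    # Every child of any merged node now hangs beneath the merged node.
    for child, hyper in list(parent.items()):
        if hyper in merged:
            parent[child] = into
    for n in merged - {into}:
        del parent[n]
    parent[into] = hypernym


# Two hypothetical frame element nodes whose names differ only in case.
fe = {"Entity": None, "Place": "Entity", "place": "Entity", "Source": "place", "Goal": "Place"}
merge_nodes(fe, ["Place", "place"], "Place")
```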

Moving a node (and with it, any subtree beneath it) involves changing the hypernym of the node. When doing so, it will be necessary to keep in mind the principles underlying the construction of the taxonomy and to make sure that the children of the moved node continue to adhere to these principles. (For the frame element taxonomy, the moving of nodes will be performed when there is a compelling reason to do so; this reason will need to be stated explicitly.)
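With a node-to-hypernym dictionary (an assumed representation), moving a subtree amounts to changing a single entry, because every descendant still points, directly or indirectly, at the moved node. The one structural check needed is that the new hypernym does not lie inside the subtree being moved. The function `move_subtree` and the node names are hypothetical.

```python
from __future__ import annotations


def is_under(parent: dict[str, str | None], node: str, ancestor: str) -> bool:
    """True if `ancestor` is `node` itself or one of its hypernyms."""
    current: str | None = node
    while current is not None:
        if current == ancestor:
            return True
        current = parent[current]
    return False


def move_subtree(parent: dict[str, str | None], node: str, new_hypernym: str) -> None:
    """Give `node` a new hypernym; its descendants follow automatically."""
    if parent[node] is None:
        raise ValueError("cannot move the root")
    if new_hypernym not in parent:
        raise ValueError(f"unknown hypernym {new_hypernym!r}")
    # Attaching a node beneath its own descendant would create a cycle.
    if is_under(parent, new_hypernym, node):
        raise ValueError(f"{new_hypernym!r} is inside the subtree of {node!r}")
    parent[node] = new_hypernym


fe = {"Entity": None, "Participant": "Entity", "Agent": "Entity", "Cook": "Agent"}
move_subtree(fe, "Agent", "Participant")   # Cook travels with Agent
```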

Splitting a node is perhaps the most interesting operation in changing a taxonomy. First, it will be necessary to determine how to name the new nodes. Second, it will be necessary to identify how the children will be affected. In all likelihood, the children will be split into subsets, with some children going to each of the new nodes. (For the frame element taxonomy, the splitting of nodes will generally be based on an examination of the frame element definitions. It will generally be clear that a node being split has more than one sense. This operation is analogous to the splitting of a word’s meaning in a dictionary into subsenses.)
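The structural side of a split (new names, and the children divided into subsets) can be sketched as follows, assuming the node-to-hypernym dictionary used above. Dividing the definitions into senses is editorial work and is not modeled. The function `split_node` and the node and sense names are hypothetical; each new node is placed under the original node's hypernym, and a better position can be chosen afterwards with a separate move.

```python
from __future__ import annotations


def split_node(parent: dict[str, str | None], node: str,
               senses: dict[str, list[str]]) -> None:
    """Replace `node` by one node per sense, partitioning its children.

    `senses` maps each new node name to the children it takes over; every
    current child of `node` must be assigned to exactly one sense.
    """
    hypernym = parent[node]
    if hypernym is None:
        raise ValueError("cannot split the root")
    if len(senses) < 2:
        raise ValueError("a split needs at least two senses")
    children = {c for c, h in parent.items() if h == node}
    assigned = [c for kids in senses.values() for c in kids]
    if len(assigned) != len(set(assigned)) or set(assigned) != children:
        raise ValueError("each child must be assigned to exactly one sense")
    for name in senses:
        if name in parent and name != node:
            raise ValueError(f"{name!r} already exists")

    del parent[node]
    for name, kids in senses.items():
        parent[name] = hypernym          # new senses start as siblings
        for child in kids:
            parent[child] = name


fe = {"Entity": None, "Path": "Entity", "Route": "Path", "Trajectory": "Path", "Distance": "Path"}
split_node(fe, "Path", {"Path_spatial": ["Route", "Trajectory"],
                        "Path_extent": ["Distance"]})
```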

In making any changes to a taxonomy, it is important to keep a change log. In this way, the full explication of the taxonomy’s construction will be readily available for any further changes. (For the frame element taxonomy, a list of such changes may help the FrameNet lexicographers make modifications to the FrameNet data to ensure consistency.)
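A change log can be as simple as an append-only list of records, each naming the operation, the nodes affected, and the explicitly stated reason. The following is a minimal sketch; the `Change` and `ChangeLog` classes, their fields, and the JSON Lines export format are assumptions chosen for illustration, not an existing tool.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    operation: str              # "add", "delete", "merge", "move" or "split"
    nodes: list[str]            # names of the nodes affected
    reason: str                 # the explicitly stated justification
    details: dict = field(default_factory=dict)   # e.g. old and new hypernyms
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


class ChangeLog:
    """Append-only record of taxonomy edits, exportable as JSON Lines."""

    def __init__(self) -> None:
        self.entries: list[Change] = []

    def record(self, operation: str, nodes: list[str], reason: str, **details) -> Change:
        # Refuse undocumented changes: every edit must carry its reason.
        if not reason.strip():
            raise ValueError("every change needs an explicitly stated reason")
        entry = Change(operation, list(nodes), reason, dict(details))
        self.entries.append(entry)
        return entry

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.entries)


log = ChangeLog()
log.record("move", ["Cook"], "Cook is a kind of Agent",
           old_hypernym="Entity", new_hypernym="Agent")
print(log.to_jsonl())
```

One JSON object per line keeps the log easy to diff and to filter, for instance to extract every change affecting a given frame element.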
