Word-sense disambiguation has frequently been criticized as a task in search of a reason. Since a considerable portion of any sense inventory consists of words with only a single sense, the question has been raised whether the effort that disambiguation requires is worthwhile. Until now, the focus of disambiguation has been on the sense inventory itself, without examining the major reason for having lexical knowledge bases: how meanings are represented and thus made available to natural language processing applications. A major paradigm for representing meaning has now emerged in frame semantics, specifically in the FrameNet project.
A worthy objective for the Senseval community is the development of a wide range of methods for automating frame semantics, specifically identifying and labeling semantic roles in sentences. An important baseline study of this process has recently appeared in the literature ("Automatic Labeling of Semantic Roles" by Daniel Gildea and Daniel Jurafsky, Computational Linguistics, 28(3):245-288, 2002). The FrameNet project has assembled a body of hand-labeled data, and the Gildea & Jurafsky study has established a set of suitable metrics for evaluating the performance of an automatic system.
This Senseval-3 task calls for the development of systems to meet the same objectives as the Gildea and Jurafsky study. The data for this task is a sample of the FrameNet hand-annotated data, and evaluation of systems will follow the metrics of the Gildea and Jurafsky study.
The basic task for Senseval-3 is: Given a sentence, a target word and its frame, identify the frame elements within that sentence and tag them with the appropriate frame element name.
The FrameNet project has just released a major revision (FrameNet 1.1) to its database, with 487 frames using 696 distinct frame elements (although it is not guaranteed that frame elements with the same name have the same meaning). This release includes 132,968 annotated sentences (mostly taken from the British National Corpus). The Senseval-3 task will use approximately 8,000 of these sentences selected randomly from 40 frames (also selected randomly) having at least 370 annotations (out of the 100 frames having the most annotations). Participants will be provided with the following information for each test instance.
<frame name="Cause_fluidic_motion">
  <instance lexunit="pump.v" luID="9973" sentID="256263">
    <sentence>However, its task is made much more difficult by the fact that derogations granted to the Welsh water authority allow it to pump raw sewage into both those rivers.</sentence>
    <target start="125" end="128">pump</target>
  </instance>
</frame>
The associated frame elements can be found in the FrameNet file frames.xml. Other information generated during the tagging of the sentence (and potentially of use for training purposes) can be found in the lexical unit file lu9973.xml. A proper answer for this sentence would be:
<frame name="Cause_fluidic_motion">
  <instance lexunit="pump.v" luID="9973" sentID="256263">
    <sentence>However, its task is made much more difficult by the fact that derogations granted to the Welsh water authority allow it to pump raw sewage into both those rivers.</sentence>
    <target start="125" end="128">pump</target>
    <frame_elements>
      <frame_element name="Agent" start="119" end="120">it</frame_element>
      <frame_element name="Fluid" start="130" end="139">raw sewage</frame_element>
      <frame_element name="Goal" start="141" end="162">into both those rivers</frame_element>
    </frame_elements>
  </instance>
</frame>
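As a sketch of how this answer format can be consumed, the following Python code reads the example above with the standard-library XML parser and extracts the frame elements with their character spans. The variable names are illustrative; only the element and attribute names come from the task data.

```python
# Sketch: parsing an answer instance with xml.etree.ElementTree.
# The XML below is copied verbatim from the answer example in the text.
import xml.etree.ElementTree as ET

answer_xml = """<frame name="Cause_fluidic_motion">
  <instance lexunit="pump.v" luID="9973" sentID="256263">
    <sentence>However, its task is made much more difficult by the fact that derogations granted to the Welsh water authority allow it to pump raw sewage into both those rivers.</sentence>
    <target start="125" end="128">pump</target>
    <frame_elements>
      <frame_element name="Agent" start="119" end="120">it</frame_element>
      <frame_element name="Fluid" start="130" end="139">raw sewage</frame_element>
      <frame_element name="Goal" start="141" end="162">into both those rivers</frame_element>
    </frame_elements>
  </instance>
</frame>"""

frame = ET.fromstring(answer_xml)
instance = frame.find("instance")
sentence = instance.findtext("sentence")

# Collect (name, start, end) triples; start and end are inclusive
# character positions as described in the guidelines.
elements = [
    (fe.get("name"), int(fe.get("start")), int(fe.get("end")))
    for fe in instance.find("frame_elements")
]
print(frame.get("name"), instance.get("sentID"))
for name, start, end in elements:
    print(name, (start, end))
```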
In the FrameNet files, the target and frame elements are identified by beginning and ending character positions in the untagged sentence (following traditional string processing where the first position is 0). Results will be submitted as follows, with the item number (the frame name and the sentence ID) followed by the frame element names and their positions in the sentence.
Cause_fluidic_motion.256263 Agent (119,120) Fluid (130,139) Goal (141,162)
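A minimal sketch of reading and writing this submission format is given below; the regular expression and function names are illustrative, not part of any official scoring software.

```python
# Sketch: round-tripping result lines of the form
#   Frame_name.sentID FE1 (start,end) FE2 (start,end) ...
import re

_ELEMENT = re.compile(r"(\S+)\s+\((\d+),(\d+)\)")

def parse_result_line(line):
    """Split a result line into frame name, sentence ID, and element spans."""
    item, _, rest = line.partition(" ")
    frame_name, sent_id = item.rsplit(".", 1)
    elements = [(name, int(s), int(e)) for name, s, e in _ELEMENT.findall(rest)]
    return frame_name, sent_id, elements

def format_result_line(frame_name, sent_id, elements):
    parts = ["%s.%s" % (frame_name, sent_id)]
    parts += ["%s (%d,%d)" % (name, s, e) for name, s, e in elements]
    return " ".join(parts)

line = "Cause_fluidic_motion.256263 Agent (119,120) Fluid (130,139) Goal (141,162)"
frame_name, sent_id, elements = parse_result_line(line)
assert format_result_line(frame_name, sent_id, elements) == line
```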
FrameNet recognizes the permissibility of "conceptually salient" frame elements that have not been instantiated in a sentence; these are called null instantiations (see the FrameNet Book for a fuller description). An example occurs in the following sentence (sentID="1087911") from the Motion frame: "I went and stood in the sitting room doorway , but I could n't get any further -- my legs would n't move ." In this case, the FrameNet taggers considered the Path frame element to be an indefinite null instantiation (INI). Frame elements so designated for a particular sentence appear to be core frame elements, but not all core frame elements missing from a sentence have been designated as null instantiations. The correct answer for this case, based on the tagging, is as follows:
Motion.1087911 Theme (82,88) Path (0,0)
Null instantiations in submissions should identify the frame element and give "0" for both the starting and ending position (hence a zero-length string). They will be treated as follows in scoring: (1) if the FrameNet tagging indicates a null instantiation and the submission gives a non-null string as the answer, the answer will be scored as incorrect; (2) if a submission identifies null instantiations (perhaps by treating all missing core frame elements as such) and the FrameNet tagging identifies no such null instantiations, the answer will not be penalized; and (3) if a submission does not contain a null instantiation that appears in the FrameNet tagging, it will not be penalized.
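The three scoring rules above can be sketched as a small decision function. The span convention (0, 0) for a null instantiation follows the guidelines; the return labels "correct", "incorrect", and "ignored" are illustrative, not the official scorer's vocabulary.

```python
# Sketch of the null-instantiation (NI) scoring rules. A span of (0, 0)
# marks an NI; None means the submission omitted the frame element.
NULL_SPAN = (0, 0)

def score_null_instantiation(gold_span, submitted_span):
    gold_is_ni = gold_span == NULL_SPAN
    sub_is_ni = submitted_span == NULL_SPAN

    if gold_is_ni and submitted_span is None:
        return "ignored"      # rule 3: an omitted NI is not penalized
    if gold_is_ni and not sub_is_ni:
        return "incorrect"    # rule 1: non-null answer for a gold NI
    if sub_is_ni and not gold_is_ni:
        return "ignored"      # rule 2: a spurious NI is not penalized
    if gold_is_ni and sub_is_ni:
        return "correct"
    return None  # ordinary spans are scored by the boundary metrics instead
```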
The test data contains 200 sentences for each frame and will be provided to participants as described above. Participants may use the remaining sentences in these frames for training purposes; a list of these sentences will be available (without the sentence, target, and frame_elements data) in the training set. The training data also includes the answers (in the form above) for the training set. Because of a mismatch between the number of annotations identified in frames.xml and what has actually been annotated in the lu*.xml files, the training data also includes a file identifying the number of sentences in the lu*.xml files that have not actually been annotated. The frames selected for this task include 32,560 annotated sentences, of which 8,000 constitute the test set. On average, 614 sentences are available for training, with a minimum of 170; the range of sentences available for training will provide some insight into the importance of training data. Trial data containing full answers is available for 8 frames containing 100 to 105 annotations; these frames will not be used in the test. For this Senseval task, participants may download the training data at any time; the 21-day restriction on submission of results after downloading the training data does not apply, since this is a new Senseval task and the dataset is very complex. Participants may work with the training data as long as they like. The 7-day restriction on submitting results after downloading the test data still applies (along with the April 15 deadline).
The sentences provided to participants will not be presegmented (as defined in the Gildea & Jurafsky study); segmentation will be left to the participants' systems. Participants may use (and are strongly encouraged to use) any and all of the FrameNet data in developing and training their systems. In the test, participants may use any of this data, but are strongly encouraged to use only data available in the sentence itself and in the frame that is identified. (This corresponds to the "more difficult task" identified by Gildea & Jurafsky.) Participants may submit two runs, one using the additional data (the non-restrictive case) and one without it (the restrictive case); these will be scored separately.
For both cases, the identity of the frame (and all information describing the frame) may be used. (In the present task, no disambiguation will be performed for determining the applicable frame when a lexical unit has more than one sense, i.e., more than one associated frame.) For the restrictive case, participants may use the "syntactic pattern" encoded in the name attribute of the lexical unit's subcorpus tag; this information may be viewed as the grammatical patterns for a lexical unit that might be available in a dictionary. For this case, the grammatical form (GF) and the phrase type (PT) information (particularly the frame element boundaries) would not be used. For the non-restrictive case, any information can be used except the names of the roles (frame elements). More specifically, in this case, the boundaries may be assumed to have been obtained from a prior module. For the non-restrictive case, the task can be viewed as a classification task (the names of the frame elements are not known, but it is known that the words in the sentence are in some frame element).
The basic evaluation metrics will be precision and recall, scored both for semantic role identification and for character position and length. However, since it is expected that identifying the boundaries of a semantic role may be difficult, additional measures will examine the correspondence with the answer position and length (exact match, a superset of the answer, a subset of the answer, and overlapping). Since many semantic roles (frame elements) are very domain-specific, an attempt will be made to identify a mapping to more abstract roles, so that results providing more general roles will also be scored, as in the Senseval lexical sample task at a "coarse grain". The current dataset uses the following frame elements most frequently: Agent (82 frames), Cause (49), Degree (142), Depictive (112), Duration (89), Goal (45), Instrument (59), Manner (253), Means (170), Path (47), Place (219), Purpose (122), Reason (103), Result (85), Source (47), Theme (35), Time (241), and Topic (40). Participants will not be penalized if they submit more frame elements than identified in the FrameNet data.
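The boundary-correspondence categories mentioned above (exact match, superset, subset, overlapping) can be sketched as follows. Spans are inclusive (start, end) character positions as used throughout the task; the category names, including "disjoint" for no overlap, are illustrative.

```python
# Sketch: classifying how a submitted span corresponds to the gold span.
def boundary_match(gold, submitted):
    gs, ge = gold
    ss, se = submitted
    if (ss, se) == (gs, ge):
        return "exact"
    if se < gs or ss > ge:
        return "disjoint"
    if ss <= gs and se >= ge:
        return "superset"   # submission covers the whole gold span
    if ss >= gs and se <= ge:
        return "subset"     # submission lies inside the gold span
    return "overlap"        # partial overlap on one side
```

Such categories allow partial credit to be analyzed separately from strict precision and recall on exact spans.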
Please address any questions to Ken Litkowski. Since this task is new, these guidelines will be revised based on any comments or suggestions that are received. To assist your investigation of the FrameNet data, a Windows-based program, "FrameNet Explorer," has been developed and is now available (instructions for using the program are included in the zip file).
Last revised 3/12/04