Instructions for Using CL Research FrameNet Explorer
FrameNet Explorer (FNE) allows examination of the FrameNet frames, frame elements, and lexical units and creates dataset samples for further analysis or testing. This program operates only in Windows. Running the program assumes you have obtained the FrameNet distribution and have unzipped the files to a root directory with at least five subdirectories: docs, frame, fulltext, lu, and schema. You will need to locate this root directory and specifically the file frameIndex.xml when you start FNE for the first time.
FNE uses the files "frameIndex.xml" and "frRelation.xml" in the root FrameNet directory, the frame files in the frame directory (*.xml), and the lexical unit files in the lu directory (lu*.xml). Each of these files also has an XML stylesheet (*.xsl) which may also be used to view FrameNet data in the standard format. FNE does not use these stylesheets, but presents the data in a different format.
When you first start FNE, you will be asked to locate the file frameIndex.xml; once located, this location will be remembered. It will take some time (20 seconds to a few minutes, depending on the speed of your machine) for the interface to come up, since all the main frame files are loaded and analyzed on start-up. Once FNE has started, the top half of the window lists the frames, the total number of frames, and the total number of annotated sentences, and the bottom half provides four tabs for more detailed examination of frames, frame elements, lexical units, and sample selection.
Using FrameNet Help
FrameNet help can be accessed by pushing the Help or Feedback button or pushing the F1 key. When first running FNE, it may be necessary to push the F1 key first to locate and activate the help system. After this, FNE should respond in the normal way for context-sensitive help. That is, when you place the cursor in a field, pressing the F1 key will bring up the help for that topic.
The main list of frames has columns for the frame name, its ID, the number of frame elements, the number of lexical units that have been used with the frame, and the number of annotated sentences for the frame. This list can be sorted by clicking on any of the column names.
When you select a frame (click on a frame name), its details will appear on the Frame tab, showing the frame name, its "definition" (description), its frame elements, its lexemes, and the frame-to-frame relations it participates in (see the FrameNet "book", book.pdf, for a description of these relations). If you clear the Frame field and begin typing in it, you will scroll to the first frame beginning with the letters you have typed. To see the details for a particular frame, you must then select it or type the full name.
Examining Frame Elements
On the Frame Element tab, all the frame elements are listed along with the number of frames in which identically-named elements have been used (the same name does not imply the same meaning for the frame element), as well as the total number of distinctly named frame elements. Selecting any element will show details for the frame element: the frames in which it appears, the type of element (core, peripheral, or extra-thematic), and its definition for that frame.
The Frame Element tab shows the total Number of Frame Elements, as obtained from the base FrameNet files. To facilitate scrolling, a frame element name can be entered in the field Find frame element. Right-clicking in the table showing the frame element definitions triggers a pop-up menu item, Save definitions. When selecting this item, the current definitions are appended to the file FEDefs.csv, a tab-separated file containing the columns shown in the table. Pushing the button Save All Definitions iterates through the full list of frame elements, appending the definitions of each frame element to FEDefs.csv; this takes only a few minutes.
Examining Lexical Units
On the Lexical Unit tab, all lexical units that have been identified in the various frames are shown. Each lexical unit is described by its name, its part of speech (P), the lexical unit ID (essentially the file name for the lexeme), the number of annotated sentences, the frame ID, and the definition. This list can also be sorted by clicking on any of the column headings. Selecting a lexical unit will show all the sentences that have been annotated for it, with the sentence ID in parentheses in front of the sentence.
Find lexical unit can be used to scroll to a particular lexical unit. Start typing in the edit box and the first lexical unit beginning with those letters will appear somewhere in the window. When the desired lexical unit appears in the window, select that item and the sentences annotated for the lexical unit will be shown. Remember that a full entry is required, i.e., with the part of speech ending, to distinguish among lexical units with multiple parts of speech. Note that the list of lexical items can be sorted by clicking any of the column headings (i.e., the lexical unit, the part of speech, the lexical unit ID, the number of annotations, the frame name, the frame ID, or the definition).
Subcorpora: When a lexical unit has been selected, the names of the subcorpora that have been tagged appear in a drop-down list. One of these subcorpora may be selected to show only the annotated sentences within that subcorpus.
Viewing sentence frame elements: Selecting a sentence from the full list or from a subcorpus and then right-clicking displays a popup menu with one item, "Show frame elements". If this menu item is selected, the tagging by the FrameNet lexicographers is shown in a separate form. This form contains the frame name, the lexical unit, the full sentence, and a list of the frame elements tagged for the sentence. The frame elements identify the frame element name ("Target" for the main word tagged for the sentence) and the sentence text that has been tagged as constituting the frame element.
Sample Selection Options
The Sample Selection tab provides several options. They are somewhat disorganized because they were developed experimentally, so some care is required in using them. The options include:
Extracting a sample of frames and annotations,
Selecting preposition corpus instances,
Identifying other syntactic realizations of frame elements associated with preposition instances, and
Creating XML and DIMAP dictionaries of the FrameNet data
Extracting a Sample of Frames and Annotations
FNE was used in Senseval-3 to generate a test set and a training set for use in the task to identify semantic roles. FNE produces two XML files for the actual sets, two text files containing the answers, and a message file containing information about the sample selection process. These files are described in more detail below. For more details on how the test set and training set were used in Senseval-3, see the web site, where the actual data used in Senseval-3 are provided, along with a Perl script for scoring performance using the actual data or data generated following the instructions below and papers included in the proceedings.
To select the sample,
Specify the number of frames to select in the box next to Number of frames to select,
Specify the number of sentences to be selected from the annotation set of each frame in the box next to Number of sentences to select,
Specify the range of the number of annotations in frames to serve as candidates for selection in the boxes next to Low and High, either by typing the values or by using the up-down controls (the number of annotations for each frame is shown in the list of frames, which may be sorted by clicking the Anns column),
To include the answers in the test set, check the box labeled Provide Frame Elements
Push the button Select Sample
Watch the selection of the sample, in the boxes Frames Completed and Total Number of Sentences in Frames
FNE picks a random sample of frames as given in Number of frames to select with a number of annotations greater than Low and less than High and a random sample of annotated sentences within those frames as indicated in Number of sentences to select to serve as a test set. Sentences not included in the test set will be identified as a training set. If the total number of annotated sentences in a frame is less than 1.5 times the number of desired sentences, the frame is rejected as a valid frame and a message is written to indicate that the size of the training set is too small; lowering the number of desired sentences may correct this problem.
In determining the sentence samples for a given frame, FNE takes into account that some sentences may not have been tagged by the FrameNet lexicographers. These sentences are excluded from the test and training sets. The sample sentences for the test set are selected, with the remaining sentences included in the training set. The files are created based on whether a sentence is part of the test set or the training set.
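The selection logic described above can be sketched roughly as follows. This is a minimal Python illustration, not FNE's actual code: the function name select_sample, the dictionary layout, the message text, and the choice to move on to another candidate frame after a rejection are all assumptions made for the example. The sentence lists are assumed to contain only sentences that were tagged by the lexicographers.

```python
import random

def select_sample(frames, n_frames, n_sentences, low, high, seed=None):
    """Split annotated sentences into test and training sets per frame.

    `frames` maps a frame name to a list of tagged sentence IDs
    (hypothetical layout; untagged sentences are assumed already removed).
    """
    rng = random.Random(seed)
    messages = []
    # Candidate frames have more than `low` and fewer than `high` annotations.
    candidates = [name for name, sents in frames.items()
                  if low < len(sents) < high]
    rng.shuffle(candidates)

    test, training = {}, {}
    for name in candidates:
        if len(test) == n_frames:
            break
        sents = frames[name]
        # Reject frames with fewer than 1.5 times the desired sentences.
        if len(sents) < 1.5 * n_sentences:
            messages.append(f"{name}: training set too small")
            continue
        chosen = set(rng.sample(sents, n_sentences))
        test[name] = [s for s in sents if s in chosen]
        training[name] = [s for s in sents if s not in chosen]
    return test, training, messages

# Made-up sentence IDs, only to exercise the function.
frames = {"Event": list(range(100)), "Arrest": list(range(10)), "Tiny": [1, 2]}
test, training, messages = select_sample(frames, 2, 8, low=5, high=200, seed=0)
```

In this example the frame with 10 sentences is rejected because 10 is less than 1.5 times 8, which mirrors the advice that lowering the number of desired sentences may correct the problem.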
Each sentence in the test set is printed to the file SensSemRolesTest.xml under its frame giving the lexical unit, the lexical unit ID, the sentence ID, the sentence, and the target word, with its starting and ending zero-based position in the sentence, as follows:
<instance lexunit="happen.v" luID="4886" sentID="796895">
<sentence>The accident happened in frosty conditions shortly before 8am on the A985 near Dunfermline in Fife .</sentence>
<target start="13" end="20">happened</target>
<instance lexunit="happen.v" luID="4886" sentID="796902">
<sentence>The accident happened in a section of roadworks between junctions One and Two of the eastbound M-Fifty in Worcestershire , when a lorry jack-knifed and rolled on top of a car .</sentence>
<target start="13" end="20">happened</target>
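In the samples above, the start and end positions appear to be inclusive: characters 13 through 20 of the sentence spell "happened". A short Python sketch for checking an instance against its offsets is given here. It assumes that each instance is closed with an </instance> tag in the actual file, which the excerpt does not show; the function name target_matches is hypothetical.

```python
import xml.etree.ElementTree as ET

# One instance copied from the sample above, with a closing tag added
# (an assumption; the excerpt does not show how instances are closed).
SAMPLE = """<instance lexunit="happen.v" luID="4886" sentID="796895">
<sentence>The accident happened in frosty conditions shortly before 8am on the A985 near Dunfermline in Fife .</sentence>
<target start="13" end="20">happened</target>
</instance>"""

def target_matches(instance):
    """Return True if the target offsets select the target text."""
    sentence = instance.findtext("sentence")
    target = instance.find("target")
    start, end = int(target.get("start")), int(target.get("end"))
    # Offsets are zero-based and the end position is inclusive.
    return sentence[start:end + 1] == target.text

instance = ET.fromstring(SAMPLE)
ok = target_matches(instance)
```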
Each sentence in the training set has less information and only includes the lexical unit name, the lexical unit ID, and the sentence ID in the file SensSemRolesTrg.xml, as follows. The reduced amount of information was necessary because the size of the file would have been very large otherwise.
<instance lexunit="happen.v" luID="4886" sentID="796917"/>
<instance lexunit="happen.v" luID="4886" sentID="796918"/>
<instance lexunit="happen.v" luID="4886" sentID="796939"/>
<instance lexunit="happen.v" luID="4886" sentID="796947"/>
The answer set files, SensSemRolesTestAns.txt and SensSemRolesTrgAns.txt, both follow the same format, identifying the frame name and sentence ID, followed by all frame elements tagged for that sentence. In general, the number of frame elements tagged in a sentence is considerably lower than the number of frame elements defined for the frame, since many frame elements are not instantiated in a given sentence.
Event.796917 Event (67,70) Place (81,93) Time (0,0)
Event.796918 Event (0,3) Place (18,48) Time (0,0)
Event.796939 Event (20,37) Place (47,62) Time (0,0)
Event.796947 Event (0,9) Reason (31,60) Place (0,0) Time (0,0)
Event.796982 Event (0,1) Time (12,37) Place (0,0)
Event.797002 Event (0,4) Time (19,78) Place (0,0)
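Lines in this format can be read with a few lines of Python. The sketch below is only an illustration based on the sample lines above; it assumes that fields are separated by single spaces and that frame element names contain no spaces. The names FE_PATTERN and parse_answer_line are hypothetical.

```python
import re

# A frame element name followed by "(start,end)".
FE_PATTERN = re.compile(r"(\S+) \((\d+),(\d+)\)")

def parse_answer_line(line):
    """Parse 'Frame.sentID FE (start,end) ...' into its parts."""
    key, _, rest = line.strip().partition(" ")
    # The sentence ID follows the last period of the first field.
    frame, _, sent_id = key.rpartition(".")
    elements = [(name, int(start), int(end))
                for name, start, end in FE_PATTERN.findall(rest)]
    return frame, sent_id, elements

parsed = parse_answer_line(
    "Event.796947 Event (0,9) Reason (31,60) Place (0,0) Time (0,0)")
```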
The final file generated by FNE is SensSemRolesMessages.txt, which identifies any problems that may have occurred in the sample selection.
Progress is shown in the boxes Frames Completed and Total Number of Sentences in Frames.
Since the training set file does not include the sentence and target location, a comparable file can be created by loading the training set, selecting frames to be included, and creating the frame files. The button Load Training Set can be used to load the names of the frames that have been selected, as identified in the file SensSemRolesTrg.xml, into a checklist box below the button. One or more of these frames can then be selected by checking its box. Then, the button Create Frame Files can be used to create a file SensSemRolesTrgTest.xml containing all the sentences in the selected frames. This file will contain the same information as in the file SensSemRolesTest.xml described above. In addition, three files will be created for each of the selected frames. These files will have the base name of the frame and the extensions sents, lexemes, and sentfes, containing the following:
sents: Two lines for each sentence that has been annotated, with the first line containing Frame.nnnn (the frame name and the sentence number) and the second line containing the sentence with the target word preceded by <tag> and followed by </tag>.
lexemes: A list of the lexical units that have been annotated for the frame, each with its part of speech.
sentfes: Multiple lines for each sentence that has been annotated. The first two lines are identical to those included in sents, except that the target has not been surrounded by <tag>. The remaining lines identify each of the frame elements tagged in the sentence, consisting of the frame element name, its text, and its starting and ending positions in the sentence.
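As an illustration of the sents layout, the following Python sketch pairs each Frame.nnnn line with the sentence line that follows it and recovers the target word. It is a sketch under the description above, not a tested reader for real FNE output; the function name read_sents is hypothetical and the two example lines were written by hand from one of the sample sentences.

```python
import re

TAG = re.compile(r"<tag>(.*?)</tag>")

def read_sents(lines):
    """Yield (frame, sentence number, plain sentence, target) tuples."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Lines come in pairs: the key line, then the tagged sentence.
    for key, tagged in zip(lines[0::2], lines[1::2]):
        frame, _, number = key.rpartition(".")
        match = TAG.search(tagged)
        target = match.group(1) if match else None
        plain = TAG.sub(r"\1", tagged)  # drop the <tag> markers
        yield frame, number, plain, target

# Hand-written example lines, not copied from an actual sents file.
example = ["Event.796895\n",
           "The accident <tag>happened</tag> in frosty conditions .\n"]
records = list(read_sents(example))
```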
Select Preposition Corpus Instances
Not yet implemented for FrameNet 3.0.
The button Select Subcorpora creates a list of corpus instances in the FrameNet lexical units whose subcorpus name contains a preposition in the form ppprep, such as V-570-np-ppthrough. This option is used to identify instances of particular prepositions in The Preposition Project.
Step 1 - On the sample selection tab, enter the preposition, the low and high values of the lexical unit files (minimum 1, maximum >= 10007), and push the button Select Subcorpora. Progress is shown in Frames Completed. This produces the files 'pp-prep.txt' (containing a tab-separated list consisting of the frame name, the frame element name - or "No instances", the lexical unit, the subcorpus name, and the sentence identifier and beginning position of the preposition) and 'pp-prep.sents.xml' (containing a DOCNO element made up of the preposition, the sentence identifier, and the beginning position and a TEXT element consisting of the sentence).
Step 2 - FNE does not seem capable of handling all lexical unit files in one pass. Repeat Step 1 with Low and High set to 0 and 999, 1000 and 1999, ... , 9000 and 10999. The files produced by each pass should be renamed with a digit from 0 to 9 after the preposition name so that the file names will sort into sequential order.
Step 3 - The instances files ('pp-prep?.txt') should be copied and sorted into a single file from the command line with the following: 'copy pp-prep*.txt temp.txt' and 'sort temp.txt > pp-prep.txt'. The last line should be deleted. This file will be suitable for importing into an Excel spreadsheet.
Step 4 - The sentences files ('pp-prep?.sents.xml') should be copied into a single file from the command line with the following: 'copy pp-prep*.sents.xml pp-prep.sents.xml'. The last line should be deleted. In this file, all but the first and last instance of DOCS should be removed. The file should be passed through an XML validator to modify any entity references that cause a problem.
Step 5 - The file 'pp-prep.sents.xml' is suitable for assessing the quality of the 'srtype' assignments that are made in automatic disambiguation of the prepositions.
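For users who prefer a script to the copy and sort commands of Step 3, the merge can be sketched in Python as below. This is an assumption-laden illustration: the function name merge_instances is hypothetical, the files are assumed to be UTF-8 text, and Python sorts lines by character code, so the order may differ from that produced by the Windows sort command. The sketch writes only non-blank lines.

```python
import glob
import os
import tempfile

def merge_instances(prep, directory="."):
    """Concatenate pp-<prep><digit>.txt files and sort their lines."""
    pattern = os.path.join(directory, f"pp-{prep}[0-9].txt")
    lines = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            lines.extend(ln.rstrip("\n") for ln in handle if ln.strip())
    lines.sort()
    out_path = os.path.join(directory, f"pp-{prep}.txt")
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return out_path

# Demonstration with made-up rows in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rows = {"0": "Motion\tPath\tcrawl.v", "1": "Arrest\tAuthorities\tarrest.v"}
    for digit, row in rows.items():
        name = os.path.join(tmp, f"pp-through{digit}.txt")
        with open(name, "w", encoding="utf-8") as handle:
            handle.write(row + "\n")
    with open(merge_instances("through", tmp), encoding="utf-8") as handle:
        merged = handle.read().splitlines()
```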
Other Syntactic Realizations
Not yet implemented for FrameNet 3.0.
The button Other Syntactic Realizations creates a list of other ways in which a frame element associated with a preposition appears in the FrameNet annotations. For example, by may introduce an Authorities frame element in the Arrest frame. Using by as a seed, this option identifies all the ways in which the Authorities frame element is realized, with the associated lexical units, grammatical form, and phrase type. This option is also used in The Preposition Project.
To obtain other syntactic realizations of frame elements identified in analyzing FrameNet preposition instances, enter a preposition in the Preposition field of the Sample Selection tab and push the button Other Syntactic Realizations. This function requires a file, pp-prep.txt (where 'prep' is the preposition), of subcorpus instances labeled by 'pp-prep' and sorted by frame name, frame element, lexical unit, and subcorpus name. The output is saved in the file 'pp-prep.alters.txt', with each line identifying the frame name, the frame element, the lexical unit, the grammatical form (or type of null instantiation), and the phrase type (along with the preposition if the type is PP). This output should be interpreted according to the FrameNet manual's description of grammatical forms and phrase types.
In producing other syntactic realizations of a given frame element in a given frame, each such instance (from a preposition's instances) is used as a seed. From the seed frame name, all lexical units for the frame are examined. Only lexical units with annotations need to be examined. Each annotation is examined to determine if it has the given frame element. If it does, its position is used to identify its grammatical form (or type of null instantiation) and phrase type. If the phrase type is PP (prepositional phrase), the preposition is identified as the first word beginning at the position. A given lexical unit may have several annotations with the same grammatical form, phrase type, and (optionally) preposition; these are not duplicated in the output.
Creating XML and DIMAP dictionaries of the FrameNet data
On the Sample Selection tab, pushing the button Create Dictionaries results in the conversion of the FrameNet data into three files:
An XML file identifying the frames and frame elements (FrameDict.xml)
A file for uploading a more detailed description of the frames into a DIMAP dictionary (frames.dmp), and
A file for uploading the lexical units in the FrameNet data into a DIMAP dictionary (fnlex.dmp) (only produced when Frames Only DIMAP Dictionary is not checked)
Note that if the lexical unit files are being processed, the conversion takes about 3 minutes. Note also that the two upload files can be uploaded into separate dictionaries or into one dictionary.
A demonstration version of DIMAP is available at CL Research. This version is sufficient for most manipulations of the FrameNet data, such as searching through the entries or converting the entries into a user-defined format. A DIMAP dictionary containing the latest conversion made at CL Research is also available for download as a demonstration dictionary.
FrameNet XML File
The XML frame dictionary is printed into the file "FrameDict.xml" and consists of a single frames element, which contains one frame tag for each frame, with the name of the frame as its "name" attribute. Each frame tag contains an fe element for each frame element, each of which identifies the frame element in the "name" attribute and its core type (Core, Peripheral, or Extra-Thematic) in the "coreType" attribute. This XML file is intended as a simple identification of the FrameNet frames and their frame elements.
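A file with this structure can be read with Python's standard XML library. The sketch below works on a hand-written excerpt that follows the description above; the frame and frame element names in it are placeholders rather than real FrameNet data, and the function name frame_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt following the structure described above;
# the frame and frame element names are placeholders.
SAMPLE = """<frames>
  <frame name="Example_frame">
    <fe name="Element_a" coreType="Core"/>
    <fe name="Element_b" coreType="Peripheral"/>
  </frame>
</frames>"""

def frame_elements(root):
    """Map each frame name to {frame element name: core type}."""
    return {frame.get("name"): {fe.get("name"): fe.get("coreType")
                                for fe in frame.findall("fe")}
            for frame in root.findall("frame")}

table = frame_elements(ET.fromstring(SAMPLE))
```

For the real file, ET.parse("FrameDict.xml").getroot() would take the place of ET.fromstring(SAMPLE).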
DIMAP Frame File
The DIMAP upload frame file, frames.dmp, captures essential characteristics of FrameNet frames and frame relations, extracting information from the core FrameNet files frames.xml and frRelation.xml. This file envisions that frames and frame relations can be represented as ordinary entries in a dictionary. However, these entries are viewed as meta-entries and made distinctive by beginning with special characters, # for frames and @ for frame-to-frame relations. Each type of entry was designed to distill a substantial amount of FrameNet data. The details of this distillation are described in the following sections:
DIMAP Frame Entries
DIMAP Frame-to-Frame Entries
DIMAP Frame Entries
A DIMAP frame entry is a lexical entry distinguished by having a leading # followed by the frame name, e.g., #Manufacturing. The frame name is taken exactly as given in the "name" attribute of the frame tag in frames.xml, i.e., it is capitalized and may contain underscores. Each frame entry consists of one sense, which is given "none" as the part of speech.
Each frame entry has a set of DIMAP features that identify the frame's frame elements and their core type ("Core", "Peripheral", or "Extra-thematic"). The frame element is the feature name and the core type is the feature value. These are extracted from the "name" and "coreType" attributes of the fe tags in frames.xml.
Each frame entry has a set of DIMAP instances that provide the lexical units for the frame. The lexical units are identified from the lexunit children of the frame. The "name" attribute of each child is dissected to remove its part of speech code (a period and part of speech at the end of the "name" attribute). The "ID" attribute is identified as the sense number in DIMAP instances; this number corresponds to the lexical unit file for this entry, i.e., "lun.xml", where n is the ID number. The existence of a number does not imply that the FrameNet data contain any annotations for the lexical unit.
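The dissection of a lexical unit name and the derivation of its file name can be illustrated with two small Python helpers. Both function names are hypothetical; the split at the final period follows the description above.

```python
def split_lexunit(name):
    """Split 'happen.v' into ('happen', 'v') at the final period."""
    lemma, _, pos = name.rpartition(".")
    return lemma, pos

def lu_file_name(lu_id):
    """Build the lexical unit file name 'lun.xml' from an ID number."""
    return f"lu{lu_id}.xml"

lemma, pos = split_lexunit("happen.v")
file_name = lu_file_name(4886)
```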
If a frame participates in any frame-to-frame relations, as identified in the file "frRelation.xml", these are captured in DIMAP roles. Each DIMAP role consists of a name and one or more links. For example, the "#Aggregate" frame has two roles, IS_INHERITED_BY with links to #Store and #Organization, and USES with a link to #Bounded_entity. To obtain the data for these roles, an XPath expression is used to identify the relevant nodes. This XPath consists of a concatenation of "//frame-relation-type[@name=\"", the relation name, "\"]//frame-relation[@", the relation direction ("super" or "sub"), "FrameName=\"", the frame name, and "\"]". The relation name is one of the frame relation names ("Inheritance", "Subframe", "Using", "See_also", "ReFraming_Mapping", "CoreSet", "Excludes", "Requires", "Inchoative_of", "Causative_of", "Precedes", or "Perspective_on"). The role names used in DIMAP are "IS_INHERITED_BY", "HAS_SUBFRAMES", "IS_USED_BY", "Distinguishes", "Was Remapped To", "INCHOF", "CAUSOF", "PRECEDES", "INHERITS", "IS_SUBFRAME_OF", "USES", "See Also", "Was Remapped From", "Has CoreSet", "Has Excludes", "Has Requires", "HAS_INCHOATIVE", "IS_CAUSED_BY", "IS_PRECEDED_BY", or "PERSPON". Many of these relations do not occur in the FrameNet frame-to-frame relation set in "frRelation.xml".
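The concatenation that produces the XPath expression can be sketched in Python as follows. The function name relation_xpath is hypothetical, and the small XML sample is hand-written for the illustration: its root element name and the assignment of frames to the super and sub attributes are assumptions, not an excerpt from frRelation.xml.

```python
import xml.etree.ElementTree as ET

def relation_xpath(relation_name, direction, frame_name):
    """Concatenate the XPath pieces in the order described above."""
    if direction not in ("super", "sub"):
        raise ValueError("direction must be 'super' or 'sub'")
    return "".join([
        '//frame-relation-type[@name="', relation_name,
        '"]//frame-relation[@', direction,
        'FrameName="', frame_name, '"]',
    ])

# Hand-written sample; the layout is assumed for illustration only.
SAMPLE = """<frame-relations>
  <frame-relation-type name="Inheritance">
    <frame-relation superFrameName="Aggregate" subFrameName="Store"/>
    <frame-relation superFrameName="Aggregate" subFrameName="Organization"/>
  </frame-relation-type>
  <frame-relation-type name="Using">
    <frame-relation superFrameName="Bounded_entity" subFrameName="Aggregate"/>
  </frame-relation-type>
</frame-relations>"""

xpath = relation_xpath("Inheritance", "super", "Aggregate")
root = ET.fromstring(SAMPLE)
# ElementTree needs a relative path, so a leading "." is added.
inheritors = [node.get("subFrameName") for node in root.findall("." + xpath)]
```

With this sample, the nodes found for the Inheritance relation correspond to the links of the IS_INHERITED_BY role in the #Aggregate example.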
DIMAP Frame-to-Frame Entries
A DIMAP frame-to-frame entry is a lexical entry distinguished by having a leading @ followed by the first frame name, the relation, and the second frame name, e.g., @Communication_noiseINHERITSCommunication. Thus, the entry name encapsulates the two frames and the frame relation in which they participate, including the direction of the relation. These entries are intended to capture the mapping between the frame elements in the respective frames. Each frame-to-frame entry consists of one sense, which is given "none" as the part of speech.
The DIMAP entries capture the following types of relations: "INHERITS", "IS_SUB_OF", "USES", "INCH_OF", "CAUSE_OF", "PRECEDES", or "PERSP_ON". Each entry contains one sense and two features, one with an attribute name of "0" and the other with an attribute name of "1", and each with attribute values consisting of frame elements from the respective frames, in one-to-one correspondence constituting a mapping of frame elements from the first ('0') to the second ('1'). In general, the first feature set of frame elements is more specific and the second feature set is more general in the first three types of relations. For the remaining relation types, the mapping identifies how the first set of frame elements (in one frame) would be transformed into the second set of frame elements (in the second frame).
DIMAP Lexical Unit Entries
The DIMAP upload lexical unit file, fnlex.dmp, captures essential characteristics of FrameNet lexical units, extracting information from the FrameNet files lu*.xml and le*.xml. A DIMAP lexical unit entry is an ordinary DIMAP entry, consisting of an entry name and one or more senses, each with its own part of speech.
When lexical unit entries are being created (Frames Only DIMAP Dictionary is unchecked), the lexical units are identified in association with each frame in the process of identifying the instances for each frame (see DIMAP Frame Entries). After separating the terminal period and part of speech from the lexical unit name, the lexical unit and the part of speech are encoded for DIMAP upload. The lexical unit ID is also identified in this process and is used to load the appropriate lexical unit files (lu*.xml and le*.xml).
The definition is extracted from the lu*.xml file. Each definition in the FrameNet data is preceded by a source code ("COD" for Concise Oxford Dictionary and "FN" for FrameNet). This code is removed from the definition. The definition is entered in the DIMAP definition field. The frame, the source code, and the ID number are entered as DIMAP features.
Finally, frame realization data is extracted from the lexical entry files (le*.xml). The "//FERealizations/FERealization" nodes are identified, and the "total" attribute of each FERealization element is extracted. Then, the "fe" (frame element), "pt" (phrase type), and "gf" (grammatical function) attributes of the valence-unit element are extracted. These are all encapsulated in a single DIMAP feature, with the name equal to the concatenation of the phrase type and the grammatical function in parentheses and the value equal to the concatenation of the frame element and the total number of annotations in parentheses. A DIMAP sense may have several of these features.
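The construction of such a feature can be illustrated in Python. The exact spacing used by FNE is not stated, so the form without spaces is an assumption, as are the function name realization_feature and the attribute values in the example.

```python
def realization_feature(fe, pt, gf, total):
    """Build the (name, value) pair for one realization feature."""
    name = f"{pt}({gf})"      # phrase type, grammatical function in parentheses
    value = f"{fe}({total})"  # frame element, annotation total in parentheses
    return name, value

# Illustrative values only, not taken from an actual le*.xml file.
feature = realization_feature("Event", "NP", "Ext", 12)
```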
The lexical units are encountered in conjunction with the traversal of the frames. The same lexical unit may be encountered in another frame, with different properties. When fnlex.dmp is uploaded into a DIMAP dictionary and an existing entry has already been created, the new data is added as another sense.
To report any bugs, request new or enhanced features, obtain product help or documentation, ask a question, make a comment, or request further information from CL Research, send feedback to CL Research (http://www.clres.com/clr/feedback.php?clrdemo=fne).