
The Preposition Project (TPP) is designed to provide a comprehensive characterization of English preposition senses suitable for use in natural language processing. Each of 673 preposition senses for 334 prepositions (mostly phrasal prepositions) has been described by giving it a semantic role or relation name and by characterizing the syntactic and semantic properties of its complement and attachment point. Each sense is further described by its definition and sample usages from the Oxford Dictionary of English, its position in a semantic hierarchy of prepositions, its basic syntactic placement (as described in A Comprehensive Grammar of the English Language), other synonymic prepositions filling a similar semantic role, FrameNet frames and frame elements used to describe the complement, and other syntactic forms in which the semantic role may be realized. (An online version, Online TPP, is available for examining the data in a convenient lookup format and for downloading the entire preposition sense inventory.)

The database was constructed following corpus-based lexicographic principles, guided by computational linguistic and lexicologic considerations. For the 56 most common English prepositions (see accompanying table), a corpus of over 27,000 sentences containing preposition instances was drawn from the FrameNet database of sentences tagged with semantic roles (frame elements). Since the FrameNet database was not constructed with prepositions in mind, this corpus provides a high-quality, independent, and unbiased sample that considerably facilitates the construction of a preposition database.

The steps followed in constructing the database are described in the following sections:

  1. The Preposition Sense Inventory
  2. Methodology for Sense Disambiguation of Preposition Instances in FrameNet
  3. Analyzing the Semantic Role for a Sense
  4. Gold Standards for Preposition and Semantic Role Disambiguation
  5. Identifying Other Prepositions and Other Syntactic Realizations Filling the Same Semantic Roles
  6. Propagating Meanings Via the Preposition Digraph
  7. The Preposition Databases
  8. Project Structure and Support
  9. Papers from The Preposition Project and SemEval-2007

The Preposition Sense Inventory

The Oxford Dictionary of English (ODE) (and its predecessor, the New Oxford Dictionary of English (NODE)) was chosen as the source of the preposition sense inventory because of the clarity and organization of its senses and its reliance on corpus evidence in its construction. Litkowski (2002) describes how prepositions in NODE were identified, particularly the procedures used for identifying phrasal prepositions that are not accorded headword status and appear, unlabeled as prepositions, under other headwords. As indicated there, 373 prepositions (listed in the appendix) and 847 preposition senses were identified and entered into a preposition dictionary using CL Research's DIMAP dictionary maintenance programs. This dictionary, with modifications as noted in Litkowski (2002) and with further modifications emerging in the project, forms the basis for TPP's database.

TPP's database now includes the definitions and examples from ODE, with kind permission granted by Oxford University Press. These definitions and examples are still copyrighted by Oxford University Press and may not be used without their permission.

During the course of TPP, a more careful lexicographic examination of the initial set of prepositions has led to several prepositions being dropped from the database. In addition, many prepositions have had senses added or revised in wording. The current database contains 334 entries with 673 senses. Further revisions are expected as refinements to the database are made.

Methodology for Sense Disambiguation of Preposition Instances in FrameNet

As indicated above, the core of TPP involved the assignment of senses to instances of prepositions in sentences drawn from FrameNet (1.1). The accompanying table shows the number of senses and the number of FrameNet instances for the 56 prepositions that were examined in detail. After selecting a preposition for study, the FrameNet corpus instances were obtained using CL Research's publicly available FrameNet Explorer (FNE). The FrameNet 1.1 database included approximately 7,500 XML lexical unit files, each of which contains tagged sentences for a specific lexical item and frame (e.g., the item move.v in the Motion frame). Tagged sentences are grouped into subcorpora, each of which has a name. The name encodes salient syntactic properties of the subcorpus, e.g., V-730-s20-ppacross, which includes sentences using the verb move that include a prepositional phrase beginning with across (which are tagged as instances of the Path frame element within the Motion frame).

FNE allows the user to specify a preposition and each of the 7,500 lexical unit files is examined to find subcorpora that have ppprep in the name. For each subcorpus having the desired name (e.g., ppby), a line is written to a text file. This line contains a tab-separated list of five elements: the frame name, the frame element, the lexical unit, the subcorpus name, and the sentence ID and starting position in the sentence of the preposition. Table 1 shows sample lines from the instance file for by.
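The selection step and the record layout described above can be sketched as follows. This is a minimal Python illustration, not FNE's actual code: the names `Instance`, `subcorpus_mentions`, and `parse_instance_line` are hypothetical, and the sketch assumes that subcorpus name components are separated by hyphens (as in V-730-s20-ppby) and that the fifth field joins the sentence ID and the character position with a hyphen, possibly left empty when no sentence was tagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    frame: str
    frame_element: str
    lexical_unit: str
    subcorpus: str
    sentence_id: Optional[int]  # None when no sentence was tagged
    position: Optional[int]     # character offset of the preposition


def subcorpus_mentions(name: str, preposition: str) -> bool:
    """True if a hyphen-separated component of the name is 'pp' + preposition."""
    return ("pp" + preposition) in name.split("-")


def parse_instance_line(line: str) -> Instance:
    """Split one tab-separated line of the instance file into its five fields."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))  # tolerate a missing final field
    frame, fe, lu, subcorpus, ident = fields[:5]
    ident = ident.strip()
    if ident:
        sentence_id, position = ident.split("-")
        return Instance(frame, fe, lu, subcorpus, int(sentence_id), int(position))
    return Instance(frame, fe, lu, subcorpus, None, None)
```

For example, `subcorpus_mentions("V-570-s20-np-ppby", "by")` is true, while a name such as V-730-s20-ppacross is skipped when the preposition under study is by.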

Table 1. Preposition Instance File Sample Lines

Frame | Frame Element | Lexical Unit | Subcorpus | Identifier-Position
Achieving_first | No_instances | originate.v | V-570-s20-np-ppby |
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875350-43
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875353-71
Arrest | Authorities | arrest.v | V-730-s20-ppby | 875362-160
Arrest | No_instances | apprehend.v | V-730-s20-ppby |

In constructing a line, each annotationSet of the FrameNet data is examined. This involves identifying elements in the PT (phrase type) annotation layer that have a PP (prepositional phrase) label name, locating the element in the sentence, and ensuring that it begins with the specified preposition. When this is confirmed, the FE (frame element) layer for this element is used to establish the frame element name. The example data shown above indicate that no sentences containing a prepositional phrase beginning with by were tagged in the subcorpora for originate.v and apprehend.v, but that three sentences were tagged for arrest.v in the Arrest frame, where the prepositional phrase beginning with by was labeled as the Authorities frame element. As implied above, the file of instances is sorted by frame name and is imported into an Excel spreadsheet.
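The layer check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Label` class is a hypothetical in-memory stand-in for a label on an annotation layer, not the FrameNet XML schema, and the sketch assumes that the FE label covers exactly the same character span as the PP label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Label:
    layer: str  # "PT" (phrase type) or "FE" (frame element)
    name: str   # e.g. "PP" on the PT layer, "Authorities" on the FE layer
    start: int  # character offset where the labeled span begins
    end: int    # character offset where the labeled span ends


def find_pp_instance(sentence: str, labels: List[Label],
                     preposition: str) -> Optional[Tuple[str, int]]:
    """Return (frame element name, preposition offset) for the first PP
    that begins with the preposition, or None if there is none."""
    prefix = preposition.lower() + " "
    for pt in labels:
        if pt.layer != "PT" or pt.name != "PP":
            continue
        if not sentence[pt.start:].lower().startswith(prefix):
            continue
        # Look up the FE label that covers the same span as the PP.
        for fe in labels:
            if fe.layer == "FE" and (fe.start, fe.end) == (pt.start, pt.end):
                return fe.name, pt.start
    return None


# An invented sentence; the offsets 16-28 cover "by the police".
sentence = "He was arrested by the police."
labels = [Label("PT", "PP", 16, 28), Label("FE", "Authorities", 16, 28)]
```

Here `find_pp_instance(sentence, labels, "by")` yields the Authorities frame element at offset 16, which is the kind of information written to the last field of an instance line.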

The instance file generated by this method does not represent all instances of a preposition in the FrameNet database. A given preposition may be tagged in a subcorpus having a name that does not indicate its presence (e.g., a subcorpus pp-across may contain sentence instances where a prepositional phrase beginning with by has been tagged with some frame element). For the major prepositions, the method above generated from 300 to 4500 instances for each preposition. For other common prepositions, each sentence in the lexical unit files was examined to identify all PPs and those beginning with the preposition were extracted.

Using this instance file as a guide, the lexicographer began the process of analyzing the preposition's senses. An Excel spreadsheet was initiated for the preposition, with one row for each sense in the DIMAP preposition dictionary (with the ODE sense number included in parentheses). The lexicographer examined the definitions for the preposition, available information about the preposition in Quirk et al., and the FrameNet corpus instances. He then assigned a semantic role name and identified the usual syntactic function of a prepositional phrase beginning with the preposition in the specific sense (noun postmodifier (1); adverbial adjunct (2a), subjunct (2b), disjunct (2c), or conjunct (2d); and/or verb (3a) or adjective (3b) complement, as described in paragraph 9.1, p. 657 of Quirk et al.). The lexicographer then ascertained the paragraph, if any, in Quirk et al. that provides a semantic description of the instant sense. This paragraph may also identify other prepositions that have a similar sense and use; these other prepositions are also recorded in the spreadsheet, along with any others that the lexicographer intuits may have a similar meaning. Table 2 shows this information for five (of 13) senses of through.

Table 2. Sample Senses for 'through'

Sense | Relation Name | Quirk Syntax | Quirk Paragraphs | Complement Properties | Attachment Properties
1 (1) | ThingTransited | 2a, 3a | 9.25, 9.28 | opening, channel, or location | verbs of motion
2 (1a) | ThingBored | 1, 2a, 3a | 9.25, 9.28 | permeable or breakable physical object | verbs denoting penetration
3 (1b) | ThingTransited | 1, 2a, 3a | 9.25, 9.26, 9.28 | sth regarded as homogenous | verbs of motion
4 (1c) | ThingPenetrated | 1, 2a, 3a | None | a permeable obstacle | a perceived object; sometimes complement of a verb of perception
5 (1d) | ChannelTransited | 1, 2a, 3a | 9.19, 9.22, 9.27 | an opening or obstacle | copula or verb of location
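The codes in the Quirk Syntax column of Table 2 can be expanded mechanically. The sketch below is an illustrative helper, not part of the TPP tooling; the dictionary simply restates the code list given in the text, and the function name `decode_quirk_syntax` is hypothetical.

```python
from typing import List

# Syntactic function codes as listed in the text (after Quirk et al., paragraph 9.1).
QUIRK_FUNCTIONS = {
    "1": "noun postmodifier",
    "2a": "adverbial adjunct",
    "2b": "adverbial subjunct",
    "2c": "adverbial disjunct",
    "2d": "adverbial conjunct",
    "3a": "verb complement",
    "3b": "adjective complement",
}


def decode_quirk_syntax(cell: str) -> List[str]:
    """Expand a cell such as '1, 2a, 3a' into readable function names."""
    return [QUIRK_FUNCTIONS[code.strip()]
            for code in cell.split(",") if code.strip()]
```

Applied to sense 1 of through, `decode_quirk_syntax("2a, 3a")` gives an adverbial adjunct and a verb complement.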

Interpreting the definition in light of the corpus instances, the lexicographer then characterized the syntactic and semantic properties of the sense's complement and attachment point. Next, in the instance spreadsheet, the lexicographer assigned a sense number to each sentence instance (i.e., no sense assignments are made when no sentences have been tagged for a subcorpus). The lexicographer used FNE for this purpose, using the information provided above for each corpus instance. With FNE, a lexical unit (such as arrest.v) can be entered and all sentences annotated for it are displayed immediately. In addition, all subcorpus names are displayed in a drop-down list; by selecting the relevant subcorpus (e.g., V-730-s20-ppby), the lexicographer can see just those sentences. The lexicographer could then determine which sense of the preposition is applicable. Since similar items are grouped together (i.e., by frame name, frame element name, and lexical unit), several instances could be tagged at a time.

In some instances, the lexicographer found that multiple senses were applicable; in this case, each applicable sense number was included. Overall, fewer than 500 instances have multiple senses. The lexicographer also found that some senses in the inventory needed to be split; during the first phase of the project, the sense inventory was expanded by approximately 10 percent. A major innovation of ODE was the development of a mini-hierarchy in grouping senses, leading to core senses and subsenses that are semantically similar to the core, but may represent some type of sense extension, usually a narrowing or broadening of meaning (see below for tables showing examples of how these were coded). The lexicographer annotated each subsense with the type of extension. During the course of annotating the FrameNet instances for major prepositions, the lexicographer kept notes and prepared a summary describing the treatment of the preposition. When warranted, a specific comment was attached to a particular sense, frequently with reference to the preposition's summary.

After the instances for major prepositions had been completed, the lexicographer began a systematic traversal through the dictionary for all prepositions that had not yet been analyzed. When a preposition with instances was reached, the procedures described above were followed. For prepositions without FrameNet instances, the lexicographer made use of other corpora available to him (such as the British National Corpus) to analyze a preposition's senses. Each sense of these prepositions was characterized in the same way as above. The only difference is that for these less prominent prepositions (usually with only one or two senses), no set of tagged sentences is available.

From a lexicographic perspective, it turns out each source of information about the behavior of a preposition is incomplete in itself. All three sources in use on the project are complementary in providing an overall assessment of the meaning and characterization of the preposition. ODE may be found wanting when placed next to the FrameNet instances; this project thus revealed further aspects of the appropriate sense inventory. ODE does not provide a summary picture of a preposition's meanings; the characterization in Quirk et al. provides such a perspective, but it too is incomplete, both in coverage of a particular meaning and in not identifying correspondences with other prepositions. The FrameNet database does not provide instances for all the senses. Litkowski & Hargraves (2005) provides further details on the use of the three different sources. Litkowski & Hargraves (2006) provides further details on the coverage of the three sources.

Analyzing the Semantic Role for a Sense

With the tagged instances, a simple sort by sense number of the Excel spreadsheet identifies the (Frame Frame_Element) pairs for each sense. These pairs are aggregated into one list in the Sense Analysis spreadsheet. Table 3 shows these pairs for the first three senses of through.

Table 3. Frame:FrameElement Pairs Identified for Senses of 'through'

Sense | Relation Name | Frame:FrameElement Pairs
1 (1) | ThingTransited | Arriving:Path; Cause_motion:Path; Cotheme:Path; Departing:Path; Escaping:Location; Escaping:Path; Evading:Path; Fluidic_motion:Path; Mass_motion:Path; Motion:Path; Motion_directional:Path; Motion_noise:Path; Operate_vehicle:Path; Path_shape:Path; Placing:Goal; Placing:Path; Removing:Path; Roadways:Area; Self_motion:Area; Self_motion:Path; Breathing:Path
2 (1a) | ThingBored | Cause_harm:Body_part; Impact:Impactee; Natural_features:Relative_location; Use_firearm:Path
3 (1b) | ThingTransited | Emotion_heat:Location; Path_shape:Area; Ride_Vehicle:Path; Roadways:Path; Self_motion:Self_mover; Travel:Path
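The aggregation that produces a listing like Table 3 amounts to grouping tagged instances by sense number and collecting the distinct pairs. The following Python sketch shows the idea; it stands in for the Excel sort described in the text, the function name `pairs_by_sense` is hypothetical, and the sample rows are illustrative (the pairs themselves appear in Table 3, but the individual instances are made up).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (frame, frame element, assigned sense numbers); an instance may carry
# more than one sense number when several senses apply.
TaggedInstance = Tuple[str, str, List[int]]


def pairs_by_sense(instances: Iterable[TaggedInstance]) -> Dict[int, List[str]]:
    """Collect the distinct Frame:FrameElement pairs observed for each sense."""
    grouped: Dict[int, Set[str]] = defaultdict(set)
    for frame, frame_element, senses in instances:
        for sense in senses:
            grouped[sense].add(f"{frame}:{frame_element}")
    return {sense: sorted(pairs) for sense, pairs in sorted(grouped.items())}


sample = [
    ("Motion", "Path", [1]),
    ("Arriving", "Path", [1]),
    ("Motion", "Path", [1]),       # a repeated pair collapses into one entry
    ("Use_firearm", "Path", [2]),
    ("Travel", "Path", [3]),
]
```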

As indicated above, the lexicographer identified a semantic role label for each sense based on intuition. These labels are developed somewhat independently of (computational) linguistic theories. These labels are intended to be used in characterizing prepositional phrases, after using criteria laid out in the complement and attachment syntactic and semantic properties for disambiguating the prepositions. Gildea & Jurafsky (2002) developed a mapping of frame elements into 18 higher level semantic roles. The methodology followed here provides an alternative mapping that is more data-driven and less subjective. In many senses for which FrameNet instances were identified, there is a clear correspondence between the frame element names and the semantic relation assigned by the lexicographer.

After about half the prepositions had been analyzed, the lexicographer began to group the semantic roles into higher level categories, i.e., generic classes of prepositions. There are 21 categories at the present time, including common semantic role names such as Agent, Cause, Means, Spatial, and Temporal, but also less common names such as Backdrop, Quantity, Scalar and Target. The less common names emerged from the data. Each preposition sense, and its associated semantic role, was examined carefully with respect to these categories and placed in one of them. This examination led to refinement of the categories and to changes in the semantic role names as the full sense inventory was completed. Although the generic classes are still regarded as preliminary, they provide an initial taxonomy for the complete set of preposition senses in the English language.

The Frame:FrameElement pairs identified in the project show the range and variation of frame elements that have been developed by the FrameNet lexicographers. Frame:FrameElement pairs and lexical units are shown in Table 4 for through (sense 3), given the label ThingTransited. Examination of a table like this might indicate that this sense encapsulates a Path semantic role. Since other senses of through also have a Path role, the lexicographer's assignment indicates a finer granularity on the type of path. At the same time, however, the FrameNet assignment of an Area frame element for hitchhike also indicates a finer granularity on the type of path, suggesting that the path might be through a region. The other Path frame elements might also have such an interpretation.

Table 4. Analysis of Sense 3 (ThingTransited) for through

Frame:Frame_Element | Lexical Units
Emotion_heat:Location | boil.v seethe.v burn.v
Path_shape:Area | crisscross.v
Ride_Vehicle:Path | hitchhike.v
Roadways:Path | bypass.n highway.n line.n motorway.n path.n pathway.n road.n street.n track.n trail.n
Self_motion:Self_mover | sprint.v
Travel:Path | journey.n journey.v tour.n travel.v

This type of analysis demonstrates the richness of the data generated by tagging instances, but it has only begun. Efforts are currently under way to use it to integrate results from TPP into FrameNet.

Gold Standards for Preposition and Semantic Role Disambiguation

In addition to identifying the instances for the lexicographer to use in characterizing the different senses of a preposition, an XML file of the sentences themselves was also generated. Each sentence was given an identifier consisting of the preposition name, the sentence number, and the character position of the preposition. The sentences for which the preposition senses have been assigned constitute a suitable corpus for the development of disambiguation routines for semantic role assignment. In this respect, these sentences are essentially equivalent to the lexical sample task followed in Senseval. In addition, since these instances are FrameNet tagged sentences, they provide a suitable dataset for the Senseval FrameNet semantic role task.

The sentence instances for 34 of the most common prepositions were used in a SemEval-2007 task on disambiguation of prepositions (Litkowski & Hargraves (2007)). The accompanying table shows the number of sentences that were used in SemEval-2007 for each preposition, broken down into the number used for training and the number used for testing. The papers by Ye and Baldwin (2007), Yuret (2007), and Popescu et al. (2007) describe the results of the three teams participating in this task. These results show considerable progress in disambiguating preposition senses, with nearly 70 percent accuracy by the top performing team.

Litkowski (2002) described a set of disambiguation tests for the preposition of, based solely on introspection of its definitions. Those tests are not sufficient. As implied in Table 2, the complement and attachment properties require a richer set of semantic tests for which suitable lexical resources do not presently exist. Sense 1 of through requires that the prepositional phrase be attached to a verb of motion; WordNet has a general motion category for verbs, so in this case, a suitable test can be made. However, for sense 2, it is necessary to identify verbs of penetration; no such category is available in WordNet. A Roget-style thesaurus might provide the necessary information (e.g., look up penetration in the thesaurus and then examine the verbs in the same thesaurus category).
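The kind of attachment test discussed above might be expressed as follows. This is a toy sketch, not TPP's disambiguation routine: the word lists in `VERB_CLASSES` are hypothetical stand-ins for a WordNet verb category or a thesaurus category, and the table of tests only restates the attachment properties given for senses 1 to 3 of through in Table 2.

```python
from typing import Dict, List, Set

# Toy stand-ins for lexical resources (hypothetical word lists).
VERB_CLASSES: Dict[str, Set[str]] = {
    "motion": {"move", "walk", "run", "travel"},
    "penetration": {"bore", "drill", "pierce"},
}

# Attachment requirement per sense of 'through', following Table 2.
ATTACHMENT_TESTS: Dict[int, str] = {1: "motion", 2: "penetration", 3: "motion"}


def candidate_senses(governing_verb: str) -> List[int]:
    """Senses of 'through' whose attachment test the governing verb passes."""
    return [sense for sense, verb_class in sorted(ATTACHMENT_TESTS.items())
            if governing_verb in VERB_CLASSES[verb_class]]
```

A motion verb such as walk passes the test for both sense 1 and sense 3, so the attachment point alone does not decide between them; the complement properties would still have to be checked.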

Prepositions, like verbs, may have associated subcategorization patterns (e.g., requiring a gerundial complement such as a means sense of by). The Quirk syntax described in Table 2 provides some additional syntactic properties. In general, however, it appears that syntactic properties will not be sufficient. The machine learning algorithms used in Senseval for the semantic roles task may prove to be the most appropriate set of techniques.

An important question surrounding the use of prepositions is whether the phrases they introduce are arguments or adjuncts. In Merlo & Esteve Ferrer, “The Notion of Argument in Prepositional Phrase Attachment” (Computational Linguistics, 32(3), pp. 341-78), it was shown that argument-hood could be predicted, frequently in conjunction with lexical classes of their attachment points. A current task under TPP is to examine whether it is possible to predict whether senses in the preposition inventory can be assigned to argument or adjunct status based on their characteristics, as developed in TPP (see Hargraves (2007) for an initial assessment of this possibility).

In any event, the corpus instances developed in TPP can serve as an appropriate testbed for the development of disambiguation routines. The properties identified by the lexicographer will be used in the further development of these routines, particularly in the use of various lexical resources, including syntactic dictionaries, WordNet, machine-readable dictionaries, and thesauruses. It is expected that further development of these routines will lead to further refinement of the lexicographer's characterizations as well as a greater level of specificity about the kinds of information necessary from the lexical resources. Further examination of results from SemEval-2007 is ongoing.

Identifying Other Prepositions and Other Syntactic Realizations Filling the Same Semantic Roles

A tagged sentence in the FrameNet database identifies a specific frame element within a specific frame for the prepositional phrase introduced by the preposition. For example, by introduces the frame element Mode_of_transportation or Path in the Arriving frame. The FrameNet database can be queried to determine other prepositions and other syntactic realizations in which these frame elements occur. The distinct patterns in which these occur are summarized by identifying all unique occurrences of (Frame Frame_Element Lexical_Unit Grammatical_Function Phrase_Type Preposition) within the database. Preposition is included only when the Phrase_Type is PP. There may be many sentences that have been tagged similarly, but only unique occurrences need to be identified to examine the distribution of the same frame element.
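Reducing the tagged sentences to unique patterns can be sketched as below. This is an assumption-laden illustration: the input tuples are a hypothetical flattened view of the annotations (the last field being the first word of the tagged phrase), and the function name `unique_realizations` is not from the project.

```python
from typing import Iterable, List, Optional, Tuple

Annotation = Tuple[str, str, str, str, str, str]
Realization = Tuple[str, str, str, str, str, Optional[str]]


def unique_realizations(annotations: Iterable[Annotation]) -> List[Realization]:
    """Collapse annotations to unique (Frame, FE, LU, GF, PT, Preposition)
    patterns, recording the preposition only when the phrase type is PP."""
    seen = set()
    for frame, fe, lu, gf, pt, first_word in annotations:
        preposition = first_word if pt == "PP" else None
        seen.add((frame, fe, lu, gf, pt, preposition))
    # Sort with None treated as an empty string so rows stay comparable.
    return sorted(seen, key=lambda row: tuple(field or "" for field in row))


annotations = [
    ("Arriving", "Mode_of_transportation", "arrive.v", "Comp", "PP", "by"),
    ("Arriving", "Mode_of_transportation", "arrive.v", "Comp", "PP", "by"),
    ("Arriving", "Path", "come.v", "Obj", "NP", "the"),
]
```

The two identical PP annotations collapse into one pattern, and the NP row keeps no preposition.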

In the example in Table 5 below (taken from the file generated for the preposition by), several combinations are evoked by the seed element. The Mode_of_transportation frame element was seeded by the instances for arrive.v and/or come.v (sense 8 of by); the Path element was evoked by the instances for enter.v (sense 5 of by). It can be seen that in addition to by, in is also used to indicate the Mode_of_transportation frame element, also as a Complement to the main verb. For the Path frame element, in addition to by, the prepositions on, through, via, round, past, towards, and across are used. The Path frame element is also expressed as the Direct Object for one verb, come.

Table 5. Variations in Syntactic Realizations of a Frame Element for 'by'

Frame | Frame Element | Lexical Unit | GF | PT | Preposition
Arriving | Mode_of_transportation | arrive.v | Comp | PP | by
Arriving | Mode_of_transportation | arrive.v | Comp | PP | in
Arriving | Mode_of_transportation | come.v | Comp | PP | by
Arriving | Mode_of_transportation | return.n | Comp | PP | by
Arriving | Path | approach.v | Comp | PP | on
Arriving | Path | approach.v | Comp | PP | through
Arriving | Path | approach.v | Comp | PP | via
Arriving | Path | arrive.v | Comp | PP | through
Arriving | Path | arrive.v | Comp | PP | via
Arriving | Path | come.v | Comp | PP | round
Arriving | Path | come.v | Comp | PP | through
Arriving | Path | come.v | Comp | PP | via
Arriving | Path | come.v | Obj | NP |
Arriving | Path | enter.v | Comp | PP | at
Arriving | Path | enter.v | Comp | PP | by
Arriving | Path | enter.v | Comp | PP | through
Arriving | Path | enter.v | Comp | PP | via
Arriving | Path | get.v | Comp | PP | past
Arriving | Path | reach.v | Comp | PP | by
Arriving | Path | reach.v | Comp | PP | through
Arriving | Path | reach.v | Comp | PPing |
Arriving | Path | return.n | Comp | PP | towards
Arriving | Path | return.v | Comp | PP | across
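Reading the other prepositions off rows like those in Table 5 can be sketched as follows. The rows below are a small subset copied from Table 5; the function name `other_prepositions` is hypothetical, and the sketch only illustrates the grouping, not the project's own query code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (frame element, lexical unit, phrase type, preposition), a subset of Table 5.
ROWS: List[Tuple[str, str, str, str]] = [
    ("Mode_of_transportation", "arrive.v", "PP", "by"),
    ("Mode_of_transportation", "arrive.v", "PP", "in"),
    ("Path", "come.v", "PP", "round"),
    ("Path", "come.v", "NP", ""),
    ("Path", "enter.v", "PP", "by"),
    ("Path", "enter.v", "PP", "via"),
    ("Path", "get.v", "PP", "past"),
]


def other_prepositions(rows: Iterable[Tuple[str, str, str, str]],
                       seed: str) -> Dict[str, List[str]]:
    """For each frame element, list the prepositions other than the seed
    that introduce a PP realizing that element."""
    found = defaultdict(set)
    for frame_element, _lexical_unit, phrase_type, preposition in rows:
        if phrase_type == "PP" and preposition != seed:
            found[frame_element].add(preposition)
    return {fe: sorted(preps) for fe, preps in found.items()}
```

With by as the seed, this subset yields in for Mode_of_transportation and past, round, and via for Path.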

In a second example, shown in Table 6, 52 lines were generated for the Cure:Treatment combination from a single instance of through, via the verb rehabilitate.v (sense 12, labeled Intermediary by the lexicographer, but essentially a means semantic role). The Cure:Treatment pair occurs in a much greater range of lexical items, including not only verbs (alleviate, cure, ease, heal, rehabilitate, resuscitate, and treat), but also nouns (cure, healer, palliation, remedy, therapist, therapy, and treatment) and adjectives (curative, palliative, rehabilitative, and therapeutic). Examining just those with a Phrase Type of "PP", we see that by, with, without, and for are other prepositions in addition to through expressing the Treatment frame element.

Table 6. Variations in Syntactic Realizations of a Frame Element for 'through'

Frame | Frame Element | Lexical Unit | GF | PT | Preposition
Cure | Treatment | alleviate.v | Comp | PP | by
Cure | Treatment | alleviate.v | Comp | PP | with
Cure | Treatment | alleviate.v | Comp | PPing |
Cure | Treatment | alleviate.v | Ext | NP |
Cure | Treatment | curative.a | Ext | NP |
Cure | Treatment | curative.a | Head | N |
Cure | Treatment | cure.n | Comp | NP |
Cure | Treatment | cure.n | Comp | VPing |
Cure | Treatment | cure.n | Ext | NP |
Cure | Treatment | cure.v | Comp | PP | by
Cure | Treatment | cure.v | Comp | PP | with
Cure | Treatment | cure.v | Comp | PP | without
Cure | Treatment | cure.v | Comp | PPing |
Cure | Treatment | cure.v | Ext | NP |
Cure | Treatment | ease.v | Comp | PP | by
Cure | Treatment | ease.v | Comp | PP | with
Cure | Treatment | ease.v | DNI | |
Cure | Treatment | ease.v | Ext | NP |
Cure | Treatment | heal.v | Comp | PP | by
Cure | Treatment | heal.v | Comp | PP | with
Cure | Treatment | heal.v | Comp | PPing |
Cure | Treatment | heal.v | Ext | NP |
Cure | Treatment | healer.n | Ext | NP |
Cure | Treatment | palliation.n | Mod | N |
Cure | Treatment | palliative.a | Head | N |
Cure | Treatment | rehabilitate.v | Comp | PP | through
Cure | Treatment | rehabilitate.v | Comp | PPing |
Cure | Treatment | rehabilitate.v | Ext | NP |
Cure | Treatment | rehabilitative.a | Head | NP |
Cure | Treatment | remedy.n | Comp | AJP |
Cure | Treatment | remedy.n | Comp | NP |
Cure | Treatment | remedy.n | Comp | PP | for
Cure | Treatment | remedy.n | Comp | PPing |
Cure | Treatment | remedy.n | Mod | AJP |
Cure | Treatment | resuscitate.v | Comp | PP | through
Cure | Treatment | therapeutic.a | Comp | NP |
Cure | Treatment | therapeutic.a | Ext | NP |
Cure | Treatment | therapeutic.a | Head | N |
Cure | Treatment | therapist.n | Mod | N |
Cure | Treatment | therapy.n | Ext | NP |
Cure | Treatment | therapy.n | INI | |
Cure | Treatment | therapy.n | Mod | AJP |
Cure | Treatment | therapy.n | Mod | N |
Cure | Treatment | treat.v | Comp | AVP |
Cure | Treatment | treat.v | Comp | PP | by
Cure | Treatment | treat.v | Comp | PP | with
Cure | Treatment | treat.v | Comp | PPing |
Cure | Treatment | treat.v | Ext | NP |
Cure | Treatment | treatment.n | Comp | NP |
Cure | Treatment | treatment.n | Comp | PP | by
Cure | Treatment | treatment.n | Comp | PP | with
Cure | Treatment | treatment.n | Ext | NP |

Using the frames and frame elements from all sense-tagged instances as seeds, 9309 lines and 5440 lines similar to those in Tables 5 and 6 were generated for by and through, respectively. These results can then be examined by sense number and lead to an identification of all other prepositions expressing the frame elements as shown in Table 3. These prepositions are shown in Table 7 alongside those the lexicographer listed on the basis of intuition and Quirk assessments of semantic similarity.

Table 7. Other Similar Prepositions for Senses of 'through'

Sense | Lexicographer Prepositions | Prepositions Identifiable from FrameNet
2 (1a) | into | into; on; over; about; at; across; in; under; against; between; through; around; with; behind; off; onto; towards; by; down; outside; along; near; below; beneath; above; of; within; underneath; beside; beyond; throughout; close; up; for; from
3 (1b) | among, within | inside; through; under; within; at; beneath; amongst; between; on; behind; among; above; around; over; all; close; across; along; down; towards; up; past; via; from; of; alongside; by; with; to
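The comparison behind Table 7 is a set comparison between the two lists. The sketch below runs it for sense 3 (1b) of through, using the two lists exactly as they appear in the table; the function name `compare_lists` is hypothetical and the code is an illustration rather than the project's procedure.

```python
from typing import Iterable, List, Tuple


def compare_lists(lexicographer: Iterable[str], framenet: Iterable[str],
                  seed: str) -> Tuple[List[str], List[str]]:
    """Return (prepositions found in both lists, prepositions suggested
    only by FrameNet), leaving out the seed preposition itself."""
    from_framenet = set(framenet) - {seed}
    from_lexicographer = set(lexicographer)
    confirmed = sorted(from_lexicographer & from_framenet)
    additional = sorted(from_framenet - from_lexicographer)
    return confirmed, additional


LEXICOGRAPHER = ["among", "within"]
FRAMENET = ("inside; through; under; within; at; beneath; amongst; between; "
            "on; behind; among; above; around; over; all; close; across; "
            "along; down; towards; up; past; via; from; of; alongside; by; "
            "with; to").split("; ")

confirmed, additional = compare_lists(LEXICOGRAPHER, FRAMENET, "through")
```

Both of the lexicographer's prepositions are confirmed by the FrameNet data, while the `additional` list holds the further candidates whose substitutability is discussed below.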

The number of other prepositions expressing frame elements encompassed by a single sense was quite surprising. The first explanation for this large number was simply that the lexicographer had overlooked some possibilities. And indeed, upon reviewing the lists, the lexicographer could imagine substituting some of the suggestions in example sentences. However, the large number requires a more systematic explanation. To assess the substitutability of other prepositions for a given semantic role, the lexicographer first examined their definitions in ODE for similarity. Many had similar definitions, but many did not. The lexicographer then examined the definitions in the Oxford English Dictionary (OED), which has a much larger number of senses than ODE. Rather than finding similar senses, the lexicographer concluded that, in fact, ODE simply provided a better organization of the many senses, ignoring obsolete and dated senses. Instead of attempting to reach a final conclusion on substitutability, this issue will await further data when the other prepositions undergo their sense tagging. The analysis at that time will examine the semantic role assignments for prepositions deemed substitutable and determine their congruence. In particular, it will be possible to examine the array of frame elements of putative substitutable senses.

In addition to the other preposition analysis, the FrameNet data support an in-depth examination of other methods of realizing frame elements. For example, the alternation patterns for expressing the Treatment frame element appear to vary by part of speech of the lexical item. For verbs, we have "Comp PPing" (a complement prepositional phrase containing a gerund), "Ext NP" (an external argument, i.e., the subject of the verb), "DNI" (a definite null instantiation, indicating that the element is an anaphor), and "Comp AVP" (a complement adverbial phrase, e.g., treated pharmacologically). Similar variations are indicated for nouns and adjectives. These semantic role alternations await further study.
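Grouping the non-PP realizations by part of speech can be sketched as follows. The rows are a small subset of Table 6; the function name `patterns_by_pos` is hypothetical, and the sketch assumes that the part of speech is the suffix after the last period of the lexical unit name (v, n, or a).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (lexical unit, grammatical function, phrase type), a subset of Table 6.
ROWS: List[Tuple[str, str, str]] = [
    ("alleviate.v", "Comp", "PPing"),
    ("ease.v", "DNI", ""),
    ("treat.v", "Comp", "AVP"),
    ("treat.v", "Ext", "NP"),
    ("cure.n", "Comp", "VPing"),
    ("therapy.n", "Mod", "N"),
    ("curative.a", "Head", "N"),
]


def patterns_by_pos(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group 'GF PT' realization patterns by the part of speech of the
    lexical unit, taken from the suffix after the last period."""
    grouped = defaultdict(set)
    for lexical_unit, gf, pt in rows:
        pos = lexical_unit.rsplit(".", 1)[1]
        grouped[pos].add(f"{gf} {pt}".strip())
    return {pos: sorted(patterns) for pos, patterns in grouped.items()}
```

For this subset, the verbs show the four patterns named in the text, while the nouns and adjectives show different ones.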

Propagating Meanings Via the Preposition Digraph

As described in Litkowski (2002), a digraph of the prepositions permits propagation of meanings via inheritance, where the final preposition of a preposition definition is like a hypernym or superordinate. Two types of preposition definitions were identified there: (1) usage notes and (2) those defined using other prepositions or verbs. The usage note definitions were characterized as being primitives. All but two of the 22 definitions of by are usage notes, whereas only two of the 13 definitions of through are usage notes. These prepositions are used in defining the following other prepositions (sense numbers are indicated in parentheses).

Virtually all these senses are defined with a past participle followed by by. For example, according to is defined as stated by, for as employed by, and in the grip of as dominated or affected by. The lexicographer judges that, in these instances, the second sense of by (AgentName) is the appropriate sense. Thus, it would seem that some inheritance is occurring. However, viewing the preposition digraph as an inheritance hierarchy does not follow the usual principles of inheritance typical for nouns and verbs and must be investigated further.
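The edges of such a digraph can be derived from the definitions themselves. The sketch below uses the three definitions quoted above; the preposition list is a small illustrative subset, the function names are hypothetical, and only single-word final prepositions are handled (a phrasal preposition at the end of a definition would need a longest-match lookup).

```python
from typing import Dict, Optional, Set

# Definitions quoted in the text; each ends with the preposition 'by'.
DEFINITIONS: Dict[str, str] = {
    "according to": "stated by",
    "for": "employed by",
    "in the grip of": "dominated or affected by",
}

# A small illustrative subset of the preposition inventory.
KNOWN_PREPOSITIONS: Set[str] = {"by", "through", "for", "of", "to"}


def superordinate(definition: str, prepositions: Set[str]) -> Optional[str]:
    """Return the final word of a definition if it is a known preposition."""
    words = definition.lower().rstrip(" .;,").split()
    if words and words[-1] in prepositions:
        return words[-1]
    return None


def build_digraph(definitions: Dict[str, str],
                  prepositions: Set[str]) -> Dict[str, str]:
    """Map each defined preposition to the preposition that ends its definition."""
    edges = {}
    for entry, definition in definitions.items():
        parent = superordinate(definition, prepositions)
        if parent is not None:
            edges[entry] = parent
    return edges
```

All three entries point to by, which is the configuration that raises the inheritance question discussed above.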

The lexicographer examined the 2-level hierarchy present in ODE senses (core senses and their subsenses). ODE states that subsenses are usually figurative extensions, specializations, or other relations to the core sense. In this effort, the lexicographer found that the relation of the subsenses to the core senses was based on some small bit of expanded (3) or narrowed (2) meaning, and that figurative extensions (1) did not apply (and are unlikely in general to apply to function words). Whether these bits of meaning are involved in any putative inheritance will be studied further as TPP continues. Tables 8 and 9 show these relations for by and through.

Table 8. Sense Relations for 'by' (Note sense numbering in parentheses)

Sense      Relation  Comment
1(1)       (core)    core sense: agent
2(1a)      2         specific to cases where passive verb precedes prep.
3(1b)      2         specific to cases where noun precedes prep.
4(1c)      2         specific to authors and other creators
5(2)       (core)    core sense: means
5(2)-1     3         extension: idea of attaching
6(2a)      2         specific to rendering of terms (see spreadsheet)
7(2b)      2         specific to names (see spreadsheet)
8(2c)      2         specific to means of transport
9(2d)      3         extension: notion of agent (parentage)
10(2e)     3         extension: notion of agent (parentage in animals)
11(2f)     2         specific to fixed phrases (see spreadsheet)
11(2f)-1   3         extension: ambience as a contributory means
12(3)      (core)    core sense: amount
13(3a)     3         reduction rather than extension: this is amount, without the sense of margin noted in 12(3). It would make as much sense for this to be the core, as 12(3) to be the subsense.
14(3b)     3         extension: unit of time (closely related to 15)
14(3b)-1   3         extension: notion of repetition
15(3c)     3         extension: notion of parameter
16(3d)     2         specific to dimensions and multiplication
17(4)      (core)    core sense: deadline, or time terminus
18(5)      (core)    core sense: location
19(5a)     3         extension: movement (passed a location)
20(6)      (core)    core sense: period of time
21(7)      (core)    core sense: concerning
22(8)      (core)    core sense: introduction of oaths

Table 9. Sense Relations for 'through' (Note sense numbers in parentheses)

Sense      Relation  Comment
1(1)       (core)    core sense: one side to the other
2(1a)      3         extension: so as to make a hole
3(1b)      3         extension: with reference to crowd, group, or other thing regarded as homogenous
4(1c)      3         extension: idea of perception
5(1d)      2         specific to locations
5(1d)-1    3         extension: movement via (an intermediate place)
6(1e)      2         specific to angles; often expressed in degrees
7(2)       (core)    core sense: progress toward completion
8(2a)      3         extension: completion with success
9(2b)      2         specific to things requiring endurance or suffering
10(3)      (core)    core sense: inspection
10(3)-1    3         extension: idea of pervasiveness
11(4)      (core)    core sense: inclusive period of time
12(5)      (core)    core sense: means
12(5)-1    2         specific to means of attachment
13(5a)     3         extension: intermediary or agent
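For readers who want to work with these tables programmatically, the following Python sketch parses the sense labels and tallies the relation codes. It rests on an assumed reading of the labels: a running sense number, then the ODE core sense number with an optional subsense letter in parentheses, then an optional "-n" suffix. The field names and function names are hypothetical, and the sample rows are copied from Table 9.

```python
import re
from collections import Counter

# Relation codes as they appear in the Relation column of Tables 8 and 9.
RELATION_NAMES = {"(core)": "core", "2": "narrowed", "3": "expanded"}

# Assumed label shape: running number, "(core number + optional subsense
# letter)", then an optional "-n" suffix, e.g. "5(1d)-1".
SENSE_LABEL = re.compile(r"^(\d+)\((\d+)([a-z]?)\)(?:-(\d+))?$")


def parse_sense_label(label):
    """Split a label such as '5(1d)-1' into its numeric and letter parts."""
    match = SENSE_LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized sense label: {label!r}")
    number, core, subsense, suffix = match.groups()
    return {
        "number": int(number),
        "core": int(core),
        "subsense": subsense or None,
        "suffix": int(suffix) if suffix else None,
    }


def tally_relations(rows):
    """Count how many (label, relation) rows fall under each relation type."""
    return Counter(RELATION_NAMES[relation] for _, relation in rows)


# A few (Sense, Relation) rows copied from Table 9 ('through').
ROWS = [
    ("1(1)", "(core)"),
    ("2(1a)", "3"),
    ("5(1d)", "2"),
    ("5(1d)-1", "3"),
    ("12(5)-1", "2"),
]
```

Grouping the parsed rows by their `core` value would recover the two-level ODE hierarchy described above, with each subsense listed under its core sense.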

The Preposition Databases

All the information developed in TPP is available. The summary results from TPP are best obtained by downloading the XML version of the entire database available from the online version, Online TPP. Summary results can be viewed in web pages for each preposition (by, through, with, for, of). (Note that the web pages for 'of' show the most complete set of data.) In the accompanying table, there is a link to a zipped file (prep.zip, where prep identifies the preposition) containing

  1. The Excel spreadsheet containing the data in the web pages (Sense Analysis prep.xls),
  2. The tab-separated text file used to create the Excel spreadsheet (pp-prep.txt) and the Excel spreadsheet containing the full instances file with sense tags for the preposition (FrameNet prep Instances.xls),
  3. The summary lexicographic treatment of the preposition in a Word document (prepTreatment.doc),
  4. The sentences from the FrameNet database in a Senseval-compatible format (pp-prep.sents.xml) with an answer key (pp-prep.sents.key), and an XML file (gold standards for preposition disambiguation) showing a document identifier (pp-prep.sents.fn.xml), and
  5. The alternation patterns (other prepositions and other syntactic realizations filling the same semantic roles) in a tab-separated text file (pp-prep.alters.txt).
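As a convenience, the file names listed above can be generated for a given preposition and checked against a downloaded archive. The Python sketch below assumes that "prep" in each name is replaced literally by the preposition (for example, pp-by.txt for 'by'); the function names are hypothetical, and the actual archives may be organized differently.

```python
import zipfile

# File-name templates taken from the list above; {prep} stands for the
# preposition. Literal substitution is an assumption, not a documented rule.
FILE_TEMPLATES = [
    "Sense Analysis {prep}.xls",
    "pp-{prep}.txt",
    "FrameNet {prep} Instances.xls",
    "{prep}Treatment.doc",
    "pp-{prep}.sents.xml",
    "pp-{prep}.sents.key",
    "pp-{prep}.sents.fn.xml",
    "pp-{prep}.alters.txt",
]


def expected_files(prep):
    """Return the file names expected in prep.zip for one preposition."""
    return [template.format(prep=prep) for template in FILE_TEMPLATES]


def missing_files(archive, prep):
    """List expected files absent from a zip archive (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        # Compare base names in case the archive stores files under a folder.
        present = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return [name for name in expected_files(prep) if name not in present]
```

For example, `missing_files("by.zip", "by")` would return an empty list if a local copy of the archive for 'by' contains every file named above.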

Project Structure and Support

All data generated in The Preposition Project is freely and publicly available under a general public license (GPL). The contents of the database are copyrighted in the names of its developers.

The Preposition Project is funded by CL Research. Tasks in the project are being performed by CL Research staff, with contract support from a professional lexicographer.

Papers from The Preposition Project and SemEval-2007

This document maintained by Ken Litkowski.
Copyright © 2005-2007 CL Research
Last updated: August 21, 2007