Preposition Class Analyses

(Under Development)


The representative instances collected in The Preposition Project (TPP) Pattern Dictionary of English Prepositions (PDEP) provide a comprehensive view of prepositions and an opportunity to examine the landscape of meanings they express. More specifically, the preposition senses and patterns can be aggregated into classes or clusters of semantic relations. Each class may be viewed as a coarse-grained set of meanings, which can then be analyzed in detail to yield a fine-grained set of meanings for the class. TPP has identified 12 classes, which are grouped and described briefly below, each with a link to the fine-grained dimensional analysis for that class. By comparison, Stephen Tratz (A Fast, Accurate, Non-Projective, Semantically-Enriched Parser) assigned 33 clusters, and Vivek Srikumar (Modeling Semantic Relations Expressed by Prepositions) assigned 32 general semantic relations. The number of classes is being refined as instances are examined; the procedure for this refinement and the layout of the class tables are described below.

Classes

Special Class

Classes Merged

Procedure for Class Analysis

The analysis of preposition classes proceeds bottom-up, from an examination of the TPP corpus instances for each individual sense. The sense under investigation is opened in PDEP. The objective is to complete its behavioral characterization: the types of complements and governors, the feature selectors for the complements and governors, the TPP class and relation, the Tratz cluster, and the Srikumar semantic relation. This is done by opening the OEC instances, the FrameNet instances (if any), and the TPP instances. While an attempt is made to enter as much behavioral information as possible, a particular focus is on understanding the nuances of meaning within the TPP class. In general, the analysis covers the senses identified for one class at a time, with the objective of building an analysis of the nuances in that class. When enough senses in a class have been analyzed for the dimensions of the class to begin to emerge, the analysis is linked into PDEP. Thereafter, any PDEP sense in a class with a preliminary analysis has an Analysis menu item that brings up that analysis. The analysis identifies all senses in the class and discusses the general characteristics of the class, along with any dimensional analysis for it.

Description of Class Summary Tables

For several of the classes, the links lead to an in-depth discussion of the class. For all the classes, a current summary table is generated. For those with a discussion, the table appears after the discussion.

Each summary table lists all the preposition senses in a given TPP class. Each row identifies the Preposition, the Sense number, and the TPP relation (labeled Srtype). The Count column identifies the number of CPA instances that have been tagged with this sense. The Pct column gives the percentage of the preposition's instances that have been tagged so far. When this percentage is less than 100 (i.e., not all instances have yet been tagged), more instances may yet be tagged with this sense.


The last column in a row is the normalized frequency (NF), an estimate of the number of instances of the sense per one million prepositions in the written portion of the British National Corpus (BNC). As discussed in detail elsewhere, the instances for each preposition are a sample drawn from the BNC. The target sample size was 250, with larger samples for the most common prepositions with many senses. When the BNC contained fewer than 250 instances of a preposition, all instances were used. When the samples were drawn, the total number of BNC instances for each preposition (b_i) was recorded. Summed over all prepositions, the total N was 5,391,042. (For of and in, the recorded number of instances was 1,000,000, so the true total is likely somewhat higher. As a rough order of magnitude, this is unlikely to be significant.)

To compute the normalized frequency for each sense, we first compute the proportion of the preposition's sampled instances that have been tagged with the sense, p_i = c_i/n_i, where c_i is the count tagged for the sense and n_i is the size of the sample for the preposition. Next, f_i = p_i*b_i is computed as an estimate of the number of instances in the BNC that have this sense. Finally, nf_i = (f_i/N)*1,000,000.
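The three steps of the normalized-frequency calculation can be sketched as a small function. This is an illustrative sketch, not the TPP implementation: the function name normalized_frequency and the sample counts in the usage line are hypothetical; only the total N = 5,391,042 comes from the description above.

```python
# Total BNC preposition instances summed over all prepositions (from the text).
N_TOTAL = 5_391_042


def normalized_frequency(c_i: int, n_i: int, b_i: int, n_total: int = N_TOTAL) -> float:
    """Hypothetical helper: estimated occurrences of a sense per million prepositions.

    c_i: number of sampled instances tagged with the sense
    n_i: size of the sample drawn for the preposition
    b_i: total number of BNC instances of the preposition
    """
    p_i = c_i / n_i                    # proportion of the sample tagged with this sense
    f_i = p_i * b_i                    # estimated BNC frequency of the sense
    return f_i / n_total * 1_000_000   # rescale to a per-million-prepositions rate


# Hypothetical sense: 50 of 250 sampled instances, preposition seen 100,000 times in the BNC.
print(round(normalized_frequency(50, 250, 100_000), 2))  # prints 3709.86
```

Because N is fixed, NF is proportional to p_i*b_i, so senses can be compared across prepositions whose sample sizes differ.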


A minor adjustment was made for very infrequent occurrences. Every preposition and every sense has been attested, but some may not appear in the instances we have drawn. The frequency in the BNC (b_i), the count tagged in our sample (c_i), and the sample size in the CPA for the preposition (n_i) are each assumed to be at least 1. When all three take this minimum value, the estimated frequency per million prepositions is nf_i = (1/1)*1/5,391,042*1,000,000 ≈ 0.19, i.e., less than one-fifth of an occurrence per million prepositions.
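The minimum-value adjustment can be sketched by clamping each count to 1 before applying the normalized-frequency formula. This assumes the adjustment amounts to a simple floor of 1 on b_i, c_i, and n_i, as the wording suggests; the function name adjusted_nf is hypothetical.

```python
# Total BNC preposition instances summed over all prepositions (from the text).
N_TOTAL = 5_391_042


def adjusted_nf(c_i: int, n_i: int, b_i: int, n_total: int = N_TOTAL) -> float:
    """Hypothetical helper: normalized frequency with each count floored at 1."""
    # Every preposition and sense is attested, so treat each count as at least 1.
    c_i, n_i, b_i = max(c_i, 1), max(n_i, 1), max(b_i, 1)
    return (c_i / n_i) * b_i / n_total * 1_000_000


# All three counts at the minimum gives the smallest possible NF.
print(round(adjusted_nf(0, 0, 0), 2))  # prints 0.19
```

The floor keeps every attested sense at a small positive NF instead of zero, and avoids a division by zero when a preposition has no sampled instances.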

Comments to: Ken Litkowski  Modified: December 31 2016 12:28:26.