Pattern Dictionary of English Prepositions

Building on data from The Preposition Project (TPP), the Pattern Dictionary of English Prepositions (PDEP) is intended to identify the prototypical syntagmatic patterns with which prepositions in use are associated. By definition, PDEP seeks to identify linguistic units used sequentially to make well-formed structures and to characterize the relationship between these units. In the case of prepositions, the units are the complement (object) of the preposition and the governor (point of attachment) of the prepositional phrase. The relationship is usually called the semantic role, specifying the relationship that the prepositional phrase has with the main verb in a clause. This term is extended to include cases where the prepositional phrase modifies nouns or adjectives.

Standard dictionaries include definitions of prepositions, but they only loosely characterize the syntagmatic patterns associated with each sense. PDEP takes this a step further, looking for prototypical sentence contexts to characterize the patterns. PDEP is modeled on the principles of Corpus Pattern Analysis (CPA), developed to characterize syntagmatic patterns for verbs, which are viewed as central to expression of meaning. These principles are described more fully in Patrick Hanks (2013), Lexical Analysis: Norms and Exploitations. Currently, CPA is being used in the project Disambiguation of Verbs by Collocation (DVC) to develop a Pattern Dictionary of English Verbs (PDEV).

PDEP is closely related to PDEV. As indicated, most syntagmatic patterns for prepositions are related to the main verb in a clause. Because of this close relation, PDEP is viewed as subordinate to PDEV. This relationship is so close that the implementation of PDEP employs significant portions of the code being used in PDEV, with appropriate modifications as necessary to capture the syntagmatic patterns for preposition behavior.

The Pattern Dictionary of English Prepositions is an online dictionary consisting of three main components: (1) a complete inventory of English single-word and phrasal prepositions, (2) a summary list of patterns for each preposition, with details for each pattern, and (3) actual corpus instances for each preposition, many of them sense-tagged and many available for analysis of the prototypical sentence contexts. The details of each component are described below to provide a user's guide for navigating and exploiting the PDEP data. PDEP is further described in a paper presented at ACL 2014; see the reference below for the full citation.

Inventory of English Prepositions
Preposition Patterns
Preposition Corpus Instances
Preposition Syntagmatic Patterns
Steps and Aids in Tagging Instances
Recording Preposition Behavior in the Pattern Box
Preposition Class Analyses
Downloading Data
Outstanding Questions

Inventory of English Prepositions

The start page for PDEP asks for the prepositions you want to see. You can enter a single preposition, the beginning letters of prepositions, or a regular expression (usually prefixed with '^' to indicate the beginning letters). Alternatively, you can select an editing status: 'All' to retrieve all prepositions, or another status to look at just those prepositions at some point in the process of being analyzed. (Currently, the only active status is 'initial'. Other statuses, to be used in the future, are 'complete' (when all editing is done), 'ready' (indicating that everything has been done, but awaiting final review), 'WIP' (work in progress), and 'VLF' (very low frequency prepositions, for which there is likely not enough evidence for a definitive treatment).)
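As an illustration of how a regular-expression query narrows the inventory, the following is a minimal sketch. The sample list and the function name match_prepositions are hypothetical, and PDEP's own matching is done on its server and may differ in detail from Python's re module.

```python
import re

# Hypothetical sample; the real inventory is served by PDEP itself.
SAMPLE_PREPOSITIONS = [
    "about", "above", "across", "after", "against",
    "along", "as of", "by reason of", "over",
]


def match_prepositions(pattern, inventory):
    """Return the prepositions in `inventory` matched by the regular expression."""
    regex = re.compile(pattern)
    return [prep for prep in inventory if regex.search(prep)]


# '^' anchors the match at the beginning of the preposition.
print(match_prepositions("^ab", SAMPLE_PREPOSITIONS))   # ['about', 'above']
# '$' anchors at the end, e.g. to find phrasal prepositions ending in "of".
print(match_prepositions("of$", SAMPLE_PREPOSITIONS))   # ['as of', 'by reason of']
```

Without the '^' anchor, the same expression would also match prepositions that merely contain the letters somewhere inside them.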

The opening page of PDEP consists of a text box where you may enter a specific preposition, a drop-down list of "status" options (indicating the status of work on a preposition, with a default value of all to view all prepositions), and a Load button to load either a specific preposition or a set. When the Load button is pushed, a list of the selected prepositions is shown.

The table of prepositions shows the status of the investigation into the properties for each preposition. The initial status indicates data that has been developed under The Preposition Project. The next column identifies the number of patterns associated with each preposition (this may also be viewed as the number of senses). The next two columns identify the number of sentences (instances) that have been sense-tagged for each preposition from the FrameNet project or the Oxford English Corpus. The next three columns refer to sentences that have been gathered under TPP for analysis. The column labeled BNC Freq identifies the number of instances present in the written portion of the British National Corpus; this column thus describes the relative frequency with which each preposition occurs. The columns TPP Tagged and TPP Insts indicate the sample size that has been drawn from the BNC for this analysis. The number tagged indicates how many of the sample have been sense-tagged. The remaining columns of the table describe the editing that has occurred for the preposition.

The table of prepositions is sortable in each column, by clicking on the table headings. When you click on any row, a new tab is opened showing the patterns for the preposition.

The overall progress in tagging corpus instances is also shown at the bottom of this table. This line identifies the number of prepositions, the number of patterns, the number of FrameNet instances, the number of Oxford English Corpus instances, the number of TPP instances that have been tagged, the total number of TPP instances, and the estimated total frequency of prepositions in the written portion of the British National Corpus.

Preposition Patterns

When you open a preposition, a new tab is opened with a title consisting of the preposition and its editorial status. This tab shows the current set of patterns for the preposition. Along the top, the number of tagged instances from each of the TPP corpora is identified, along with the number of untagged TPP instances. The initial display shows a summary for each pattern, giving the pattern of the preposition in use, with a template for the general case, consisting of the string [[Governor]] preposition [[Complement]]. This is followed by the primary implicature for the pattern, essentially replacing the preposition with its definition. Associated with each pattern is a pattern number and the number of instances in each corpus that have been tagged with this pattern number.

Clicking on any pattern row opens the details for the pattern, with a pattern box entitled with the preposition and the pattern number. The pattern details provide descriptions of the complement and the governor, as written by a lexicographer, checkboxes identifying the basic syntactic characteristics for the complement and governor, and fields for recording selection criteria to recognize the pattern. A primary purpose of PDEP is to formalize these characterizations for use in natural language processing tasks (see below for procedures describing how the selectors are identified and encoded). The next two lines of the pattern detail give the semantics and cluster/semantic relations expressed by this pattern. The TPP class and TPP relation identify the characterizations developed in TPP. The Cluster identifies the general cluster assigned by Stephen Tratz (A Fast, Accurate, Non-Projective, Semantically-Enriched Parser). The Relation identifies the general semantic relation assigned by Vivek Srikumar (Modeling Semantic Relations Expressed by Prepositions). These two fields were initially completed only for prepositions used in the SemEval 2007 task on preposition disambiguation, and are now being extended to cover all prepositions.

The remaining rows provide further insights into the paradigmatic and syntagmatic characteristics of the pattern. The list of Substitutable Prepositions identifies prepositions that have senses similar to the one for this pattern. (Corpus instances of similar prepositions may provide useful information for further analysis.) The Syntactic Position identifies where in a clause the preposition in this pattern may appear, using the categories developed in Quirk et al. The Sense Relation identifies whether this pattern may be considered a core sense or a subsense of a core sense, in which case the type of relation is specified. Finally, the Primary Implicature is repeated and any comments about the pattern usage are specified.

Preposition Corpus Instances

In the menu bars for the pattern manager and for the pattern detail, there are drop-down boxes labeled All Corpus Instances and Corpus Instances. Selecting an option in either of these boxes will take you to the corpus instances associated with the preposition, either for the full set or for those that have been specifically tagged for a particular pattern. The options in the pattern manager refer to the full set of instances in the corpus for the preposition, regardless of the pattern or sense tag. The options in the pattern detail are for corpus instances that have been tagged with the specific pattern or sense. One option is for all instances in the TPP corpus that have not yet been tagged (identified as "TPPUNK"). Another option is for all instances in the TPP corpus that have been tagged with the sense 'x', which identifies instances that are not valid for the preposition, usually reflecting instances that have been mistagged by the trawl in developing the TPP corpus. Another option is for all instances in the TPP corpus that have been tagged with the sense 'pv', where the instance is a (transitive) phrasal verb that uses the preposition form, but is really part of the verb unit. These latter instances provide a basis for studying tagging and parsing difficulties for the preposition.

The selected set of instances opens in a new tab titled Annotation: preposition (sense). Each sentence is accompanied by the name of the corpus and instance identifying number, along with the current sense tag and the location of the preposition in the sentence. In the sentence itself, the preposition is given in bold, highlighted in light blue, and labeled as the target. The preposition object (or complement) is given in bold, highlighted in light green, and labeled as the complement. The preposition point of attachment (or governor) is given in bold, highlighted in light orange, and labeled as the governor. (Note that not all complements and governors are properly tagged and labeled, due to some underlying difficulties, such as an inability by the parser to identify these items.) The primary purpose of this tab is to facilitate sense-tagging of the TPP corpus instances. The menu bar identifies the preposition, the sense, and the corpus.

Tagging instances involves first selecting instances (clicking on individual sentences selects the sentence or clicking Select All selects all sentences, with each selected instance highlighted in yellow) and then selecting an option, i.e., a sense, from the Tag Instances drop-down list. (Clicking Unselect will remove all selections.) In addition to the full set of pattern numbers for the preposition, the options include x (to indicate that this instance is not a preposition), pv (to indicate that this instance is really a transitive phrasal verb, where the lemma should be tagged as a particle, and not a prepositional phrase), and unk (for unknown, i.e., not yet tagged). The Save option is for registered editors and is used to commit taggings to the database.

Steps and aids used in tagging instances are described below. In addition to making use of the pattern descriptions, features identified in parsing all instances can be examined and used as the basis for selecting instances automatically. These features characterize the context of a preposition's use and provide links to FrameNet frame elements associated with FrameNet lexical units.

Preposition Syntagmatic Patterns

In characterizing preposition behavior, the general semantic content of each element of [[Governor]] preposition [[Complement]] must be specified. We consider each component:

[[Complement]]: Syntactically, the complement is a noun phrase, a nominal wh-clause, or a nominal -ing clause. Considered by itself, the complement has a meaning, i.e., some ontological category. For example, Boston is a city. This category may frequently help in disambiguating the preposition. However, more generally, some additional meaning is given to the complement. For example, Boston may be a destination or a point of reference. The precise meaning will come from the preposition and the governor.
preposition: The preposition associated with the complement provides a first step in allowing us to determine what additional meaning should be added to the complement. In general, a given complement can appear after a large number of prepositions. For the example of Boston, we can imagine sentences using the following prepositions, across, against, around, beyond, from, in, into, of, over, through, to, and within. Other prepositions, such as between, by reason of, during, and until, are unlikely to have Boston as a complement. The specific preposition will impart some information on how we want to interpret the complement.
[[Governor]]: The final piece of meaning associated with the complement is provided by the governor, or the point of attachment, of the prepositional phrase. For the example of Boston, the verb played with against Boston will invoke a sports context, while resided with in Boston will invoke a locational sense.

In analyzing preposition behavior, therefore, the objective is to tease apart these various elements. The procedures for doing so are laid out below.
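The three-part template can be pictured as a small record type. The sketch below is only an illustration of the [[Governor]] preposition [[Complement]] decomposition using the Boston examples from the text; the class name PrepositionUse and the helper contexts_for_complement are hypothetical and are not part of PDEP.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PrepositionUse:
    """One use of a preposition, split into the three elements of the template."""
    governor: str      # point of attachment of the prepositional phrase
    preposition: str
    complement: str    # object of the preposition

    def template(self) -> str:
        return f"[[{self.governor}]] {self.preposition} [[{self.complement}]]"


def contexts_for_complement(uses, complement):
    """Group the governors seen with one complement by the preposition used."""
    contexts = defaultdict(list)
    for use in uses:
        if use.complement == complement:
            contexts[use.preposition].append(use.governor)
    return dict(contexts)


USES = [
    PrepositionUse("played", "against", "Boston"),   # sports context
    PrepositionUse("resided", "in", "Boston"),       # locational sense
]

for use in USES:
    print(use.template())
print(contexts_for_complement(USES, "Boston"))
```

The same complement appears under two prepositions with two different governors, which is the situation the analysis has to tease apart.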

Steps and Aids in Tagging Instances

In general, tagging TPP instances is based on considering the pattern descriptions in the pattern manager. Since the pattern sets (definitions) are based on the Oxford Dictionary of English, the likelihood is that the coverage and accuracy of the sense distinctions is quite high. However, since prepositions have generally not received the close attention of words in other parts of speech, PDEP is intended to ensure the coverage and accuracy. During the development of the SemEval 2007 tagged instances, using FrameNet sentences, the lexicographer found it necessary to increase the number of senses by about 10 percent. Since the lack of coverage in FrameNet is well-recognized, the representative sample developed for PDEP should provide the basis for ensuring the coverage and accuracy of the sense inventory.

As indicated, the first step in tagging instances involves looking at the patterns and seeing whether the TPP instances can be tagged with existing patterns. In addition to the patterns, instances that have been tagged for SemEval 2007 (labeled FN) or the Oxford English Corpus (labeled OEC) can be opened and used as the basis for making judgments on the TPP corpus.

We have provided tools to enhance the examination of similarities in the FN or OEC corpora and the application of the results to the TPP instances. As indicated, all sentences in the corpora have been fully parsed with a dependency parser. Features characterizing the context of the target preposition have also been developed for each sentence using Tratz's system. There are approximately 1500 features for each sentence; these data are almost instantly available for examination. When a particular corpus has been opened, whether for a particular sense or for the entire set, the menu bar includes an Examine item and a Select item. Next to the Examine item, there are two drop-down boxes, with the initial options labeled WFRs (word-finding rules) and FERs (feature extraction rules). To use the examine or select capability, a WFR and an FER need to be selected.

Word-finding rules enable examination of features for words in a certain contextual location with respect to the target preposition. They are divided into two sets: words pertaining to the governor and words pertaining to the complement. Words pertaining to the governor are: (1) verb or head to the left (l), (2) head to the left (hl), (3) verb to the left (vl), (4) word to the left (wl), and (5) governor (h). Words pertaining to the complement are: (1) syntactic preposition complement (c) and (2) heuristic preposition complement (hr). Thus, selecting one of these options identifies the word whose properties are to be examined.

Feature extraction rules identify the specific kind of feature to be examined. There are 9 feature kinds: (1) part of speech, using the Penn Treebank categories (pos), (2) word class, the 4 major word classes (wc), (3) lexical name, the WordNet file name category, 27 possibilities for nouns and 15 for verbs (ln), (4) lemma, the base form of a word (l), (5) the word as it appears (w), (6) synonyms, as identified in WordNet (s), (7) hypernyms, the first level in WordNet (h), (8) whether the word is capitalized (c), and (9) affixes present in the word, a set of 27 suffix or prefix characteristics (af). Thus, the feature extraction rules enable examination of specific syntactic or semantic features of the selected word.

The combination of the 7 WFRs and 9 FERs provides 63 features that can be examined for any corpus that is opened. When a WFR and an FER have been selected, clicking on Examine brings up a new tab with the results for that word/feature combination. The results are presented in a table with the headings Value, Count, and Description. Value gives the value of the feature. Count indicates the number of instances with this value. Description is given for only two features, the part of speech and the affixes, where the codes given in the value field are not always transparent. For the feature identifying whether a word is capitalized, the value is only 'true'. For most features, the number of possible values is relatively small, so the table is only several rows deep. For the lemma and the word itself, the number of distinct entries is limited by the number of instances in the particular corpus set being examined. For the synonym and hypernym features, the number of entries may be quite a bit larger.
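The count of 63 follows directly from pairing each of the 7 word-finding rules with each of the 9 feature extraction rules. The sketch below enumerates the pairs using the codes listed above; the "wfr:fer" spelling of a combination is borrowed from the selector notation used later in this guide (for example hr:ln), and the variable names are illustrative only.

```python
from itertools import product

# Word-finding rules (WFRs): codes as listed in this guide.
WFRS = {
    "l": "verb or head to the left",
    "hl": "head to the left",
    "vl": "verb to the left",
    "wl": "word to the left",
    "h": "governor",
    "c": "syntactic preposition complement",
    "hr": "heuristic preposition complement",
}

# Feature extraction rules (FERs): codes as listed in this guide.
FERS = {
    "pos": "part of speech (Penn Treebank)",
    "wc": "word class",
    "ln": "WordNet lexical name",
    "l": "lemma",
    "w": "word as it appears",
    "s": "WordNet synonyms",
    "h": "WordNet hypernyms (first level)",
    "c": "capitalized",
    "af": "affixes",
}

# Every WFR can be paired with every FER.
COMBINATIONS = [f"{wfr}:{fer}" for wfr, fer in product(WFRS, FERS)]

print(len(WFRS), len(FERS), len(COMBINATIONS))  # 7 9 63
```

Note that the codes l, h, and c occur in both sets with different meanings, so a combination is only unambiguous when the WFR and the FER are given in a fixed order.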

In addition to the features that have been developed through parsing the sentences in a corpus, an additional capability allows examination of potential semantic role labels using FrameNet data associated with lexical units (as annotated in the FrameNet project). Next to the drop-down boxes for specifying WFRs and FERs, there is a checkbox labeled FN when the given preposition has been used for marking a frame element. When frames are developed and sentences containing lexical units for the frame are annotated, a set of frame element realizations is recorded in summary form. Many of these realizations are in the form PP[prep]. We have created a dictionary of the FrameNet lexical units that contains a list of all frame element realizations associated with the lexical unit. Throughout the FrameNet data, 75 distinct prepositions are recorded along with the frame element. When the FN box is checked, for a particular corpus of a preposition, the set of lexical units with that preposition is retrieved. We hypothesize that the governor of a prepositional phrase is the trigger for this phrase. To examine the occurrences of a possible frame element governed by one of these triggers, we need to select the governor WFR (h) and the lemma FER (l). With this combination and with the FN box checked, clicking on Examine will generate a table of all governors (in the lemma form, i.e., lexical units) in the current corpus that have been tagged in FrameNet. In addition to the count of instances, the results also identify, under the Description heading, the set of frame elements that have been assigned to these prepositional phrases in FrameNet. In many cases, more than one frame element has been tagged with the given lexical unit. For example, some sentences for the lexical unit dance have been tagged for the preposition 'across' with the Area or the Path frame element.

A similar capability has been added to examine prepositions identified in VerbNet. Throughout the VerbNet data, 31 distinct prepositions have been identified in VerbNet frames. Again, with the selection of the governor WFR (h) and the lemma FER (l), and with the VN box checked, clicking on Examine will generate a table of all governors (in the lemma form, i.e., members of VerbNet verb classes) in the current corpus that have been identified in VerbNet frames. In addition to identifying the lemmas, the results also identify the VerbNet classes. In some cases, a lemma may appear as a member of more than one verb class using the given preposition.

The general objective of examining features is to identify those that are diagnostic of specific senses. To do this most effectively, it is best to open the corpus instances that have been tagged with a specific sense in either FN or OEC (see the instructions above for Preposition Corpus Instances). Experience in examining features will identify the most useful combinations. When an interesting feature has been identified, it can be used to select sentences in the open corpus set. To do this, put the value identified in a feature examination in the box next to Select and then click on Select (or just push the Enter key after entering text in this field). When this is done on an FN or OEC corpus, particularly those for specific senses, the selected instances will generally show the consistency with which these instances have been tagged. When the same feature combination is used with the TPP corpus, particularly for instances not yet tagged, the selection will identify candidate instances for tagging with a specific sense. For example, opening the full TPP corpus for 'over', specifying 'hr' as the WFR and 'ln' as the FER, and then placing 'noun.time' in the selection box will identify 122 instances out of 500 that have this characteristic. Inspection will show how well this combination is diagnostic of sense 14(5) of 'over'.
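The Examine and Select operations can be pictured as a tally followed by a filter. The following is a minimal sketch under stated assumptions: the in-memory layout (a "features" mapping keyed by "wfr:fer"), the function names examine and select, and the three toy sentences are all hypothetical, and PDEP itself performs these steps on its server against the stored parse features.

```python
from collections import Counter

# Hypothetical toy data: each instance maps "wfr:fer" keys to the set of
# values found for that word/feature combination.
SAMPLE_INSTANCES = [
    {"id": 1, "sentence": "Prices rose over the weekend.",
     "features": {"hr:ln": {"noun.time"}}},
    {"id": 2, "sentence": "A plane flew over the city.",
     "features": {"hr:ln": {"noun.location"}}},
    {"id": 3, "sentence": "Attitudes changed over the decade.",
     "features": {"hr:ln": {"noun.time"}}},
]


def examine(instances, wfr, fer):
    """Tally, for one WFR/FER pair, how many instances carry each value."""
    counts = Counter()
    for inst in instances:
        counts.update(inst["features"].get(f"{wfr}:{fer}", ()))
    return counts


def select(instances, wfr, fer, value):
    """Return the instances whose WFR/FER feature includes `value`."""
    key = f"{wfr}:{fer}"
    return [inst for inst in instances if value in inst["features"].get(key, ())]


print(examine(SAMPLE_INSTANCES, "hr", "ln").most_common())
selected = select(SAMPLE_INSTANCES, "hr", "ln", "noun.time")
print(f"{len(selected)} of {len(SAMPLE_INSTANCES)} instances selected")
```

The selected instances are only candidates; as the text notes, inspection is still needed to judge how diagnostic the feature is for a given sense.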

Recording Preposition Behavior in the Pattern Box

By examining features, the behavior of a particular sense can be constructed. As indicated above, examining characteristics of the two tagged corpora (OEC and FN) will be useful in formalizing the TPP data in the pattern box. This may begin with an examination of the word classes (wc) and parts of speech (pos) of the complements and governors. These can be used to check the appropriate boxes in the pattern description (NN, NNP, WH, or -ING for the complements and Noun, Verb, or Adj for the governors).

A next step might be to examine the complement and governor lemmas (l) and words (w). It is likely that several words or lemmas will be identified. Several potential categorizations of these words can be examined, including WordNet lexical names (ln), WordNet synonyms (s), WordNet hypernyms (h), FrameNet frame element realizations (with FN checked), and VerbNet verb classes (with VN checked). When these features are examined, the results show the number of instances in the particular subcorpus and the total number of instances in that corpus, so that some assessment of generality can be made. The WordNet features tend to produce a larger number of total hits, reflecting the polysemy present in WordNet. The numbers of FrameNet and VerbNet hits are always below the total number of instances; this reflects the coverage of these two resources.

When some features appear to be diagnostic of a sense, the specifications can be applied to the TPP corpus using the Select facility. When the selected instances appear to have been selected appropriately, they can then be tagged with the particular sense under investigation. In such cases, the selection criteria are entered into the Selector fields of the patterns. For example, for pattern 12(10) of for, indicating the length of (a period of time), the WordNet lexical name noun.time is found to be quite prevalent in the OEC and FN corpora for this sense. When applied to the TPP corpus, most selected instances appear to be correctly identified. Upon examination, any incorrect selections can be unselected. The sense 12(10) is then applied to the selected instances. Finally, the annotation hr:ln:noun.time is entered into the Selector field for the complement.
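A selector annotation such as hr:ln:noun.time reads as a word-finding rule, a feature extraction rule, and a value, in that order. The sketch below splits such a string and checks the two codes against the lists given earlier in this guide. The Selector type and parse_selector function are hypothetical, and the assumption that every Selector field entry has exactly this three-part colon-separated form goes beyond what the guide states.

```python
from typing import NamedTuple

# Codes as listed in this guide.
WFR_CODES = {"l", "hl", "vl", "wl", "h", "c", "hr"}
FER_CODES = {"pos", "wc", "ln", "l", "w", "s", "h", "c", "af"}


class Selector(NamedTuple):
    wfr: str    # which word to look at (e.g. hr, the heuristic complement)
    fer: str    # which feature of that word (e.g. ln, WordNet lexical name)
    value: str  # the feature value that identifies the sense


def parse_selector(annotation):
    """Split 'wfr:fer:value' into its parts, validating the two rule codes."""
    parts = annotation.split(":", 2)  # the value itself may contain colons
    if len(parts) != 3:
        raise ValueError(f"expected wfr:fer:value, got {annotation!r}")
    wfr, fer, value = parts
    if wfr not in WFR_CODES:
        raise ValueError(f"unknown word-finding rule: {wfr!r}")
    if fer not in FER_CODES:
        raise ValueError(f"unknown feature extraction rule: {fer!r}")
    return Selector(wfr, fer, value)


print(parse_selector("hr:ln:noun.time"))
```

Splitting at most twice keeps any later colons inside the value, which avoids rejecting values that happen to contain the separator.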

Once instances in TPP have been tagged for a specific sense, the next time this sense is examined, these instances can then be investigated in further depth. It is much easier to examine the consistency of the tagging when only the instances with these tags are shown. Further shades of meaning can perhaps be identified, perhaps with further refinement of all fields in the pattern description.

It is worth noting that examination of WordNet, FrameNet, and VerbNet features may provide additional insights into those resources. The WordNet features frequently reveal unexpected characterizations (such as 'school' as a time period). For FrameNet, the FN corpus shows a very high number of hits for FrameNet head lemmas, while the OEC and TPP corpora show a much lower number of hits. VerbNet also has a much smaller number of hits. Thus, presuming that the identification of head lemmas is quite accurate, analysis of the TPP instances may provide an opportunity for expanding the coverage of FrameNet and VerbNet.

Preposition Class Analyses

PDEP enables an in-depth analysis of TPP classes, Tratz clusters, and Srikumar semantic relations. First, we query the database underlying the patterns to identify all senses with a particular class. We then examine each sense on each list in detail. We follow the procedures laid out above for examining the features to add information about selectors, complement types, and categories. We use this information to tag the TPP instances, tagging conservatively, e.g., leaving questionable instances untagged. Finally, we carefully place each sense into a preposition class or subclass, grouping senses together and making annotations that attempt to capture any nuance of meaning that distinguishes the sense from other members of the class.

To build a description of the class and its subclasses, we make use of the Quirk reference in the pattern box (i.e., the relevant discussions in Quirk et al. (1985)). We build the description of a class as a separate web page and make this available as a menu item in the pattern box, labeled Analysis. A class analysis is not yet available for all classes; the current state of class analysis is described in Preposition Class Analyses. The description provides an overview of the class, making use of the TPP data and the Quirk discussion, and indicating the number of senses and the number of prepositions. Next, the description provides a list of the categories within the class, characterizing the complements of the category and then listing each sense in the category, with any nuance of meaning as necessary. Finally, we attempt to summarize the selection criteria that have been used across all the senses in the class. A list of preposition senses in each class and their semantic relation type (Srtype) is also provided, along with a count of the number of instances tagged with each sense, the percentage of instances for the preposition that have been tagged with each sense, and a normalized frequency of the occurrence of each sense in the British National Corpus (per million prepositions).

The process of building a class description reveals inconsistencies in each of the class fields. When we place a preposition sense into the class, we may find it necessary to make changes in the underlying data. At the top level, these class analyses in effect constitute a coarse-grained sense inventory. As the subclasses are developed, a finer-grained analysis of a particular area is available. We believe these analyses may provide a comprehensive characterization of particular semantic roles that can be used for various NLP applications.

Downloading Data

All data used in PDEP is available for download directly. The full database is available in a set of MySQL files for upload into a MySQL database. Specific data is also available in JavaScript Object Notation (JSON), using a simple format of {string, value} pairs. This is done through PHP scripts, as provided below. Each script is described with (1) a link to the section above where the relevant portion of PDEP is described, (2) a brief statement of what the script returns, (3) a link to the script (opening in another window), and (4) a detailed list of the field names (the strings) and the values (when these are not obvious).

Pattern Dictionary of English Prepositions Data: All data from PDEP are available in a 46.7 MB zipped file. This file contains (1) a script to create the vertical files uploaded to Sketch Engine, with all supporting data and results, (2) three MySQL files suitable for upload into a MySQL database (definitions for all 1040 senses (patterns) of 304 prepositions, properties for each sense in 27 fields, and tagged instances for all sentences in the TPP corpora), and (3) help files describing the status of the corpora and the scripts for creating the vertical files. As significant changes to PDEP are made, a new version of this data will be made available. (Latest: July 23, 2019.)
Preposition Inventory: This script (https://www.clres.com/db/prepstats.php) returns a list of all the prepositions, with summary data about each. The fields are "preposition", "patterns" (the number), "status", "fn" (FrameNet instances), "oec" (OEC instances), "tpptags" (TPP instances that have been tagged), "tpp" (TPP instances), "bnc" (BNC frequency), "create_by" (creator of the entry), "created" (date created), "modified" (who made the last modifications), "last" (date last modified). Note that manipulations of the table are not handled through the script, but through Javascript code associated with the table.
Preposition Patterns: Three scripts are used to retrieve data on the patterns for a preposition:
1. Pattern Summary: This script (https://www.clres.com/db/preppats.php?prep=against) retrieves the set of patterns (senses) for a preposition. Note that the script requires an argument prep and should be changed to obtain the patterns for a desired preposition. Five items of information are retrieved for each sense; the senses are separated by a comma. The fields for each sense are "sense" (the TPP sense identifier), "def" (the definition), "fn" (the number of FrameNet instances tagged with this sense), "oec" (the number of OEC instances tagged with this sense), and "tpp" (the number of TPP instances tagged with this sense).
2. Pattern Detail: This script (https://www.clres.com/db/prepprops.php?prep=as of&sense=1(1)) retrieves the properties associated with a specific sense. Note that it requires two arguments, prep (the preposition) and sense (the TPP sense identifier). Twenty-eight fields of information are retrieved for each pattern: "cprop" (the complement properties), "aprop" (the governor or attachment properties), "sup" (the TPP class), "srtype" (the TPP semantic relation type), "tratz" (the Tratz cluster), "srikumar" (the Srikumar semantic relation), "opreps" (other prepositions with an equivalent sense), "srel" (the relation of a subsense to a supersense), "qsyn" (a space-separated list of the Quirk syntactic positions), "qpar" (where this sense is discussed in Quirk et al.), "com" (the lexicographer's comments about this sense), "cnn" (whether the complement is a common noun), "cnnp" (whether the complement is a proper noun), "cwh" (whether the complement is a wh-phrase), "cing" (whether the complement is an -ing phrase), "clexset" (specific complement lexical items), "gnoun" (whether the governor is a noun), "gverb" (whether the governor is a verb), "gadj" (whether the governor is an adjective), "csel" (complement selectors), "gsel" (governor selectors), "conto" (the complement ontology category), "fn" (number of FrameNet instances with this sense), "oec" (number of OEC instances tagged with this sense), "tpp" (number of TPP instances tagged with this sense), "tppunk" (number of TPP instances that have not been tagged), "tpppv" (number of TPP instances that have been tagged as phrasal verbs), and "tppx" (number of TPP instances that have been tagged as x, indicating that they are not legitimate prepositional instances).
3. Preposition Classes: This script (https://www.clres.com/db/prepclas.php) returns a JSON array containing the TPP Class for each sense. This array contains the elements prep (the preposition), sense (the TPP sense identifier), and sup (the TPP class).
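The three calls above can be assembled programmatically. The following is a minimal Python sketch, not part of PDEP itself: the helper names pdep_url and senses_by_class are hypothetical, percent-encoding of spaces and parentheses is an assumption about what the server accepts, and the sample array uses invented placeholder class names. It also assumes that each element of the JSON array returned by prepclas.php is an object keyed by "prep", "sense", and "sup".

```python
import json
from collections import defaultdict
from urllib.parse import quote, urlencode

BASE = "https://www.clres.com/db/"  # script location taken from the URLs above


def pdep_url(script, **params):
    """Build a URL for a PDEP script, percent-encoding the argument values."""
    query = urlencode(params, quote_via=quote)
    return BASE + script + ("?" + query if query else "")


def senses_by_class(json_text):
    """Group (prep, sense) pairs by TPP class from a prepclas.php-style array.

    Assumes each array element is an object with keys prep, sense, and sup.
    """
    groups = defaultdict(list)
    for row in json.loads(json_text):
        groups[row["sup"]].append((row["prep"], row["sense"]))
    return dict(groups)


summary_url = pdep_url("preppats.php", prep="against")
detail_url = pdep_url("prepprops.php", prep="as of", sense="1(1)")
classes_url = pdep_url("prepclas.php")

# Invented placeholder data, only to show the assumed shape of the array.
sample = json.dumps([
    {"prep": "against", "sense": "1(1)", "sup": "ClassA"},
    {"prep": "as of", "sense": "1(1)", "sup": "ClassB"},
    {"prep": "against", "sense": "2(1a)", "sup": "ClassA"},
])
grouped = senses_by_class(sample)
```

Encoding the values in one place keeps phrasal prepositions such as "as of" and sense identifiers such as "1(1)" from producing malformed query strings.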
Corpus Instances: This script (https://www.clres.com/db/prepsents.php?source=FN&prep=above&sense=1(1)) retrieves corpus instances for a specified corpus, preposition, and sense. Note that it requires three arguments, source (the desired corpus), prep (the preposition), and sense (the TPP sense identifier). The value for source must be one of "FN" (the FrameNet corpus), "OEC" (the OEC corpus), or "CPA" (the TPP corpus). The value for sense can be a specific TPP sense identifier, "all" (to retrieve the instances for all senses from the specified corpus), "unk" (to retrieve instances not yet tagged, applying only to the TPP corpus), "x" (to retrieve instances that have been tagged as non-prepositional), or "pv" (to retrieve instances that are phrasal verbs, i.e., where the preposition is really a particle or adverb). The script retrieves six fields of information: "prep" (the preposition), "source" (the source corpus), "sense" (the TPP sense identifier), "inst" (the instance identifier), "preploc" (the location of the preposition in the sentence, zero-based), and "sentence" (the sentence).
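The argument rules for the corpus-instance script can be checked before a request is made. The sketch below is illustrative only: the function name instances_url and the default of "all" for sense are assumptions, and the checks simply restate the constraints described above (three corpus codes, with "unk" applying only to the TPP corpus).

```python
from urllib.parse import quote, urlencode

SOURCES = {"FN", "OEC", "CPA"}  # corpus codes listed above (uppercase)


def instances_url(source, prep, sense="all"):
    """Build a prepsents.php URL after checking the documented constraints."""
    if source not in SOURCES:
        raise ValueError("source must be one of FN, OEC, or CPA")
    if sense == "unk" and source != "CPA":
        raise ValueError('"unk" applies only to the TPP corpus (CPA)')
    query = urlencode(
        {"source": source, "prep": prep, "sense": sense}, quote_via=quote
    )
    return "https://www.clres.com/db/prepsents.php?" + query


fn_url = instances_url("FN", "above", "1(1)")
untagged_url = instances_url("CPA", "above", "unk")
```

Failing early on an invalid combination avoids a request whose result would be empty or ambiguous.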
1. Dependencies:
Latest Tags: This script (https://www.clres.com/db/key.php?prep=along&corp=CPA) retrieves the latest tagging for a given preposition from a specified corpus. The script requires two arguments: (1) prep (the preposition, with spaces for phrasal prepositions) and (2) corp (the corpus, one of "CPA", "FN", or "OEC", in uppercase). The output of this script follows the format of SemEval lexical sample answer key files, consisting of three space-separated fields: (1) the preposition, followed by the letter "p", (2) the instance identifier, and (3) the sense. The output appears in the web browser and can be saved as a text file, usually in the SemEval form consisting of the preposition as the file name, with the extension "key". An answer key can be generated for any preposition and corpus, but is usually done only for the CPA instances when all instances have been tagged, as indicated by "tagged" in the status field for the preposition.
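A saved answer key can be read back with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the sample lines and instance identifiers are invented, and the "along.p" form of the first field follows the usual SemEval convention rather than anything confirmed here. Splitting from the right is a design choice so that a phrasal preposition containing spaces stays in the first field; it relies on the instance identifier and the sense containing no spaces.

```python
from collections import Counter


def parse_key_line(line):
    """Split one key line into (preposition field, instance id, sense)."""
    # Split from the right: only the first field may contain spaces.
    item, inst, sense = line.strip().rsplit(None, 2)
    return item, inst, sense


def sense_counts(key_text):
    """Count how many instances carry each sense in a key file's text."""
    return Counter(
        parse_key_line(line)[2]
        for line in key_text.splitlines()
        if line.strip()
    )


# Invented sample lines, only to show the three-field layout.
sample_key = "along.p 101 1(1)\nalong.p 102 2(1a)\nalong.p 103 1(1)\n"
counts = sense_counts(sample_key)
phrasal = parse_key_line("as of.p 17 1(1)")
```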
Feature Analysis: This script (https://www.clres.com/db/featanal.php?corp=cpa&prep=about&wfr=hr&fer=l) retrieves a count of the feature values for a specified preposition, corpus, word-finding rule, and feature-extraction rule, i.e., how many times each feature value occurs in the corpus. Note that the script requires at least four arguments: (1) corp (the corpus name, one of "cpa", "fn", or "oec", in lowercase), (2) prep (the preposition), (3) wfr (a word-finding rule, one of the codes listed in tagging above), and (4) fer (a feature-extraction rule, one of the codes listed in tagging above). An additional argument, sense, can be added to restrict the feature analysis to a specific sense (https://www.clres.com/db/featanal.php?corp=cpa&prep=about&wfr=hr&fer=l&sense=1(1)). The sense argument must be one of the TPP sense identifiers, "unk" (for untagged TPP instances in "cpa"), or "x" (for TPP instances that have been tagged as non-preposition instances).
1. FrameNet Lookup: As mentioned above, when the word-finding rule is "h" (governor) and the feature-extraction rule is "l" (lemma), a lookup into a FrameNet dictionary is performed. This script (https://www.clres.com/db/fndict.php?pr=about) identifies all lexical units in FrameNet with the preposition (pr). This script retrieves a list consisting of the lexical unit and the frame element name where the preposition is used, with the number of occurrences of that frame element. This script returns values only for the following prepositions: 'by', 'as', 'at', 'in', 'before', 'after', 'with', 'to', 'near', 'outside', 'from', 'on', 'of', 'against', 'for', 'along', 'through', 'about', 'over', 'until', 'into', 'round', 'beyond', 'off', 'without', 'amidst', 'within', 'despite', 'like', 'under', 'since', 'because of', 'inside', 'than', 'throughout', 'among', 'during', 'between', 'above', 'except', 'via', 'past', 'concerning', 'below', 'across', 'onto', 'upon', 'around', 'down', 'towards', 'up', 'out', 'regarding', 'together with', 'amongst', 'behind', 'irrespective of', 'including', 'toward', 'beside', 'beneath', 'underneath', 'alongside', 'according to', 'depending on', 'aboard', 'till', 'following', 'per', 'ahead of', 'unto', 'opposite', 'besides', 'athwart', 'astride'. The values returned by this script are matched against the feature examination internally in the JavaScript processing, not by this script itself.
2. VerbNet Lookup: As mentioned above, when the word-finding rule is "h" (governor) and the feature-extraction rule is "l" (lemma), a lookup into a VerbNet dictionary is performed. This script (https://www.clres.com/db/vndict.php?pr=about) identifies all lexical units in VerbNet classes with the preposition (pr). This script retrieves a list consisting of the lexical unit and the verb classes where the preposition is used, with the number of occurrences of the preposition in the VerbNet frames. This script returns values only for the following prepositions: 'to', 'for', 'against', 'about', 'concerning', 'on', 'regarding', 'respecting', 'of', 'with', 'into', 'at', 'in', 'from', 'onto', 'before', 'out', 'by', 'towards', 'as', 'after', 'until', 'among', 'between', 'over', 'under', 'through', 'upon', 'off'. The values returned by this script are matched against the feature examination internally in the JavaScript processing, not by this script itself.
Instance Selection: This script (https://www.clres.com/db/featanal.php?corp=cpa&prep=about&wfr=hr&fer=l&fval=what) retrieves a list of the corpus instances matching the criteria specified in the arguments. As with the feature examination, the script requires the four arguments: (1) corp (the corpus name, one of "cpa", "fn", or "oec", in lowercase), (2) prep (the preposition), (3) wfr (a word-finding rule, one of the codes listed in tagging above), and (4) fer (a feature-extraction rule, one of the codes listed in tagging above). Another argument, fval (the value of a feature), is required. Optionally, an additional argument, sense, can be added to restrict the instance selection to a specific sense. In PDEP, the list returned by this script is used to identify which instances are highlighted.
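Feature analysis and instance selection use the same script and differ only in whether fval is supplied, so a single helper can cover both. The sketch below is an illustration rather than PDEP's own code: the name featanal_url is hypothetical, the wfr and fer codes are passed through unchecked because their full list is given elsewhere, and the validation only restates the rules above (lowercase corpus names, with "unk" meaningful only for "cpa").

```python
from urllib.parse import quote, urlencode

CORPORA = {"cpa", "fn", "oec"}  # lowercase here, unlike key.php's uppercase


def featanal_url(corp, prep, wfr, fer, fval=None, sense=None):
    """Build a featanal.php URL.

    Without fval the call is a feature analysis (counts of feature values);
    with fval it is an instance selection for that feature value.
    """
    if corp not in CORPORA:
        raise ValueError("corp must be one of cpa, fn, or oec (lowercase)")
    if sense == "unk" and corp != "cpa":
        raise ValueError('"unk" applies only to the cpa corpus')
    params = {"corp": corp, "prep": prep, "wfr": wfr, "fer": fer}
    if fval is not None:
        params["fval"] = fval
    if sense is not None:
        params["sense"] = sense
    query = urlencode(params, quote_via=quote)
    return "https://www.clres.com/db/featanal.php?" + query


analysis_url = featanal_url("cpa", "about", "hr", "l")
selection_url = featanal_url("cpa", "about", "hr", "l", fval="what")
by_sense_url = featanal_url("cpa", "about", "hr", "l", sense="1(1)")
```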
The Preposition Project (TPP) Data: This zip file (to be downloaded) contains the original data from The Preposition Project Online and includes a DIMAP dictionary of all English prepositions (November 2008) (courtesy of Oxford University Press), containing much of the data and with disambiguated hypernymic relationships as used in the digraph analysis of preposition classes. This file also includes an XML version of the TPP data, the data for each entry as used in the online system, and the taxonomy data used when clicking on the labels associated with each sense.

Outstanding Questions

PDEP is a work in progress, with several questions being addressed, including the following:

Representational Utility: What is the best formulation for representing prepositional behavior, particularly for facilitating use in NLP tasks? How can we best indicate the most important determinants for disambiguating each sense? (Is it the complement or the governor?)
Sense Granularity: How can we best indicate subsense relations? This is currently indicated in the field Sense Relation in the pattern box, which can perhaps be useful for collapsing fine-grained senses into more coarse-grained senses. Is it possible or desirable to formalize these relations? Is there some mechanism that can or should be used to examine subsense relations to supersenses?
Ontological Specifications: Several fields in the pattern box (the Selector fields, the Category field, and the Sem Class field) are intended to capture some ontological category. What ontological frameworks can be used in these fields? Is Hanks' shallow ontology sufficient? Should we walk up the WordNet hypernymy tree? Should we use multiple categories for a single sense, or does such use imply distinct senses (requiring the addition of a pattern)?
Distributional Methods: Can distributional methods be applied to preposition behavior? In general, since a given sense usually has only a few substitutable prepositions, these methods have not been successful. Will the class analyses extend the amount of data that can be used in testing distributional semantics of prepositions?

References

Ken Litkowski. 2014. Pattern Dictionary of English Prepositions. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore, Maryland, Association for Computational Linguistics, pp. 1274-83.

Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. 1985. A Comprehensive Grammar of the English Language. Longman: New York.
