Category Development Based on Semantic Principles

Kenneth C. Litkowski
CL Research
20239 Lea Pond Place
Gaithersburg, MD 20879
Telephone: 301-926-5904
Email: ken@clres.com
Web site: http://www.clres.com

Abstract

Category systems extracted from textual material are an important part of the scientific process and qualitative analysis, but are frequently developed on an ad hoc basis. Their development can be improved by a stronger reliance on linguistic and semantic principles. Tracing and recapitulating the development of semantic principles from the 1950s to the 1990s in linguistics and text analysis shows how the principled use of information from lexical resources can facilitate the development and analysis of category systems. These principles are demonstrated in examining the category systems of Minnesota Contextual Content Analysis, a technique for analyzing textual material from sentences, answers to open-ended questions on questionnaires, expository texts, and verbatim transcripts. These principles are then extended to show how to abstract category assignments for this type of textual material. The discussion identifies computerized systems and data resources used in the application of these principles.

Keywords: Category development, semantic analysis, linguistic analysis, text analysis, content analysis, qualitative analysis

Author's Note: This paper incorporates material from a presentation to the Conference on Computers and the Social Sciences, Minneapolis, 1996. Special thanks to Don McTavish for awakening my interest in content analysis and to David Fan for his patience and comments on earlier drafts of this paper.

1. Introduction

Categories are used in every form of science, and indeed, may constitute the roots of science. A category is an abstract class, group, or set consisting of individual elements of any type. A category is defined by characterizing these elements. In this paper, the elements are individual words of natural language, as well as phrases like "county seat" that have well defined
meanings. Beyond such words and simple phrases, set elements can be more complex phrases, sentences, paragraphs, and even entire texts; at the end of the paper, categories for such larger units of text are discussed.

A vast array of methods, particularly statistical, is used to categorize information and data. But many qualitative approaches are used as well, particularly in social science research. For the most part, such techniques are based on the investigator's intuitions about the meaning of categories, perhaps supported with statistical analysis. While such approaches can and should be continued, some other avenues have opened up with the developments in linguistics and the semantic theories supporting linguistics. This paper presents techniques for category development based on semantic principles (that is, principles for describing the meaning of words), particularly by weaving in the historical emergence of these principles.

To ground this discussion, the paper (1) characterizes some of the ways in which categories are used in social science, from the simple use of categories like gender in questionnaires, through category development in theory development, to highly intricate category systems involving hierarchical systems, and (2) looks briefly at category development for thesauruses and library cataloguing systems. The paper then describes, in the 1950s (the early days of computers), the beginnings of computerized information retrieval and text analysis, particularly from the perspective of their use of thesauruses and cataloguing systems. A brief
historical overview then describes formalizations of linguistic principles in the development of formal grammars (that is, how words can be combined in phrases and sentences) and semantics. This overview is then unfolded in the presentation of principles for category development, based on research in linguistic formalisms continuing with ever richer grammars and semantic formalisms. The progression of these formalisms is described in the examination of the categories used in the Minnesota Contextual Content Analysis (MCCA) approach. Finally, current research toward an integration of semantic principles into content analysis describes abstraction procedures for characterizing the "category" of any text.

2. Category Systems Based on Textual Material

Most scientific endeavors involve defining variables in terms suitable for measurement, developing the measures, and specifying the variable relationships in order to check for misspecification of the variables and for measurement errors (U. S. General Accounting Office, 1993: 26). This requires defining variables "in concrete, specific, unambiguous, and contextual terms that reduce the measure to a single trait or characteristic" (U. S. General Accounting Office, 1993: 29). "Measures must be accurate, precise, valid, reliable, relevant, realistic, meaningful, comprehensive, and in some cases complementary, sensitive, and properly anchored" (U. S. General Accounting Office, 1993: 30). This paper is concerned with the development of variables with conceptual underpinnings (that is, categories) and how to ensure that they are well-defined and meet the requirements for being meaningful, comprehensive, and perhaps properly anchored. Such categories may refer to the ostensibly simple concepts of adult, middle-age, and senior in groupings by age.

A survey researcher engaged in exploratory work may ask open-ended questions whose
answers can be analyzed only by examining the texts of the responses. The researcher may have initial, sketchy conceptions of the categories into which the answers will fall. In questionnaire development, the researcher formulates questions where answers should identify a comprehensive set of alternatives (such as lists of items, multiple choices, ranking scales, Likert scales, and range, amount, and frequency intensities) (U. S. General Accounting Office, 1993: 46-78). The set of possible answers should contain all the desired categories, should not overlap, and should have an appropriate level of specificity (U. S. General Accounting Office, 1993: 102-9).

Content analysis of open-ended exploratory questions, verbatim transcripts of speech or interviews, or other free textual material is essentially theory development in which an analyst assigns categories to organize the textual material. This development is very difficult, very subjective, and frequently open to criticism of replicability and interrater reliability. Many investigators eventually create categories featuring particular words, such as those expressing emotion. The analysis then consists of obtaining the frequencies of such words throughout the textual material. Of course, what constitutes an emotion-expressing word is an important issue. Many content analysts have developed dictionaries, assigning words to different categories based on their individual judgments; these analysts may articulate criteria used for the development of their systems, sometimes stating that the words in a category share "semantic components," that is, common elements of meaning. However, the validity of these category systems can frequently be criticized. This paper describes the use of semantic principles for the development of criteria, with the goal of placing category development on a firmer basis.

3. Thesaurus, Library Catalogue, and Information Retrieval Categories

Category development predates the computer era primarily in the classification of (1) human works such as books, films, and plays into library catalogs and (2) words into thesauruses (Roget's International Thesaurus, 1992). While these systems predate computers, they are still relevant to category development. Library cataloguing systems have attempted to organize the world's knowledge into broad groups, broken down into ever-finer categories, so that each catalogued item is placed within the system. However, such a system is ultimately problematic when an item may logically fall under more than one category. In addition to the general cataloguing systems in public libraries, some disciplines, most notably the medical community, have developed cataloguing systems that reflect the complexity of those disciplines.

A thesaurus, while presenting synonyms and antonyms, is generally organized by grouping words according to ideas. Thus, Roget's International Thesaurus (1992) uses 1,073 categories in 15 classes (with further loose groupings within the classes, down finally to pairings of opposites, such as "Assent" and "Dissent"). For example, the subclass "Sex", with associated categories "Masculinity" and "Femininity", may be used to formulate gender-based categories.

With the onset of computers, thesauruses and cataloguing systems gained considerable flexibility in permitting multiple terms or categories for characterizing textual materials. The primary purpose of these categories, of course, is for the retrieval of documents. Thesauruses became important adjuncts to cataloguing systems, since documents could be characterized by key words, the stock in trade of thesauruses.

Thesaurus development expanded dramatically with the advent of the computer age in the 1950s. This expansion has continued unabated to the present. The process consists of identifying words and phrases used in documents and then placing them within a thesaurus. Unfortunately, with the rapid expansion of these activities, less attention is placed on the overarching schemata for a thesaurus. Instead, the emphasis has been on "local" placement decisions in which a new entry is related to other entries, primarily by linkages through synonyms, broader terms, narrower terms, and perhaps antonyms. The overall consistency of the thesaurus is seldom examined. Notwithstanding, available thesauruses of this type are valuable resources for category development.

Within the field of information retrieval, classification of documents is a primary endeavor. A considerable amount of research uses the existence of words in a text as the basis for "classifying" the text, often in relation to other texts and documents. This type of research focuses on the frequency of occurrence of words and uses sophisticated statistical techniques for the classification. While many of these techniques may be useful for category development, it is important to distinguish between classification and category development. The difference is largely one of scale, with classification generally focusing on whole texts (books, reports, and papers), while category development focuses on narrower text segments (individual words, phrases, and sentences). But, as category development attempts to cope with the larger text segments of paragraphs, speeches, and interviews, the boundary with classification begins to blur. The principles described in this paper show how category development may cope with these larger segments and perhaps eventually with classification.

4. Text Analysis

The advent of the computer in the 1950s also saw the expansion of efforts to characterize and critically evaluate writings. The computer enabled rich new areas of research to examine characteristics of textual materials. The computer made it possible to examine a wide variety of statistics about texts, identifying such things as frequencies of words, their average length, sentence complexity, and vocabulary growth.

Beyond benefit to information retrieval and automatic text processing efforts, these statistical analyses also enhanced efforts at more sophisticated analyses of the content of texts. The initial work only looked at patterns for very common words such as the articles "the" and "a", pronouns, and prepositions. However, these efforts soon turned to the analysis of more 'content'-ful words. The pace of text analysis has accelerated in literary analysis, authorship attribution, the quantification of qualitative data, as well as the analysis of transcripts from focus groups, psychotherapy, and interviews.

5. Evolution of Grammar and Semantics Research

The 1950s also saw the beginnings of analyses of both the frequencies of words in texts and a theory of syntactic structures describing permissible phrases within sentences (Chomsky, 1956; Chomsky, 1965). The late 1950s and the 1960s saw a greater understanding of language syntax and the accompanying development of modestly efficient parsing routines that provided some capability for representing at least the syntactic structure of text. There was a beginning in the identification and assignment of semantic features to words. An example here would be the assignment of the feature "male" to the word "bachelor". Progress was also made in the assignment
of semantic roles to various types of syntactic structures. For instance, the subject of a sentence might be identified as the agent of an action. Or, the object of the preposition "with" might be identified as being the instrument of an action (Katz & Fodor, 1963; Fillmore, 1968). In information retrieval, the use of thesauruses (with synonyms and rough "broader than" and "narrower than" hierarchies) led to the organization of concepts into hierarchies useful for grouping text segments into conceptual threads through the text. But, there was not yet an integration of semantics (see Quillian, 1968; Kucera & Francis, 1967).

The 1970s saw the emergence in artificial intelligence of techniques for creating knowledge bases and representing various types of relations among logically-stated pieces of knowledge (Winograd, 1972; Schank, 1975). Recognition of these linkages led to initial semantic studies of the lexicon, both the words used in a language and the relationships among them (Jackendoff, 1972; Amsler, 1980; Evens, et al., 1980; Litkowski, 1978). The 1980s saw considerable expansion in the study of semantic relations, leading to further understanding of the importance of the lexicon as the bedrock for understanding the nature of syntactic structures. But, the 1980s also showed the need for the compilation of massive amounts of information for characterizing each piece of the lexicon.

The 1990s have seen the continuation of this accumulation of information, so that today, the lexicon is populated with information that characterizes the meaning of a word, where that word sits in a hierarchy representing the lexicon, the nature of its relations with other items in the lexicon, what syntactic patterns it may participate in (particularly for verbs and verbal nouns), and what might be its collocations (the company a word keeps). This information is used to identify the specific sense in which a word is used, whether through syntactic analysis or through more statistically-based associations.

With identification of the specific concept associated with each word, it is possible to build a much richer representation of a text passage. It is possible to identify the context, to study the ebb and flow of that context, to place the concept within its proper structure within a sentence, and to organize the sentences (that is, the discourse) into its overall structure, and thus, to identify more precisely the overall organization of a text.

Given this overall process for organizing text, the next task is to bring these techniques into real-world processing. The greatest difficulty lies in what is known as the lexical acquisition and knowledge acquisition bottlenecks. It simply takes a lot of time to put all this information into the lexicon and to build the systems to do the processing. The technology is here, but techniques are needed to put it together efficiently for use in information retrieval and text processing systems. The principles that follow will facilitate this process.

6. Principles of Category Development

6.1 Lexical Resources

Lexical resources include dictionaries, thesauruses, grammars, sets of examples of a word's use, specially constructed databases of information about words, and linguistic analyses of words; they provide information about words; they are used to develop lexicons, systematic representations of characteristics of words suitable for use in computerized text analysis systems.[1] The principles described in this section make use of three distinct lexical resources: (1) a machine-readable dictionary (MRD), a searchable reproduction of a paper dictionary, used to identify parts of speech such as nouns, verbs, and adverbs, inflectional forms such as the past tense or gerundial forms of verbs, and derivational forms such as the fact that "management" is derived from "manage"; (2) an 1,800-page description of grammatical and semantic properties of the English language, used to identify features and characteristics of words (Quirk, et al., 1985); and (3) WordNet, a freely available, rigorously developed database of approximately 120,000 words and phrases, with these words and phrases grouped into synonym sets (synsets) and organized into a hierarchical and relational semantic network (Miller, et al., 1990). WordNet can be used to identify common semantic components for words, since its principal relation is the hierarchical ISA relation (a "horse" "is a" "mammal," establishing that "horse" has the semantic component "mammal").[2]

6.2 Minnesota Contextual Content Analysis

Minnesota Contextual Content Analysis (MCCA) is a technique for characterizing the concepts in textual material, ranging from answers to open-ended questions in surveys through sentences, paragraphs, interview transcripts, and books. MCCA places each English language word into one of 116 categories, counts the words in each category, and compares the frequency profile against that for general English usage (McTavish & Pirro, 1990; McTavish, et al., 1997a; McTavish, 1997b).

In the MCCA dictionary of 11,000 words, the average number of words in a category is 95, with a range from 1 to about 300.[3] Each category is given a name, but these names are only heuristic in nature and have no essential meaning. The categories appear internally consistent in that the words in each category have an underlying similarity. However, the characteristics of the categories are not intuitively obvious. Firm principles for category construction can help extend the MCCA dictionary and improve the function of this program (McTavish, et al., 1997a; Litkowski, 1997). These principles are a part of the DIMAP dictionary creation and maintenance software (CL Research, 1997 - in preparation).[4] DIMAP includes MCCA as a module and improves the dictionary and the function of the technique by creating sublexicons for individual categories. These sublexicons are based on WordNet synsets, information from the Merriam-Webster Concise Electronic Dictionary, as well as the other resources described above.

6.3 Initial Stage Based on Part of Speech Analysis

The first stage of category analysis involves looking at the part of speech of the words in the categories. This stage corresponds to the earliest developments in computational text processing in the 1950s, when the focus was on the part of speech of words. Eleven categories in MCCA (such as "Have", "Prepositions", "You", "I-Me", "He", "A-An", "The") consist of only a few words
in closed classes.[5] The category "The" contains one word and the category "Prepositions" contains 18 words. About 20 categories ("Implication", "If", "Colors", "Object", "Being") consist of a relatively small number of words (34, 22, 65, 11, 12, respectively) taken primarily from syntactically or semantically closed-class words such as subordinating conjunctions and relativizers or words which are found at the top levels of WordNet and represent abstract concepts like "person", "place", and colors. To determine that these categories consist primarily of closed-class words, the words in the category were passed through DIMAP to extract just this set from the integrated MRD. Inspection of the part of speech field confirmed the intuitions about the category assignment.

When the parts of speech of words in a category belong to open classes, analysis becomes a little more difficult. When the words are all in one class (that is, all nouns, verbs, adjectives, or adverbs), a unifying principle is sought from the hierarchical relationships among the words. One possible principle is that the words fall into a small number of categories in a thesaurus such as that of Roget. Another possibility is that the words are related by "broader than," "narrower than," or synonymic relations as assigned in keyword indexing thesauruses. Yet another possible principle is one used for dictionary definitions and consists of examining definitions of the words to identify an umbrella genus word with more specific terms underneath. Using WordNet, this step involves identifying the hierarchical groupings of the words in the category.

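The search for an umbrella genus word can be illustrated with a small sketch. The ISA links below are a hand-written toy hierarchy, assumed purely for illustration; they are not data taken from WordNet or DIMAP, and the function names are hypothetical. The sketch walks each word up its chain of broader terms and reports the most specific term shared by all the words in a candidate category.

```python
# Hypothetical toy ISA links (narrower term -> broader term).
# These are illustrative only and are not copied from WordNet.
ISA = {
    "horse": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "creature",
}


def hypernym_chain(word, isa=ISA):
    """Return the word followed by its broader terms, nearest first."""
    chain = [word]
    seen = {word}
    while chain[-1] in isa:
        parent = isa[chain[-1]]
        if parent in seen:  # guard against accidental cycles in the links
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def common_genus(words, isa=ISA):
    """Return the most specific term found in every word's chain, or None."""
    chains = [hypernym_chain(word, isa) for word in words]
    shared = set(chains[0]).intersection(*chains[1:])
    for term in chains[0]:  # chains[0] is ordered from specific to general
        if term in shared:
            return term
    return None


genus_of_pair = common_genus(["horse", "dog"])
genus_of_mixed = common_genus(["horse", "sparrow"])
```

With these assumed links, "horse" and "dog" share the genus "mammal", while "horse" and "sparrow" meet only at "animal". A category whose words meet only near the top of the hierarchy would be a candidate for splitting.
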
The remaining 80 or so categories in MCCA consist primarily of just such open-class words (nouns, verbs, adjectives, and adverbs), sprinkled with closed-class words (auxiliaries, subordinating conjunctions). Several categories consist of words from a single part of speech, as is the case with "Functional roles", "Detached roles", and "Human roles", which all include only nouns. To examine such unified sets of words, it is valid to examine their definitions for common genus terms. DIMAP implements the more convenient method of using WordNet to examine hierarchical relations as in Table 1, which shows a sample dictionary entry where the field "Isa links" shows that "animal" is of type "creature".

- - - - - -
Table 1 about here
- - - - - -

To see how this field is used, consider the MCCA category "Detached roles", which has a total of 66 words, including the words:

    academic, artist, biologist, creator, critic, historian, instructor, observer,
    philosopher, physicist, professor, researcher, reviewer, scientist, sociologist.

These words fall under the WordNet synsets headed by "person" (although not including this word), in particular, synsets headed by

    creator;
    expert: authority: professional;
    intellectual.

Other synsets under "expert" and "authority" do not fall into this category (and would thus be included in other MCCA categories). Thus, it is possible to characterize "Detached roles" as words used to describe persons performing intellectual or thinking activities. This is a concept well captured by its heuristic name, and distinguished from "Human roles" such as "uncle" or "bride" and "Functional roles" such as "janitor" or "firefighter". Identification of these synsets facilitates
extension of the MCCA dictionary for this category to include further hyponyms (that is, types of creators, experts, or intellectuals) of these synsets.

6.4 Semantic Features and Semantic Components

The heuristic name given to the category of "Detached roles", along with the defining WordNet synsets, suggests the next stage in the process of category development, as well as the next step in linguistic consideration of the lexicon. Table 1 also shows the field "Features", indicating properties of the lexical items, such as "Age" and "Sex". Katz & Fodor (1963) proposed the use of semantic features to characterize entries in a lexicon. In the sample category, there is a feature "Human" with a value "+" and a feature "Role" with a value "Detached." Several more features might be proposed to encode words in this category; hundreds, if not thousands, of other features can be used to characterize the full set of words in English. For example, Whissell (1996) developed a "Dictionary of Affect", encoding dimensions of emotion-activation and pleasantness.

Laffal (1995) likewise based his dictionary of 43,000 words and 168 concepts on semantic features, coding words in the same category based on the "core meanings of words," that is, having the same semantic component. Nida (1975: 174) characterized a semantic domain as consisting of words sharing semantic components. However, he also suggests (Nida, 1975: 193) that domains represent an arbitrary grouping of the underlying semantic features.

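A feature-based category of this kind can be sketched as a simple lookup. The feature names "Human" and "Role" and the value "Detached" come from the sample category discussed above; the individual entries, the other role values, and the function name are assumptions made for illustration and do not reproduce the MCCA or DIMAP dictionaries.

```python
# Hypothetical lexical entries. The feature names follow the sample category
# in the text; the entries and their values are invented for illustration.
LEXICON = {
    "professor": {"Human": "+", "Role": "Detached"},
    "historian": {"Human": "+", "Role": "Detached"},
    "uncle": {"Human": "+", "Role": "Human"},
    "janitor": {"Human": "+", "Role": "Functional"},
    "horse": {"Human": "-"},
}


def words_with_features(lexicon, required):
    """Return, in sorted order, the words whose entries match every required feature value."""
    return sorted(
        word
        for word, features in lexicon.items()
        if all(features.get(name) == value for name, value in required.items())
    )


detached = words_with_features(LEXICON, {"Human": "+", "Role": "Detached"})
humans = words_with_features(LEXICON, {"Human": "+"})
```

Here `detached` holds only "historian" and "professor", while `humans` holds every entry marked "Human" with the value "+". Stating a category as a set of required feature values makes the membership criterion explicit rather than leaving it to the analyst's intuition.
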
Thus, it is possible to see that the 1960s development of the notion of semantic features has become a very prominent basis for the development of category systems. The subtrees rooted at particular nodes in the WordNet hierarchies provide a readily available basis for category development that reflects implicit assignment of common semantic features and components. Litkowski (1997) proposes making these semantic features and components more explicit, specifically for the purpose of facilitating category development.

6.5 Syntax and Semantic Roles

The 1960s saw the rapid development of formalisms for representing the syntactic structures of phrases, clauses, and sentences, but there was relatively little research toward integrating semantics (that is, meanings) into the representations. Fillmore (1968) began a process of characterizing the semantic roles of noun phrases in a sentence, particularly as related to the main verb. Thus, in addition to identifying the subject and object of a verb and the object of a preposition, it was possible to characterize the role of these syntactic items, by referring to them as, for example, "agent", "patient", "theme", "instrument", and "location". There are about 30 to 50 semantic roles, although there is still no full agreement on what the complete set should be. Table 2 shows a lexical entry for the word "eat" and illustrates the way in which syntactic and semantic role information is encoded. Important to this example is the requirement that the word "eat" have associated syntactic items of subject and object. Both the subject, identifying an "agent" who performs the act of eating, and a "theme", describing the thing being consumed, are encoded as features of the lexical item.

- - - - - -
Table 2 about here
- - - - - -

Syntactic and semantic role information is normally used for parsing text, but it can be important for category development as well. This can be seen in the analysis for the MCCA category "Sanction", which contains 120 words, including the following words:

    applaud, applause, approve, congratulate, congratulation, convict,
    conviction, disapproval, disapprove, honor, judge, judgment, judgmental,
    merit, mistreat, reject, rejection, ridicule, sanction, scorn, scornful, shame,
    shamefully

While this set of words includes words from several parts of speech (discussed in more detail below), it is rooted primarily in the Levin (1993) verb sets of "Characterize" (class 29.2), "Declare" (29.4), "Admire" (31.2), and "Judgment" (33). This means that the set has particular syntactic and semantic patterning in addition to the synonymic and hierarchical relations that can be discovered using the techniques described in the previous section. Levin has identified a considerable set of syntactic properties associated with the classes she has developed (her work is thus itself a useful resource for category development), but has not yet formally characterized the semantic properties. Instead, the definition of this class might, following Davis (1996), inherit a sort "notion-rel", which has a "perceiver" and a "perceived" argument (thus capturing syntactic patterning) with perhaps a selectional restriction on the "perceiver" that the type of action is an evaluative one (thus providing semantic patterning). In other words, the underlying conceptualization of the MCCA category indicates that there is an action involved (as indicated by the verb), that this action involves some idea or notion on the part of the actor (the "perceiver"), and that this notion (the "perceived") is inherently an evaluation.

WordNet synsets explicitly contain some syntactic information and implicitly some semantic role information. However, WordNet does not have the depth required for the analysis described above. Other resources, such as Levin (1993), as well as some databases being constructed for on-line access, contain more of this detail. What this means for purposes of characterizing and extending the words in the category "Sanction" is that not only can the WordNet hierarchy be used, but also it is possible to include words that correspond to conversion of verb concepts into noun counterparts (for example, the action "judge" corresponds to the result of a "judging" action, that is, a "judgment").

6.6 Selectional Restrictions, Semantic Relations, and Knowledge Bases

The evolution of artificial intelligence and semantics in the 1970s and the 1980s (Amsler, 1980; Evens, et al., 1980; Winograd, 1972; Schank & Abelson, 1977; Markowitz, et al., 1986) has provided a significant amount of understanding about potential information that can be included in lexical entries and used in category development. This discussion illustrates three pieces of information (selectional restrictions, semantic relations, and knowledge base information) that may be included in lexical entries and that can assist in the process of category development. These are discussed for the sake of completeness, but are not described in the present analysis of MCCA categories because of space considerations.

As alluded to in the last section, a restriction was placed on the type of notion involved in the use of a word in the "Sanction" category, namely, that it had to be evaluative in nature. Table 3 shows a lexical entry for the preposition "in" with two senses. Basically, this entry says that "in" is
used to begin prepositional phrases (the "pp-adjunct") with noun phrase objects. The first sense says that the phrase may be attached to another noun phrase which may be an "object" or an "event" and that the object of the prepositional phrase is a location in some physical object. The second sense says that the prepositional phrase is attached to a verb which describes an event and that the object of the preposition describes a location which may additionally be characterized as a destination. These specifications are called selectional restrictions and serve to limit the range of words that may appear in the identified syntactic positions.

- - - - - -
Table 3 about here
- - - - - -

Table 4 shows a lexical entry describing an event (of which there may be many types). But, additionally, the entry states that any word describing an event is inherently related, in several possible ways, to other lexical entries. These are known as semantic relations. They are encoded here as features with values preceded by plus (+) signs, which are taken to mean that the following word is actually a selectional restriction on what other lexical entries may appear in the particular relation. The relations shown in Table 4 are quite general and would apply to many lexical entries. However, the number of possible relations is unbounded, similar to the open-class words, and hence, a relation may be of arbitrary depth and specificity. For example, a chemical event relation "hydrogenate" could be defined and specify that its location is a test tube.

- - - - - -
Table 4 about here
- - - - - -

Table 5 presents a lexical entry for the word or concept "teach". "Teaching" is a communicative event that involves a "teacher" as the agent and "knowledge" as the "thing" that is passed on. The lexical entry specifies that a "teaching" event may consist of three subevents, where a teacher performs a "describing" action, where there may be a "request" subevent (when a student asks for information), and where there may be an "answering" process. The corresponding lexical entries for the "answering" and "describing" subevents show that they inherit information from the "teaching" event. The three lexical entries, considered as a unit, are construed as part of a script (see Schank & Abelson, 1977).

- - - - - -
Table 5 about here
- - - - - -

Lexical entries containing information on selectional restrictions, semantic relations, and knowledge base data can be used in category development primarily by enabling an analysis of how the embodied concepts fit together, that is, which ones are in more subsidiary positions. The lexical entries described in Tables 3, 4, and 5 illustrate the general linguistic finding that the representation of meaning is focused principally on the verbs and that these verbs may themselves be arranged in hierarchies. Analysis or development of categories should therefore consider this information in identifying the characteristics of the words in the category.

6.7 Lexical Rules, Derivations, and Sense Relations

The final type of information in lexical entries considered here is based on the phenomena by which new lexical entries are derived from existing ones. The most basic of these
derivational relations is the one in which inflected forms are generated. These are generally quite simple, and include the formation of plural forms of nouns, the formation of tensed (past, past participle, gerund) forms of verbs, and the formation of comparative and superlative forms of adjectives. The discussion above of the MCCA "Detached roles" and "Sanction" categories did not mention the possibility of including these inflected forms, but in fact, these forms are included.

Several more elaborate forms of relations are also possible. For the purpose of illustrating these additional derivational rules, consider another MCCA category, known as "Normative". This is a complex category consisting of 76 words, and like the "Sanction" category, also has words from all parts of speech. This category includes the following (along with various inflectional forms):

    absolute, absolutely, consequent, consequence, consequently, correct, correctly, dogmatism, habit, habitual, habitually, ideologically, ideology, necessarily, necessary, norm, obviously, prominence, prominent, prominently, regular, regularity, regularly, unequivocally, unusual, unusually

The use of the heuristic "Normative" to label this category clearly reflects the presence in these words of a semantic component oriented around characterizing something in terms of expectations or standards. Of particular interest here are the derivational relations that form adjectives from nouns, nouns from adjectives, and adverbs from adjectives. There were similar kinds of relations in the "Sanction" category, where most of the concepts seemed to be based on underlying verb forms. In that category, a number of words were clearly noun, adjective, and adverb derivations from the underlying verbs.

These derivational relations can be encoded in lexical entries in the same way as the semantic relations shown in Table 4. The feature name in such relations would describe the relation (such as "nominalization") with a value identifying the derived form, which would also be a lexical entry having the inverse relation ("nominalization_of"), with a value showing the base form of the word. Some of these relations are shown in WordNet, but a more complete source is a dictionary which shows an ordering of derived forms. The MRD included with DIMAP shows these forms.

The adverb derivations in the "Normative" category have an additional interesting aspect to them. The heuristic "Reasoning" has also been used to label this category. Examination of the syntactic and semantic nature of these adverbs shows that they are considered to be "content disjuncts" (Quirk, et al., 1985: 8.127-33), that is, words indicating that the speaker is making a comment on the content of what the speaker is saying, in this case, compared to some norm or standard. Thus, part of the defining characteristics for this category is a specification for lexical items that have a [content-disjunct +] feature. Analyzing text that contains such words as "necessarily", "obviously", "unequivocally", and "consequently" would thus indicate the presence of editorial commentary. This shows the value of using non-database sources that describe syntactic and semantic characteristics of the language.

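The paired relations described above (a relation on the base form and its inverse on the derived form) and the [content-disjunct +] feature can be sketched in a few lines of Python. This is an illustrative sketch only: the class Lexicon and its methods are hypothetical and do not reproduce the DIMAP entry format, and the relation name "adverbialization" is an assumption modeled on the "nominalization" example in the text.

```python
from collections import defaultdict

class Lexicon:
    """Toy lexicon holding derivational relations and boolean features."""

    def __init__(self):
        self.relations = defaultdict(dict)  # word -> {relation name: set of words}
        self.features = defaultdict(set)    # word -> set of feature names

    def add_derivation(self, base, relation, derived):
        # Record the relation on the base form and its inverse on the derived form.
        self.relations[base].setdefault(relation, set()).add(derived)
        self.relations[derived].setdefault(relation + "_of", set()).add(base)

    def add_feature(self, word, feature):
        self.features[word].add(feature)

    def words_with(self, feature):
        return sorted(w for w, fs in self.features.items() if feature in fs)

    def family(self, word):
        """All words reachable through derivational links in either direction."""
        seen, stack = {word}, [word]
        while stack:
            current = stack.pop()
            for targets in self.relations.get(current, {}).values():
                for target in targets:
                    if target not in seen:
                        seen.add(target)
                        stack.append(target)
        return sorted(seen)

# Example entries drawn from the "Normative" word list.
lex = Lexicon()
lex.add_derivation("consequent", "nominalization", "consequence")
lex.add_derivation("consequent", "adverbialization", "consequently")
lex.add_derivation("necessary", "adverbialization", "necessarily")
lex.add_feature("consequently", "content-disjunct")
lex.add_feature("necessarily", "content-disjunct")
```

With these entries, lex.family("consequence") gathers "consequence", "consequent", and "consequently" into one group, which corresponds to placing all words built on a single base word together, and lex.words_with("content-disjunct") retrieves the adverbs that signal editorial commentary.
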
The final type of lexical rule considered here is more subtle and involves the observation that a word may have several senses that are related to one another (usually with one sense as the base from which all the others have been derived). A simple example of such a rule is the word "fish." The base sense of this word refers to an individuated object that is countable; the derived sense refers to food, where the object is not individuated but is an undifferentiated mass or substance. Another example of the same process is use of the word "coffee." A lexical rule has been developed to encode this regularity in language and is shown as a lexical entry in Table 6. Note that there is a general rule of "grinding" and then a more detailed entry for "animal-grinding." For the more general rule, a count noun is converted into a mass noun, taking it from an individuated object to a substance. In the more specific rule, the count noun is required to be an animal and then the derived form is a food-substance. Table 7 shows how this might be encoded in a dictionary entry for the word "coffee," where sense 2 of the word is derived from sense 1.

- - - - - -
Table 6 about here
- - - - - -

- - - - - -
Table 7 about here
- - - - - -

These kinds of lexical rules (showing the way different senses are related to one another) are presently a topic of much research, so they are not usually found in any easily accessible databases. However, an awareness of their existence is important for category development.

6.8 Summary of Procedures

In the analysis of MCCA categories, the first step was to extract from the full MCCA dictionary the words in a particular category (performed automatically using DIMAP). This list of words was then run against the integrated machine-readable dictionary, automatically creating a sublexicon of entries consisting of just the words on the list. These entries were then
visually examined to determine part-of-speech, inflectional, and morphological characteristics. If possible, the words were then grouped in a word processing program so that all words based on a single base word appeared on a single line. Next, the base words were passed through DIMAP to extract and create lexical entries from the WordNet database. Information created automatically in these entries included the relations to other words (in WordNet, but also within the created sublexicon in DIMAP). These relations were visually inspected to determine what hierarchical relations were present among the words in the category; these relations were then used to rearrange the word lines in the word processing program, so words related hierarchically were indented under their more general words. The words in the group were looked up in Quirk, et al. (1985); if discussed in that text, any properties were identified. The combination of all this information then constituted the definition of the category, permitting a critique of the MCCA categorization and its automatic extension using DIMAP runs based on data from WordNet.

7. Abstraction as Part of Category Development

The preceding section has shown the many ways in which lexical information can be used in category development. While this is important (and all category development can usefully be based on such considerations), categorizations can go beyond the word level. As noted above, the issue of separating categorization from classification comes into play. The techniques of content analysis (including that embodied in the MCCA technique) represent one method of attempting to identify and classify texts that go beyond the single word or phrase. Linguistic techniques are presently emerging that may allow a smoother transition from the word level to the text level.

Burstein, et al. (1996) describe techniques for using lexical semantics to classify
responses to test questions. An essential component of this classification process is the identification of sublexicons that cut across parts of speech, along with concept grammars that allow the collapsing of phrases and clauses into a generalized representation that abstracts away from the reliance on individual words. As seen above in the procedures for defining MCCA categories, addition of lexical semantic information in the form of derivational and morphological relations (that is, word formation rules) and semantic components common across part of speech boundaries would justify the development of concept grammars.

Litkowski & Harris (1997) discuss extension of a discourse analysis algorithm incorporating lexical cohesion principles. These principles show how the information in lexical entries, particularly selectional specifications on verbs, maintains cohesion of a discourse. With such information, it is possible to understand how the individual components of a text fit together and, in particular, to show that particular phrases and sentences are elaborations of others (and hence not an essential part of the text's categorization). As a result, it is possible not only to provide a more coherent discourse analysis of a text segment, but also to summarize the text better and thus provide an overall categorization of a text, rather than just a classification.

8. Conclusions

By following the steps in which the understanding of linguistic processes has evolved since the 1950s, a set of principles has emerged for developing and analyzing category systems. Specifically, these principles require analyzing a lexicon to articulate the specific sets of linguistic and semantic characteristics that define the categories. Many existing and emerging sources of lexical information, including thesauruses, dictionaries, lexical databases, and descriptions of grammatical principles, can be used in category development (see Endnote 6). Use of these
lexical resources and adherence to the category development principles can improve the reliability and validity of category systems used in development of response sets for questionnaire items, analysis of open-ended questions, and analysis of textual material from the sentence to the book level.

Endnotes

1. A lexicon includes phrases as well as individual words. A phrase in a lexicon has the same conceptual status as a word and hence can be characterized in the same way as a word. Recognizing phrases in text analysis is very difficult. Since this paper is not concerned with the actual mechanics of text analysis, use of the term phrases is avoided for the sake of simplicity of presentation.

2. Described also on the World Wide Web at http://www.cogsci.princeton.edu/~wn/, from which the database may be downloaded.

3. MCCA incorporates disambiguation procedures for assigning a single category when a word falls into more than one category.

4. A suite of programs for creating and maintaining lexicons for natural language processing, available from CL Research. Elaboration of the procedures used in this paper, applicable to any category analysis using DIMAP, is available at http://www.clres.com. These procedures describe the ordering of the steps, which steps can be performed automatically, how information is merged, and where human intervention is required.

5. Closed classes are syntactic categories, such as prepositions or pronouns, that have relatively few words and are unlikely to have new words. Open classes are nouns, verbs, adjectives, and adverbs; these classes expand as the language evolves.

6. The Special Interest Group on the Lexicon of the Association for Computational Linguistics maintains a set of links to publicly available lexical resources on the World Wide Web at http://www.clres.com/siglex.html.

References

Amsler, R. A. (1980). The structure of the Merriam-Webster pocket dictionary. Doctoral dissertation. Austin: University of Texas.

Burstein, J., Kaplan, R., Wolff, S., & Lu, C. (1996). Using lexical semantic information techniques to classify free responses. In E. Viegas & M. Palmer (Eds.), Breadth and Depth of Semantic Lexicons. Workshop Sponsored by the Special Interest Group on the Lexicon. Santa Cruz, CA: Association for Computational Linguistics.

Chomsky, N. (1956). Three models for the description of language. IRE Transactions PGIT, 2, 113-124.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

CL Research. (1997 - in preparation). DIMAP-3 users manual. Gaithersburg, MD.

Davis, A. R. (1996). Lexical semantics and linking in the hierarchical lexicon. Doctoral dissertation. Stanford, CA: Stanford University.

Evens, M., Litowitz, B., Markowitz, J., Smith, R., & Werner, O. (1980). Lexical-semantic relations: A comparative survey. Edmonton, Alberta: Linguistic Research, Inc.

Fillmore, C. J. (1968). The case for case. In E. Bach & R. Harms (Eds.), Universals in linguistic
theory (pp. 1-90). New York: Holt, Rinehart, and Winston.

Jackendoff, R. S. (1972). Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.

Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.

Kucera, H., & Francis, W. N. (1967). Computerized dictionary of present-day American English. Providence, RI: Brown University Press.

Laffal, J. (1995, October). A concept analysis of Jonathan Swift's "A Tale of a Tub" and "Gulliver's Travels". Computers and the Humanities, pp. 339-361.

Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago, IL: The University of Chicago Press.

Litkowski, K. C. (1978). Models of the semantic structure of dictionaries. American Journal of Computational Linguistics (Mf.81), 25-74.

Litkowski, K. C. (1997, April). Desiderata for tagging with WordNet synsets and MCCA categories. 4th Meeting of the ACL Special Interest Group on the Lexicon. Washington, DC: Association for Computational Linguistics.

Litkowski, K. C., & Harris, M. D. (1997). Category development using complete semantic networks. Technical Report, vol. 97-01. Gaithersburg, MD: CL Research.

Markowitz, J., Ahlswede, T., & Evens, M. (1986, June 10-13). Semantically Significant Patterns in Dictionary Definitions. 24th Annual Meeting of the Association for Computational Linguistics. New York, NY: Association for Computational Linguistics.

McTavish, D. G. (1997b). Scale validity: A computer content analysis approach. Social Science Computer Review, this issue.

McTavish, D. G., Litkowski, K. C., & Schrader, S. (1997a). A computer content analysis approach to measuring social distance in residential organizations for older people. Social Science Computer Review, 15(2), 170-180.

McTavish, D. G., & Pirro, E. B. (1990). Contextual content analysis. Quality & Quantity, 24, 245-265.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235-244.

Nida, E. A. (1975). Componential analysis of meaning. The Hague: Mouton.

Quillian, M. R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing. Cambridge, MA: MIT Press.

Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.

Roget's International Thesaurus (R. L. Chapman, Ed.) (5th). (1992). New York: HarperCollins Publishers, Inc.

Schank, R. C. (1975). Conceptual information processing. Amsterdam: North-Holland.

Schank, R. C., & Abelson, R. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Lawrence Erlbaum.

U. S. General Accounting Office. (October 1993). Developing and using questionnaires. GAO/PEMD-10.1.7. Washington, D.C.

Whissell, C. (1996). Traditional and emotional stylometric analysis of the songs of Beatles Paul McCartney and John Lennon. Computers and the Humanities, 30(3), 257-265.

Winograd, T. (1972). Understanding natural language. New York: Academic Press.

Software Cited: DIMAP - DIctionary MAintenance Programs, utilities for creating and maintaining lexical knowledge bases, with integrated machine-readable dictionary, WordNet, and MCCA content analysis capability. Available from CL Research, 20239 Lea Pond Place, Gaithersburg, MD 20879 (Telephone: 301-926-5904; email: ken@clres.com; web site: http://www.clres.com).

Biographical Sketch: Kenneth C. Litkowski is the owner of CL Research. He has degrees in mathematics, law, and computer science and has worked extensively in qualitative research and computational lexicology. His interests focus on the design and development of computer-based tools for building lexical knowledge bases. He may be contacted at 20239 Lea Pond Place, Gaithersburg, MD 20879 (Telephone: 301-926-5904; email: ken@clres.com; web site: http://www.clres.com).

Table 1
Lexical entries: Example of semantic features

Word: #animal Type=r Code=#00026 No.Defs=1
    Sense: 1 Cat: nil
    Isa links:
        #creature d-0
    Features:
        EDIBLE = +boolean

Word: #creature Type=r Code=#00025 No.Defs=1
    Sense: 1 Cat: nil
    Isa links:
        #ind_obj d-0
    Features:
        AGE = +scalar
        SEX = +gender

Table 2
Lexical entries: Example of syntax and semantic roles

Word: eat Type=r Code=e00000 No.Defs=1
    Sense: 1 Cat: vrb
    Defin: ingest solid food through mouth and swallow it
    Isa links:
        #ingest d-0
    Features:
        root = $var0
        subj = ((root $var1) (cat n))
        obj = ((root $var2 optional) (cat n))
        AGENT = ^$var1
        THEME = ^$var2

Table 3
Lexical entries: Example of selectional restrictions

Word: in Type=r Code=i00000 No.Defs=2
    Sense: 1 Cat: prp
    Defin: located within the confines of
    Features:
        root = $var1
        pp-adjunct = ((root $var0) (obj ((root $var2) (cat n))))
        ^$var1 = (*OR* +object +event) (location ^$var2 +physobj)

    Sense: 2 Cat: prp
    Defin: into the destination of
    Features:
        root = $var1
        pp-adjunct = ((root $var0) (obj ((root $var2) (cat n))))
        ^$var1 = +event (destination ^$var2 +location (relaxable-to +physobj))

Table 4
Lexical entries: Example of semantic relations

Word: #event Type=r Code=#00012 No.Defs=1
    Sense: 1 Cat: nil
    Isa links:
        #all d-0
    Features:
        SUBEVENTS = +event
        SUBEVENT-OF = +event
        TIME = > 0 (MEASURING-UNIT +second)
        LOCATION = +place
        CAUSED-BY = +event
        CAUSES = +event
        PRECONDITION = +event
        EFFECT = +event

Table 5
Lexical entries: Example of knowledge base data

Word: #teach Type=r Code=#00014
    Isa links:
        #communicative-event d-0
    Features:
        AGENT = +intentional-agent (default +teacher)
        THEME = +knowledge
        BENEFICIARY = +intentional-agent (default +student)
        PRECONDITION = (default (*AND* #teach-know-1 (NOT #teach-know-2)))
        EFFECT = (default #teach-know-2)
        SUBEVENTS = (*AND* #teach-describe #teach-request-info #teach-answer)

Word: #teach-answer Type=r Code=#00019
    Isa links:
        #answer d-0
    Features:
        AGENT = +teach.agent
        THEME = +teach-request-info.theme
        BENEFICIARY = +teach.beneficiary

Word: #teach-describe Type=r Code=#00017
    Isa links:
        #describe d-0
    Features:
        AGENT = +teach.agent
        THEME = +teach.theme
        BENEFICIARY = +teach.beneficiary

Table 6
Lexical entries: Example of lexical rules

Word: #grinding Type=r Code=#00032
    Sense: 1 Cat: nil
    Isa links:
        #lexical-rule
    Features:
        0 = +count-noun (ORTH $var0) (RQS +ind_obj)
        1 = +mass-noun (ORTH $var0) (RQS +substance)

Word: #animal-grinding Type=r Code=#00033
    Sense: 1 Cat: nil
    Isa links:
        #grinding
    Features:
        0 = (RQS +animal)
        1 = (RQS +food-substance)

Table 7
Lexical entries: Example of sense relations

Word: coffee Type=r Code=c00000 No.Defs=2
    Sense: 1 Cat: nou
        Defin: a kind of bean which is roasted and ground to produce coffee-2
    Isa links:
        #coffee-bean d-0
    Features:
        count = +
        proper = -

    Sense: 2 Cat: nou
        Defin: a hot drink made from coffee-1
    Features:
        count = -
        proper = -
    Role:
        #grinding coffee(1)

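The "grinding" rules of Table 6 and the derived sense of "coffee" in Table 7 can be illustrated with a short Python sketch. It is offered under stated assumptions: the names Sense, GrindingRule, RULES, ISA, and grind are hypothetical and are not part of DIMAP, and the small type hierarchy is invented for the example. The sketch applies the most specific rule whose input restriction is met, so that an animal noun yields a food-substance sense and any other individuated object yields a plain substance sense.

```python
from dataclasses import dataclass
from typing import Optional

# Toy type hierarchy (child -> parent), assumed for illustration only.
ISA = {"animal": "ind_obj", "coffee-bean": "ind_obj", "food-substance": "substance"}

def is_a(semtype: Optional[str], target: str) -> bool:
    """Return True if `semtype` equals `target` or lies below it in ISA."""
    while semtype is not None:
        if semtype == target:
            return True
        semtype = ISA.get(semtype)
    return False

@dataclass(frozen=True)
class Sense:
    orth: str                         # spelling, shared by base and derived sense
    count: bool                       # True for a count noun, False for a mass noun
    semtype: str                      # semantic type (the RQS value in Table 6)
    derived_by: Optional[str] = None  # name of the rule that produced this sense

@dataclass(frozen=True)
class GrindingRule:
    name: str
    input_type: str   # restriction on the count-noun input
    output_type: str  # type of the derived mass-noun sense

# More specific rule first, so that it takes precedence over the general one.
RULES = (
    GrindingRule("animal-grinding", "animal", "food-substance"),
    GrindingRule("grinding", "ind_obj", "substance"),
)

def grind(sense: Sense) -> Optional[Sense]:
    """Derive a mass-noun sense from a count-noun sense, or return None."""
    if not sense.count:
        return None
    for rule in RULES:
        if is_a(sense.semtype, rule.input_type):
            return Sense(sense.orth, False, rule.output_type, rule.name)
    return None
```

For instance, grind(Sense("fish", True, "animal")) produces a mass-noun sense of type "food-substance" marked as derived by "animal-grinding", while a sense that is already a mass noun is left without a derived form.
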