is contiguous to, is juxtaposed, and is close to" and "surrounds" as including "limits, bounds, confines, encloses, and circumscribes") suggests that lexical entries (that is, nodes) in the verb hierarchy and adjective network can themselves be used as relations.

We also found it necessary to include lexical relations and rules in the semantic network. Flickinger (1987) first overtly included inflectional relations as distinct nodes in a lexical hierarchy. Several types of derivational rules operate in the lexicon. At the most basic level, there are morphological relations between lexical entries (e.g., see (Pentheroudakis & Vanderwende, 1993) and (Quirk, et al., 1985: Appendix I) on word formation). These rules operate at the lexical entry level, rather than at the sense level. They have not generally been treated as distinct lexical entries; the case that they can, and perhaps should, be so treated rests on a consideration of sense extension, or lexical, rules. Sense extension rules operate primarily on what have been identified as individual senses of lexical entries (e.g., the "grinding" rule, which converts an animal sense of a lexical item into a meat sense; see (Copestake & Briscoe, 1991), where the rule itself is placed among the lexical entries and is viewed as providing the conditions for the transformation of one sense into another).

We also needed to consider semantic features associated with lexical items (Katz & Fodor, 1963). Copestake and Briscoe (1991) showed that such features could be identified as distinct lexical items (hereafter termed "pseudoentries") that can be inherited by regular lexical entries.

We also used the qualia structures of Pustejovsky (1995). The formal aspect of a qualia structure (that is, what the thing is) corresponds to the ISA backbone of WordNet, while the constitutive aspect corresponds to its meronymic links. The agentive and telic (the purpose associated with the noun) qualia are not currently included in WordNet. We may view these aspects as relations in the UMLS hierarchy and hence give them independent status as lexical entries, with selectional restrictions identifying which nodes of the network may satisfy them.

The final relation is the semantic role. For lexical items having subcategorization patterns, such as verbs identifying cases, these relations provide links to other entries. The links are established by identifying selectional restrictions, which are nothing more than the restriction of, say, the "agent" or "instrument" to a subset of the semantic network.

In general, then, nodes of the semantic network may be viewed as lexical entries or pseudoentries representing concepts. Semantic relations may be viewed as labels on the links between nodes. However, these relations are better viewed as distinct nodes in the network, with the property that they act as a linking mechanism between other nodes; semantic relations may also exist within their own hierarchy. (These are similar to role nodes in (Brachman & Schmolze, 1985).)

6 Comparison of Extended Semantic Networks with Previous Work
The sublexicons identified in the analysis of a cohesive text used as a test item, a cohesive set of responses to that test item, and a pre-defined category system clearly carve out portions of semantic networks. While some parts of these sublexicons correspond to subtrees rooted at particular nodes in these networks, several additional useful "cuts" through the network have been identified.

Nida (1975) indicates that a semantic domain may be defined based on any semantic features associated with lexical items. He used this observation to assert that any attempt to identify a single hierarchy or ontology was somewhat arbitrary and dependent on a user's need. The work done here supports that observation, but does so in the context of a much larger conception of semantic relations among nodes in a network.

The identification of sublexicons that cut across parts of speech corresponds to the presence of semantic components in identifying categories. This notion is supported in (Burstein, et al., 1996), where concept grammars are based on collapsing phrasal and constituent nodes into a generalized XP representation. The notion of using semantic components as the basis for category development has been invoked as well in (Laffal, 1995).

The semantic network carved out by adhering to principles of lexical cohesion is the most interesting. In the first place, the principal nodes of the network are not in themselves naturally related, thus following Nida in allowing the use of "on the fly" criteria for defining a sublexicon. In the second place, the need to invoke semantic roles (and their selectional restrictions) in analyzing the discourse structure clearly develops "on the fly" conceptual clusters, similar to those described in (McRoy, 1992), particularly situational clusters. The relationship between the sublexicon formed from the ETS response set and the ETS test item suggests further that the identified sub-semantic network has affinities to scripts as described by Schank and Abelson (1977). The procedures described here are consistent with those described in (Barrière & Popowich, 1996) and indeed would benefit from those procedures in further automating the identification of situational clusters.

7 Conclusions: Use of Extended Semantic Networks for Characterization, Definition, and Naming of Categories
Category development is important in many fields of research and is usually approached with statistical techniques of classification based on assignment of characteristics to units of analysis. In fields associated with linguistics, category development has most often been used for content analysis and information retrieval. In these fields, after a category system has been developed and the lexemes judged to lie within each category have been identified, the analysis of text is based on the frequency with which the lexemes of a category occur within units of analysis. The field of information retrieval performs this last step, usually with only a rudimentary category system based on identifying the root words of lexemes. However, within information retrieval, considerable promise has been shown (Liddy, et al., 1993) by using a category system based on the subject codes in the Longman Dictionary of Contemporary English (LDOCE) (Longman Dictionary of Contemporary English, 1978).

Labeling texts with a pre-defined set of categories, even using such categories for identifying the principal topics of texts, has shown a clear benefit of customizing lexicons to suit computational tasks (Hearst & Schütze, 1996). This approach to customizing the lexicon built upon the semantic network embodied in WordNet, but made use of only its principal ISA backbone, that is, the hypernymic and hyponymic relations between synsets.

The most straightforward categories are those characterized as lists of subtrees in an ISA hierarchy. The definition of such a category is simply the set of highest nodes that cover all concepts in the category (as described for the MCCA category "Detached roles"). Each node in the set should be either a concept in the category or a node such that any of its children in the category have that node as their direct parent. It is legitimate to exclude subtrees rooted at the descendants of the nodes in the set. The name for the category is equivalent to the set of nodes defining the category. The category systems described in (Nida, 1975; Hearst & Schütze, 1996; Laffal, 1995) are of this type.

Another relatively straightforward type of category is that formed with lexical items that have well-defined morphological or derivational relations between pairs of the concepts. This group can also contain other types of lexical rule relations. These morphological, derivational, and lexical rule relations are subsidiary within the category. A set of semantic components underlies them; these components are the principal defining characteristics of the category, handled as in the previous type of category. The category system in (Burstein, et al., 1996), which generalizes part of speech, is of this type.

The most complex type of category system is the one necessary to describe the discourse structure of a text. When a text has been analyzed as described in the previous section, the result is a set of discourse segments related to one another (with many identified as subsidiary), discourse entities and eventualities, and various (primarily) role relations between these entities. We view the concepts and relations (including the discourse relations) as essentially present in and licensed by the lexicon, and then instantiated by the given text to carve out a subnetwork. The definition of this subnetwork is then constructed by identifying the highest nodes in the ISA backbone and the additional relations that operate on the backbone, along with the selectional restrictions that are used. Characterizing this subnetwork is again a matter of identifying the topmost ISA nodes (and, perhaps more importantly here, more accurately identifying descendants that are to be excluded). Naming this subnetwork is again based on the set of topmost nodes, which may in this case include relations (as from the UMLS tree of semantic relations). This process of characterizing a subnetwork is quite similar to the development of supercategories in (Hearst & Schütze, 1996). Thus, to at least that extent, this process may be viewed as leading to identification of the topic of a text.

8 Acknowledgements
We would like to thank Jill Burstein, Robert Amsler, Don McTavish, Thomas Ptter, Adam Kilgarriff, Bonnie Dorr, Lisa Rau, Marti Hearst, and anonymous reviewers for their discussions and comments on issues relating to this paper and its initial draft.

9 References
Allen, J. (1995). Natural language understanding (2nd ed.). Redwood City, CA: The Benjamin/Cummings Publishing Company, Inc.
Barrière, C., & Popowich, F. (1996). Concept clustering and knowledge integration from a children's dictionary. COLING96.
Brachman, R., & Schmolze, J. (1985). An overview of the KL-ONE knowledge representation language. Cognitive Science, 171-216.
Burstein, J., Kaplan, R., Wolff, S., & Lu, C. (1996, June). Using lexical semantic information techniques to classify free responses. In E. Viegas & M. Palmer (Eds.), Breadth and Depth of Semantic Lexicons. Workshop Sponsored by the Special Interest Group on the Lexicon. Santa Cruz, CA: Association for Computational Linguistics.
CL Research. (1997 - in preparation). DIMAP-3 users manual. Gaithersburg, MD.
Copestake, A. A., & Briscoe, E. J. (1991, June 17). Lexical operations in a unification-based framework. ACL SIGLEX Workshop on Lexical Semantics and Knowledge Representation. Berkeley, CA: Association for Computational Linguistics.
Dorr, B. (in press). Large-scale dictionary construction for foreign language tutoring and interlingual machine translation. Journal of Machine Translation.
Flickinger, D. (1987). Lexical rules in the hierarchical lexicon [Doctoral dissertation]. Stanford, CA: Stanford University.
Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. London: Longman.
Hearst, M. A., & Schütze, H. (1996). Customizing a lexicon to better suit a computational task. In B. Boguraev & J. Pustejovsky (Eds.), Corpus processing for lexical acquisition (pp. 77-96). Cambridge, MA: The MIT Press.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170-210.
Laffal, J. (1995, October). A concept analysis of Jonathan Swift's A Tale of a Tub and Gulliver's Travels. Computers and the Humanities, pp. 339-361.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago, IL: The University of Chicago Press.
Liddy, E. D., Paik, W., & Yu, E. S. (1993, June 22). Document filtering using semantic information from a machine readable dictionary. Workshop on Very Large Corpora: Academic and Industrial Perspectives. Columbus, OH: Association for Computational Linguistics.
Lindberg, D. A. B., Humphreys, B. L., & McCray, A. T. (1993). The Unified Medical Language System. Methods of Information in Medicine, 32, 281-291.
Longman Dictionary of Contemporary English (P. Procter, Ed.). (1978). Harlow, Essex, England: Longman Group.
Macleod, C., & Grishman, R. (1994). COMLEX syntax reference manual. Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
McRoy, S. W. (1992). Using multiple knowledge sources for word sense discrimination. Computational Linguistics, 18(1), 1-30.
McTavish, D. G., Litkowski, K. C., & Schrader, S. (1995, September). A computer content analysis approach to measuring social distance in residential organizations for older people. Society for Content Analysis by Computer. Mannheim, Germany.
McTavish, D. G., Litkowski, K. C., & Schrader, S. (1997). A computer content analysis approach to measuring social distance in residential organizations for older people. Social Science Computer Review, in press.
McTavish, D. G., & Pirro, E. B. (1990). Contextual content analysis. Quality & Quantity, 24, 245-265.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235-244.
Nida, E. A. (1975). Componential analysis of meaning. The Hague: Mouton.
Pentheroudakis, J., & Vanderwende, L. (1993). Automatically identifying morphological relations in machine readable dictionaries (pp. 114-131). Oxford, England.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: The MIT Press.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.
Schank, R. C., & Abelson, R. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Lawrence Erlbaum.
UMLS knowledge sources [7th Experimental Edition]. (1996). Bethesda, MD: National Library of Medicine.
Wolff, S. R., Macleod, C., & Meyers, A. (1995). COMLEX word classes. Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.