Detection of Relations in Textual Documents
Manuela Kunze,
Dietmar Rösner
University of Magdeburg Knowledge Based Systems and Document Processing
Kunze, Rösner: Detection of Relations in Textual Documents 2
Introduction
http://en.wikipedia.org/wiki/Unsupervised_learning
Introduction
• to extract information from text, you can use techniques such as simple pattern matching
• additional knowledge is required:
  – 'Thursday': a day of the week
  – meaning of
    • (implicit) `open' vs. `close'
    • `Pay-what-you-wish'
• text understanding / techniques of NLP
  – `Exhibition of over 30 color photographs and stories of life in China's Yunnan Province …'
Introduction
ontologies contain information about:
• definition/description of concepts
• description of instances
• kind of relation (name, type)
  – definition of domain and range values
  – characteristics of the relation: cardinality, transitivity, ...
Natural Language Processing
• NLP techniques:
  – case frame analysis
  – exploiting syntactic structures
  – corpus-based IE for an initial ontology
• corpus:
  – autopsy protocols (400 protocols)
  – different document parts:
    • findings
    • histological findings
    • background
    • discussion
    • …
  – short linguistic structures
  – typical attribute-value structures
Overview
Case Frame
Analysis of Specific Syntactic Structures
Discussion/Conclusion
Case Frames
• resources:
  – results from syntactic parser

<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
    <N>Flachschnitt</N>
  </NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD>
      <N>Zungengewebe</N>
    </NP>
  </PP>
</NP>

  – results from semantic tagger
  – description of case frames
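A minimal sketch of how the parser output above could be read before case frame analysis, assuming the XML shape shown on the slide. The function name `read_complex_np` and the returned dictionary keys are illustrative and not part of the authors' system.

```python
import xml.etree.ElementTree as ET

# Parser output copied from the slide.
PARSER_OUTPUT = """
<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
    <N>Flachschnitt</N>
  </NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD>
      <N>Zungengewebe</N>
    </NP>
  </PP>
</NP>
"""


def read_complex_np(xml_text):
    """Return the head noun and its PP attachments (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The head noun sits in the first embedded NP of the complex NP.
    head = root.find("NP/N").text
    attachments = []
    for pp in root.findall("PP"):
        attachments.append({
            "preposition": pp.find("PRP").text,
            "case": pp.get("CAS"),
            "noun": pp.find("NP/N").text,
        })
    return head, attachments
```

The head noun and the attachment features (preposition, case, noun) are the pieces a case frame needs in order to decide whether a slot can be filled.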
Case Frames
• (corpus-based) definition of roles for a concept
  – `Flachschnitt' (flat cut)
    • `location'
      – sem. category: `tissue'
      – PP, case of NP: accusative, preposition: `in'
  – `Herausschleudern' (skidding)
    • `patient'
      – sem. category: `body-hum'
      – NP; case of NP: genitive
    • `location'
      – sem. category: `vehicle'
      – PP, case of NP: dative, preposition: `aus'
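The role definitions above can be sketched as data plus a simple matching step. This is an illustration under assumptions: the `Slot` class, the `fill_slots` function, and the small `SEMANTIC_CATEGORY` lookup (a stand-in for the semantic tagger) are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    role: str                          # e.g. LOCATION, PATIENT
    category: str                      # required semantic category of the filler
    phrase: str                        # syntactic form: "PP" or "NP"
    case: str                          # grammatical case of the NP
    preposition: Optional[str] = None  # only used for PP slots


# Case frames as listed on the slide.
CASE_FRAMES = {
    "Flachschnitt": [Slot("LOCATION", "TISSUE", "PP", "AKK", "in")],
    "Herausschleudern": [
        Slot("PATIENT", "BODY-HUM", "NP", "GEN"),
        Slot("LOCATION", "VEHICLE", "PP", "DAT", "aus"),
    ],
}

# Hypothetical stand-in for the results of the semantic tagger.
SEMANTIC_CATEGORY = {"Zungengewebe": "TISSUE", "Koerper": "BODY-HUM"}


def fill_slots(head, attachments):
    """Return (concept, role, filler) for every attachment that fits a slot."""
    relations = []
    for slot in CASE_FRAMES.get(head, []):
        for att in attachments:
            if (att["phrase"] == slot.phrase
                    and att["case"] == slot.case
                    and att.get("preposition") == slot.preposition
                    and SEMANTIC_CATEGORY.get(att["noun"]) == slot.category):
                relations.append((head, slot.role, att["noun"]))
    return relations
```

A slot is filled only when the syntactic form, the case, the preposition, and the semantic category all agree, which mirrors the constraints listed per role.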
Case Frames
…
<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <DESC>medizinischer Schnitt</DESC>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
      <CONTENT>in das Zungengewebe</CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
<CONCEPT TYPE="traffic-event">
  <WORD>Herausschleudern</WORD>
  <DESC>event</DESC>
  <SLOTS>
    <RELATION TYPE="PATIENT">
      <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
      <FORM>N(gen, fak)</FORM>
      <CONTENT>des Koerpers</CONTENT>
    </RELATION>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>VEHICLE</ASSIGN_TO>
      <FORM>P(dat, fak, aus)</FORM>
      <CONTENT></CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
…
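As a sketch of how relation definitions might be read off such a case frame description, the snippet below parses one `CONCEPT` element and lists each relation with the concept word as domain and the `ASSIGN_TO` category as range. The function name `relation_definitions` and the choice of domain and range are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Case frame description copied from the slide.
CASE_FRAME_XML = """
<CONCEPT TYPE="traffic-event">
  <WORD>Herausschleudern</WORD>
  <DESC>event</DESC>
  <SLOTS>
    <RELATION TYPE="PATIENT">
      <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
      <FORM>N(gen, fak)</FORM>
      <CONTENT>des Koerpers</CONTENT>
    </RELATION>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>VEHICLE</ASSIGN_TO>
      <FORM>P(dat, fak, aus)</FORM>
      <CONTENT></CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
"""


def relation_definitions(xml_text):
    """Return (relation name, domain, range) triples for one case frame."""
    concept = ET.fromstring(xml_text)
    word = concept.findtext("WORD")
    return [
        (rel.get("TYPE"), word, rel.findtext("ASSIGN_TO"))
        for rel in concept.iter("RELATION")
    ]
```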
Case Frames
• coverage of phrases like `fracture of elbow joint'?
• abstraction
  – `fracture' (sem. category: `trauma')
    • role `patient': sem. category: `bone'
  – `bruise' (sem. category: `trauma')
    • role `patient': sem. category: `organ'
  – `hematoma' (sem. category: `trauma')
    • role `patient': sem. category: `tissue'
• concept x (sem. category: `trauma')
  – role `patient': sem. category: `body-part'
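The abstraction step can be sketched as a search for the most specific category that covers all observed role fillers. The taxonomy below is a toy assumption (in particular the root `entity` is invented for the example), and `least_general_hypernym` is an illustrative name.

```python
# Toy taxonomy: child category -> parent category (assumed for illustration).
PARENT = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "entity",
}


def ancestors(category):
    """Return the category followed by all its hypernyms, most specific first."""
    chain = [category]
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def least_general_hypernym(categories):
    """Return the most specific category that subsumes all given categories."""
    categories = list(categories)
    if not categories:
        return None
    common = ancestors(categories[0])
    for category in categories[1:]:
        other = set(ancestors(category))
        # Keep only shared hypernyms, preserving most-specific-first order.
        common = [c for c in common if c in other]
    return common[0] if common else None
```

With the three `patient` restrictions from the slide (`bone`, `organ`, `tissue`) this yields `body-part`, the restriction given for the abstract concept x.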
Case Frames
• results:
  – relations are defined by the case frame
    • name/type of relation
    • domain, range
  – corpus-based abstractions:
    • redefinition of semantic restriction
      – use the least general hypernym as semantic restriction
• not yet extracted:
  – information about the characteristics of a relation
Overview
Case Frame
Analysis of Specific Syntactic Structures
Discussion/Conclusion
Analysis of Specific Syntactic Structures
• from general to specific information
• resources:
  – results from syntactic parser
  – results from semantic tagger
  – description of interpretation of syntactic structures
• Which word class can be interpreted as concept/instance?
• Which word class describes a relation?
  – adjective in an NP: describes the noun in the NP (relation `prop')
  – negations: negate concepts, verbs, or properties of a concept
  – particle: modification of adjectives
Analysis of Specific Syntactic Structures
CLMed N ADJ
prop(N, ADJ)
N interpreted as concept
ADJ interpreted as concept
results:
prop_catadj(N,ADJ)
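A minimal sketch of the refinement from the general relation prop(N, ADJ) to prop_catadj(N, ADJ): the relation name is derived from the semantic category of the adjective. The lookup table stands in for the semantic tagger, and the fallback to `unspecified` for unknown tokens follows the later `hardly detectable hematomas' example; the function name `refine_prop` is hypothetical.

```python
# Hypothetical stand-in for the semantic tagger (adjective -> category).
ADJECTIVE_CATEGORY = {"bloodless": "blood-concentration"}


def refine_prop(noun, adjective):
    """Turn prop(N, ADJ) into prop_<category of ADJ>(N, ADJ)."""
    # Unknown adjectives fall back to the category 'unspecified'.
    category = ADJECTIVE_CATEGORY.get(adjective, "unspecified")
    return (f"prop_{category}", noun, adjective)
```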
Analysis of Specific Syntactic Structures
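`liver tissue bloodless'
Steps:
• nouns and adjectives are interpreted as concept/instance
• adjectives describe a relation
  – in general: `prop'
[Figure: concept `tissue' with subconcept `liver tissue' and instance `liver_tissue*'; concept `blood concentration' with subconcept `bloodless' and instance `bloodless*'; the relation `prop_blood-concentration' appears at both the concept and the instance level. Legend: concept, instance, relation.]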
Analysis of Specific Syntactic Structures
`liver tissue bloodless'
…
<owl:Class rdf:ID="lebergewebe">
  <rdfs:subClassOf><owl:Class rdf:ID="tissue"/></rdfs:subClassOf>
</owl:Class>
<owl:Class rdf:ID="blood-concentration"/>
<owl:Class rdf:ID="blutleer">
  <rdfs:subClassOf rdf:resource="#blood-concentration"/>
</owl:Class>
<owl:ObjectProperty rdf:ID="prop_blood-concentration">
  <rdfs:domain rdf:resource="#tissue"/>
  <rdfs:range rdf:resource="#blood-concentration"/>
</owl:ObjectProperty>
<lebergewebe rdf:ID="Lebergewebe_6">
  <prop_blood-concentration><blutleer rdf:ID="blutleer_7"/></prop_blood-concentration>
</lebergewebe>
…
Analysis of Specific Syntactic Structures
"kaum wahrnehmbare Unterblutungen" (Engl. "hardly detectable hematomas")
results of syntactic parser:
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
results of semantic tagger:
  – `kaum': weak-graduation
  – `wahrnehmbar': unknown token
  – `Unterblutung': trauma
resources for interpretation:
• N: concept/instance
• ADJ:
  • concept/instance
  • rel: prop
• ADV:
  • concept/instance
  • rel: mod
adverb specifies adjective
adjective specifies noun
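The two interpretation rules (adverb specifies adjective, adjective specifies noun) can be sketched over the parser output as follows. This is an illustration only: the tagger results are keyed by surface form for simplicity (an assumption, since the slide lists base forms), unknown tokens get the category `unspecified`, and `interpret_np` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Parser output copied from the slide.
NP_XML = """
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
"""

# Tagger results from the slide, keyed by surface form (simplifying assumption).
TAGGER = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}


def interpret_np(xml_text):
    """Return (relation, subject, object) triples for one noun phrase."""
    phrase = ET.fromstring(xml_text)
    noun = phrase.findtext("N")
    relations = []
    for adjp in phrase.findall("ADJP"):
        adjective = adjp.findtext("ADJ")
        adverb = adjp.findtext("ADV")
        if adverb is not None:
            # rel: mod -> the adverb specifies the adjective
            relations.append((f"mod_{TAGGER.get(adverb, 'unspecified')}", adjective, adverb))
        # rel: prop -> the adjective specifies the noun
        relations.append((f"prop_{TAGGER.get(adjective, 'unspecified')}", noun, adjective))
    return relations
```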
Analysis of Specific Syntactic Structures
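`hardly detectable hematomas'
Steps:
• nouns, adjectives and adverbs are interpreted as concept/instance
• adjectives and adverbs describe relations
[Figure: concept `trauma' with subconcept `hematoma' and instance `hematoma*'; concept `unspecified' with subconcept `detectable' and instance `detectable*'; concept `weak-graduation' with subconcept `hardly' and instance `hardly*'; the relations `prop_unspecified' and `mod_weak-graduation' appear at both the concept and the instance level. Legend: concept, instance, relation.]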
Analysis of Specific Syntactic Structures
`hardly detectable hematomas'
<owl:Class rdf:ID="unterblutung">
  <rdfs:subClassOf rdf:resource="#trauma"/>
</owl:Class>
<owl:Class rdf:ID="trauma"/>
<owl:Class rdf:ID="wahrnehmbar">
  <rdfs:subClassOf rdf:resource="#unspecified"/>
</owl:Class>
<owl:Class rdf:ID="unspecified"/>
<owl:Class rdf:ID="kaum">
  <rdfs:subClassOf rdf:resource="#weak-graduation"/>
</owl:Class>
<owl:Class rdf:ID="weak-graduation"/>
Analysis of Specific Syntactic Structures
`hardly detectable hematomas'
<owl:ObjectProperty rdf:ID="mod_weak-graduation">
  <rdfs:domain rdf:resource="#unspecified"/>
  <rdfs:range rdf:resource="#weak-graduation"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="prop_unspecified">
  <rdfs:domain rdf:resource="#trauma"/>
  <rdfs:range rdf:resource="#unspecified"/>
</owl:ObjectProperty>
<unterblutung rdf:ID="Unterblutungen_5">
  <prop_unspecified rdf:resource="#wahrnehmbare_4"/>
</unterblutung>
<wahrnehmbar rdf:ID="wahrnehmbare_4">
  <mod_weak-graduation rdf:resource="#kaum_3"/>
</wahrnehmbar>
<kaum rdf:ID="kaum_3"></kaum>
Analysis of Specific Syntactic Structures
Protégé Plugin for Visualization: Ontoviz
[Figure: ontology graph. Legend: concept, instance, relation.]
Phrases like:
• NP NP NP
• NP N Adj Conj Adj
• NP N conj N Adj
• …
Analysis of Specific Syntactic Structures
• results
  – definition of concepts/instances
  – corpus-based definition/concretion of relations:
    • prop → prop_catADJ
    • information about domain, relation
• not extracted:
  – information about the characteristics of a relation
Overview
Case Frame
Analysis of Specific Syntactic Structures
Discussion/Conclusion
Conclusion
• NLP techniques for extraction of information
  – analyse syntactic structures
  – information about semantic categories
  – result: corpus-based description of an initial ontology
• case frame analysis
  – relations are described in the case frame
  – disadvantage: creation of case frames
  – advantage: a definition of the relation
• analysis of specific syntactic structures
  – a general interpretation of tokens and the syntactic structures
  – redefined by results from the semantic tagger
  – disadvantage: in some cases, only the general relation definition is delivered
  – advantage: less effort to describe the resources
Conclusion
• no information about the characteristics of a relation (cardinality, …)
• solutions
  – analyse occurrences in the corpus
    • corpus-based assumption about cardinality
  – integration of additional knowledge
    • initial domain specific ontology
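One way the corpus-based assumption about cardinality might be sketched: count, for each relation, the largest number of distinct fillers observed for a single instance. This is not described in the slides beyond the bullet above; the function name `observed_cardinality` and the triple format are assumptions.

```python
from collections import defaultdict


def observed_cardinality(assertions):
    """Map each relation to the maximum number of distinct fillers per subject.

    `assertions` is an iterable of (relation, subject instance, object instance).
    """
    fillers = defaultdict(set)
    for relation, subject, obj in assertions:
        fillers[(relation, subject)].add(obj)
    max_count = defaultdict(int)
    for (relation, _subject), objects in fillers.items():
        max_count[relation] = max(max_count[relation], len(objects))
    return dict(max_count)
```

An observed maximum of 1 only supports the assumption that a relation is functional in this corpus; it does not prove it, which is why the slide speaks of an assumption.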
Key Aspects for IE
• ‘conceptual’ preprocessing steps: names of concepts occur in different linguistic structures; compound vs. complex noun phrase (like ‘liver tissue’ and ‘tissue of liver’)
  – handle only one canonical linguistic structure as a representative for all paraphrases
• treatment of generalisation within local contexts
  – The token ‘liver’ may occur in the first sentence of a paragraph. In the following sentences of the paragraph, only the hypernym ‘organ’ is used.
• concept or instance: which term in a linguistic structure has to be interpreted as a concept and which as an instance of a concept
• definition of the scope for a concept:
  – a paragraph starts with a description of an organ (e.g. organ ‘liver’ in: ‘The liver shows ... . Bloodrichness of the tissue.’), followed by a description of parts of the organ (e.g., ‘Gewebe’, tissue). In such cases, additional knowledge about the domain has to be employed (for example, about meronyms or holonyms)
  – tissue part-of liver vs. tissue part-of concept X
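The first key aspect, one canonical structure for all paraphrases, can be illustrated with a small normalisation step for English phrases. This is only a sketch under the assumption that the complex noun phrase has the form `X of Y`; German compounds such as `Lebergewebe` would need compound splitting, which is not shown. The function name `canonical_concept` is hypothetical.

```python
import re


def canonical_concept(phrase):
    """Map 'liver tissue' and 'tissue of liver' to the same concept name."""
    words = phrase.lower().split()
    match = re.fullmatch(r"(\w+) of (\w+)", " ".join(words))
    if match:
        head, modifier = match.groups()
        # 'tissue of liver' -> modifier first, as in the compound 'liver tissue'
        return f"{modifier}_{head}"
    return "_".join(words)
```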