CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from
Medical Text Data
Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+
+Harbin Institute of Technology, China
*University of Notre Dame, USA
Pseudo Causal Relation
• Gold standard
⁃ Randomized controlled experiments
⁃ Too costly
• Observational data
⁃ Structured data, e.g. EHR
⁃ Unstructured data (text data), e.g. medical literature, patient reports
• Pseudo causal relation
⁃ Semantic-level causal relations
⁃ Verified true causal knowledge
⁃ Or, have not been identified previously
⁃ Or, no evidence to support them
Previous Studies
• Extract causal relations from single sentences
• While causal relations usually span multiple sentences
• Use only textual information and ignore structural information
• While causal relations naturally have an attached network structure
• Only extraction rather than inference
• While causality itself is a basic logical rule
Causation Transitivity
• Preserving transitivity is a basic desideratum for an adequate analysis of causation
--L. A. Paul and Ned Hall “Causation: A User’s Guide”
[Figure: a causal chain A → B → … → C, implying A → C]
Causation Transitivity in Medical Text
Obesity usually increases the risk of diabetes.
People with diabetes have more sugar in blood called hyperglycemia.
Metformin has become a mainstay of type 2 diabetes management and is now the recommended first-line drug for treating the disease.
[Figure: network over Obesity, Diabetes, Hyperglycemia, and Metformin; two edges are labeled "cause" and two are marked "?"]
Motivation
• Jointly utilize
⁃ Textual information (context and co-occurrence)
⁃ Structural information (causation transitivity rule)
• Through inference to
⁃ Discover causal relations in text
⁃ Generate new causal relation hypotheses
Problem Definition
• Problem: Causal Relation Discovery from Triad Structures
• Medical Cause-Effect Candidates Network: $G = (V, E)$, $E \subseteq V \times V$
• Triad Structure
⁃ Each Triangle in the network
⁃ Basic unit
Our method
• Causal Relation Candidates Matching
• 3 Clues for Causal Discovery
⁃ Causal Association
⁃ Contextual Information
⁃ Causal Transitivity Rules
• Factor Graph Model
Causal Relation Candidates Matching
• Medical Dictionary
⁃ Dryad data package
⁃ TCMonline and TCMID
• For every n consecutive sentences
⁃ Match medical entities
⁃ Pair the matched entities into candidate pairs
⁃ Every two pairs with a shared entity generate a triad structure
⁃ E.g. $(e_i, e_k)$ and $(e_i, e_j)$ generate a triad structure $(e_k, e_i, e_j)$
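The matching steps above can be sketched roughly as follows. This is only an illustration under assumptions: the function names `candidate_pairs` and `triads_from_pairs` are hypothetical, entities are plain strings already matched against a dictionary, and pairs are treated as unordered.

```python
from itertools import combinations

def candidate_pairs(entities):
    """All unordered pairs of distinct entities matched in one sentence window."""
    return list(combinations(sorted(set(entities)), 2))

def triads_from_pairs(pairs):
    """Combine every two pairs that share exactly one entity into a triad.

    Each triad is returned as (outer_a, shared, outer_b), mirroring the
    slide's example where (e_i, e_k) and (e_i, e_j) give (e_k, e_i, e_j).
    """
    triads = set()
    for p, q in combinations(pairs, 2):
        shared = set(p) & set(q)
        if len(shared) != 1:
            continue  # no shared entity, or the same pair twice
        (s,) = shared
        (a,) = set(p) - shared
        (b,) = set(q) - shared
        a, b = sorted((a, b))  # canonical order so each triad appears once
        triads.add((a, s, b))
    return sorted(triads)

# Entities assumed to be matched in one window of n consecutive sentences.
window_entities = ["obesity", "diabetes", "hyperglycemia"]
pairs = candidate_pairs(window_entities)
triads = triads_from_pairs(pairs)
```

With three entities in a window, every two pairs share one entity, so the sketch yields three triads, one per choice of shared entity.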
3 Clues for Causal Discovery
• Causal Association
⁃ Frequently co-occurring entities are more likely to be causally related [Do and Roth 2013]
⁃ Entity $e_i$ is a possible cause of entity $e_j$ if $e_j$ occurs more frequently with $e_i$ than by itself [Suppes 1970]
• Contextual Information
⁃ Causal relations in the text tend to share special contexts
⁃ Such as domain-related words, causal triggers, connectives, etc.
• Causation Transitivity Rule
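The Suppes-style condition in the Causal Association clue can be read as a probability-raising check, P(e_j | e_i) > P(e_j). The sketch below estimates both sides from co-occurrence counts; the function name `is_possible_cause`, the count-based estimates, and the example numbers are all assumptions made for illustration, not taken from the slides.

```python
def is_possible_cause(n_both, n_cause, n_effect, n_total):
    """Probability-raising check: is P(effect | cause) > P(effect)?

    n_both   -- windows containing both entities
    n_cause  -- windows containing the candidate cause
    n_effect -- windows containing the candidate effect
    n_total  -- total number of windows
    """
    if n_cause == 0 or n_total == 0:
        return False  # nothing to estimate from
    p_effect_given_cause = n_both / n_cause
    p_effect = n_effect / n_total
    return p_effect_given_cause > p_effect

# Hypothetical counts: the effect appears in 5% of all windows,
# but in 30% of the windows that contain the candidate cause.
raised = is_possible_cause(n_both=30, n_cause=100, n_effect=50, n_total=1000)
```

The check is directional only through the conditioning, so it is a filter for candidates rather than evidence of causation by itself.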
Causal Association
• Modeling causal association
$$CA(e_{ij}) = I(e_i, e_j) \times D(e_i, e_j) \times \mathrm{Max}(u_i, u_j)$$
⁃ Larger mutual information
$$I(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$$
⁃ Reward pairs that occur closer together, and penalize those that are further apart in the text
$$D(e_i, e_j) = -\log \frac{|sent(e_i) - sent(e_j)| + 1}{2 \times WS}$$
⁃ Model the frequency of co-occurrence of two medical entities, $\mathrm{Max}(u_i, u_j)$
$$u_i = \frac{P(e_i, e_j)}{\max_k P(e_i, e_k) - P(e_i, e_j) + \varepsilon}, \qquad u_j = \frac{P(e_i, e_j)}{\max_k P(e_k, e_j) - P(e_i, e_j) + \varepsilon}$$
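As a rough numerical illustration of the causal association score, the sketch below multiplies the three components: mutual information, sentence distance, and the larger of the two co-occurrence ratios. It assumes that the probabilities are already estimated, that `WS` is the sentence window size, and that the distance term uses the absolute difference of sentence indices; the function name and all example values are hypothetical.

```python
import math

def causal_association(p_ij, p_i, p_j, sent_i, sent_j, window_size,
                       max_p_i_any, max_p_any_j, eps=1e-6):
    """CA = I * D * max(u_i, u_j) for one candidate pair (e_i, e_j).

    max_p_i_any -- max over k of P(e_i, e_k)
    max_p_any_j -- max over k of P(e_k, e_j)
    """
    # I: pointwise mutual information of the pair
    mutual_info = math.log(p_ij / (p_i * p_j))
    # D: larger when the two mentions are close, zero at the window limit
    distance = -math.log((abs(sent_i - sent_j) + 1) / (2 * window_size))
    # u_i, u_j: how close this pair is to each entity's strongest partner
    u_i = p_ij / (max_p_i_any - p_ij + eps)
    u_j = p_ij / (max_p_any_j - p_ij + eps)
    return mutual_info * distance * max(u_i, u_j)

# Hypothetical probabilities; only the sentence distance differs.
common = dict(p_ij=0.02, p_i=0.1, p_j=0.05, window_size=3,
              max_p_i_any=0.03, max_p_any_j=0.04)
near = causal_association(sent_i=4, sent_j=4, **common)
far = causal_association(sent_i=4, sent_j=6, **common)
```

With these inputs the same-sentence pair scores higher than the pair two sentences apart, which is the behavior the distance term is meant to produce.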
Contextual Information (1)
• Encode Synthetic Context
Contextual Information (2)
• Encode context based on pre-trained word2vec word embeddings
• Three ways
Causation Transitivity Rules
• Angle rules and triadic rule
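The transcript does not preserve the details of the angle and triadic rules, so the sketch below illustrates only the general transitivity idea from the earlier slides: if A causes B and B causes C, propose A causes C as a hypothesis. The function name `transitive_hypotheses` is hypothetical, and this is not a reconstruction of the authors' rules.

```python
def transitive_hypotheses(causal_edges):
    """Propose A -> C whenever A -> B and B -> C are known but A -> C is not."""
    known = set(causal_edges)
    # Index known effects by cause for quick lookup of the second hop.
    effects = {}
    for cause, effect in known:
        effects.setdefault(cause, set()).add(effect)
    hypotheses = set()
    for a, b in known:
        for c in effects.get(b, ()):
            if c != a and (a, c) not in known:
                hypotheses.add((a, c))
    return sorted(hypotheses)

# Directed (cause, effect) edges taken from the example sentences.
edges = [("obesity", "diabetes"), ("diabetes", "hyperglycemia")]
print(transitive_hypotheses(edges))  # [('obesity', 'hyperglycemia')]
```

A proposed edge is only a hypothesis; in the deck's framing it still has to be weighed against the textual clues.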
Integrate 3 Clues
• Combined, the three clues draw on both textual support and structural inference, which makes them better equipped to discover causal relations.
• They are complementary in several ways:
⁃ Causal association gives preference to frequently co-occurring causal pairs.
⁃ Causal transitivity rules identify causal relations that have little textual support but follow the transitivity rule, and they generate new causal hypotheses.
⁃ Contextual information from the text can potentially eliminate frequently co-occurring medical entities that are not causally related.
CausalTriad: Factor Graph for Each Triad Structure
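The factor graph figure is not preserved in this transcript. As a toy illustration only, the sketch below treats the three edges of one triad as binary variables, gives each a unary score (standing in for the textual clues), adds one triad factor that rewards transitive labelings, and finds the best joint labeling by enumeration. The potentials, the weight, and the function name `triad_map` are all assumptions, not the authors' model.

```python
import math
from itertools import product

def triad_map(unary, transitivity_weight=1.0):
    """Brute-force best labeling of the edges ab, bc, ac in one triad.

    unary -- dict mapping edge name to the score of labeling it causal (1);
             labeling an edge non-causal (0) contributes nothing.
    The triad factor applies only when ab and bc are both causal: it adds
    the weight if ac is also causal and subtracts it otherwise.
    """
    names = ("ab", "bc", "ac")
    best, best_score = None, -math.inf
    for labels in product((0, 1), repeat=3):
        y = dict(zip(names, labels))
        score = sum(unary[n] * y[n] for n in names)
        if y["ab"] and y["bc"]:
            score += transitivity_weight if y["ac"] else -transitivity_weight
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical scores: strong text support for ab and bc, weak for ac.
scores = {"ab": 2.0, "bc": 1.5, "ac": -0.5}
with_rule = triad_map(scores, transitivity_weight=1.0)
without_rule = triad_map(scores, transitivity_weight=0.0)
```

In this toy setting the edge ac is labeled causal only when the triad factor is active, which mirrors the idea of inferring a relation with little textual support from structure.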
Experiments
• Data collection
⁃ TCM consists of the abstracts of 106,151 papers.
⁃ HealthBoards consists of post messages on health and medical issues such as diseases, symptoms, medicines, and side-effects, etc.
• Generating new causal relation hypotheses
Experimental Results
• Different types of causal relations
⁃ DISEASE–cause–SYMPTOM
⁃ FORMULA–against–DISEASE
⁃ HERB–against–DISEASE
⁃ FORMULA–relieve–SYMPTOM
⁃ HERB–relieve–SYMPTOM
⁃ DISEASE–bring–DISEASE
⁃ DRUG–against–DISEASE
⁃ DISEASE–cause–SYMPTOM
Experimental Results
• Patterns of causal reasoning rules
Experimental Results
• Causal relation extraction
Experimental Results
• Extracting causal relations from single sentences and multiple sentences
• Extracting implicit causal relations
Experimental Results
Influence Factors
• Influence from the size of labeled training data
Influence Factors
• Influence from the number of bootstrapping rounds and window size
Conclusions
• We propose CausalTriad to incorporate both textual and structural clues for causal relation discovery from text.
• Experimental results on two datasets demonstrate that:
⁃ CausalTriad is effective for discovering explicit and implicit causal relations from both single sentences and multiple sentences.
⁃ CausalTriad can generate new causal relation hypotheses through inference.
Thank You!
Any comments and suggestions?
Homepage: http://ir.hit.edu.cn/~sdzhao/
Email: [email protected]
Sendong (Stan) Zhao, Meng Jiang, Ming Liu, Ting Liu, Bing Qin