Tweeting Beyond Facts: The Need for a Linguistic Perspective
Sabine Bergler, CLaC Labs
Sofia 2015
CLaC Labs Core Idea
Linguistics (like mathematics) is
• general
• consistent (across domains, corpora, and tasks)
• modular (= compositional)
Domain knowledge is
• specific
• only sometimes compositional
• reasonably well supported for some domains (NLM suite of tools for BioNLP)
CLaC Modules and Architecture
[Figure: CLaC module stack: discourse structure; embedding graph (typed); coreference; semantic annotations; parse tree, dependencies; domain ontology; lexical semantics]
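The module stack above only names the layers. The following minimal Python sketch illustrates the modular idea. It assumes that each module adds exactly one named annotation layer and declares which layers it reads. The `Module` class, `run_pipeline`, the layer names, and the toy trigger list are hypothetical and are not CLaC code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]  # annotation layers keyed by layer name


@dataclass
class Module:
    name: str                     # the layer this module adds
    requires: Tuple[str, ...]     # layers it reads
    run: Callable[[Doc], object]  # computes the new layer from the document


def run_pipeline(text: str, modules: List[Module]) -> Doc:
    """Run modules in order; each may only read layers that already exist."""
    doc: Doc = {"text": text}
    for module in modules:
        missing = [layer for layer in module.requires if layer not in doc]
        if missing:
            raise ValueError(f"{module.name} needs missing layers {missing}")
        doc[module.name] = module.run(doc)
    return doc


# Two toy modules; the negation trigger list is a hypothetical stand-in.
TOKENIZER = Module("tokens", ("text",), lambda d: str(d["text"]).rstrip(".").split())
NEG_TRIGGERS = Module(
    "neg_triggers",
    ("tokens",),
    lambda d: [i for i, t in enumerate(d["tokens"]) if t.lower() in {"no", "not", "never"}],
)

doc = run_pipeline("We observed no genetic alterations.", [TOKENIZER, NEG_TRIGGERS])
print(doc["neg_triggers"])  # [2]
```

Modules communicate only through named layers. A component such as a negation-trigger finder can therefore be reused in another pipeline whenever the layers it requires are present.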
Archaeological Approach
Theory
• shallow
• slow and careful (small goals)
• attention to context
• analyzed extensively
• iterative
Practice
• linguistically inspired
• modular
• vetted in shared tasks
• extensive ablation studies
• reuse in different pipelines for additional evaluation
Pertussis Seroprevalence in Korean Adolescents and Adults Using Anti-Pertussis Toxin Immunoglobulin G. J Korean Med Sci. 2014 May;29(5).
This finding indicates that natural pertussis infection is endemic in older adults and that Tdap booster vaccination rates at 11-12 yr of age may be insufficient. Reports from Israel and the Netherlands have already indicated that the highest pertussis seroprevalence was in older adults (13,18). Because protective immunity against pertussis may last for 4-12 yr after a primary DTaP vaccination series (19,20), natural pertussis infection could occur in older adults even after previous vaccinations.
Legend: report, negation, modal, temporal ordering
Existence and Facts
The mean anti-PT IgG titer and pertussis seroprevalence were 35.53 ± 62.91 EU/mL and 41.4%, respectively.
The mean anti-PT IgG titers and seroprevalence were not significantly different between the age groups.
However, the seroprevalence in individuals 51 yr of age or older was significantly higher than in individuals younger than 51 yr (46.5% vs 39.1%, P = 0.017).
Legend: negation, comparison, contrast, irrealis
Negation: explicit and implicit
trigger (lists of different lengths available; domain-specific lists possible)
linguistic scope (derived from parser information)
We observed no genetic alterations in the IRF-4 promoter, which can account for the lack of IRF-4 expression.
entailment: no alterations? no alterations in the IRF-4 promoter?
Stanford Parse Tree
(S
  (NP (PRP We))
  (VP (VBD observed)
    (NP
      (NP (DT no) (JJ genetic) (NNS alterations))
      (PP (IN in)
        (NP
          (NP (DT the) (NN IRF-4) (NN promoter))
          (SBAR
            (WHNP (WDT which))
            (S
              (VP (MD can)
                (VP (VB account)
                  (PP (IN for)
                    (NP
                      (NP (DT the) (NN lack))
                      (PP (IN of)
                        (NP (NN IRF-4) (NN expression))))))))))))))
Collapsed Typed Dependencies
nsubj(observed-2, We-1)
root(ROOT-0, observed-2)
neg(alterations-5, no-3)
amod(alterations-5, genetic-4)
dobj(observed-2, alterations-5)
det(promoter-9, the-7)
nn(promoter-9, IRF-4-8)
prep_in(alterations-5, promoter-9)
nsubj(account-13, promoter-9)
aux(account-13, can-12)
rcmod(promoter-9, account-13)
det(lack-16, the-15)
prep_for(account-13, lack-16)
nn(expression-19, IRF-4-18)
prep_of(lack-16, expression-19)
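One way to see how scope can be "derived from parser information" is to walk the dependency graph above. The Python sketch below uses a simple assumed rule, not NEGATOR's actual algorithm. The scope of a `neg` trigger is its governor plus everything reachable below it, and relative clauses (`rcmod`) act as scope barriers, which is the kind of barrier the error case further on depends on. The regular expression and the function names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Collapsed typed dependencies for the example sentence, copied from the slide.
DEPS = """
nsubj(observed-2, We-1) root(ROOT-0, observed-2) neg(alterations-5, no-3)
amod(alterations-5, genetic-4) dobj(observed-2, alterations-5) det(promoter-9, the-7)
nn(promoter-9, IRF-4-8) prep_in(alterations-5, promoter-9) nsubj(account-13, promoter-9)
aux(account-13, can-12) rcmod(promoter-9, account-13) det(lack-16, the-15)
prep_for(account-13, lack-16) nn(expression-19, IRF-4-18) prep_of(lack-16, expression-19)
"""

DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_deps(text):
    """Return (relation, head_index, dependent_index) triples and an index -> word map."""
    edges, words = [], {}
    for rel, head, h, dep, d in DEP.findall(text):
        h, d = int(h), int(d)
        words[h], words[d] = head, dep
        edges.append((rel, h, d))
    return edges, words


def negation_scope(edges, words, barriers=frozenset({"rcmod"})):
    """Governor of the first `neg` edge plus all nodes reachable below it,
    never following a relation listed in `barriers` (assumed rule)."""
    children = defaultdict(list)
    for rel, h, d in edges:
        children[h].append((rel, d))
    governors = [h for rel, h, _ in edges if rel == "neg"]
    if not governors:
        return []
    seen, stack = set(), [governors[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # collapsed dependencies can contain cycles
        seen.add(node)
        stack.extend(d for rel, d in children[node] if rel not in barriers)
    return [words[i] for i in sorted(seen)]


edges, words = parse_deps(DEPS)
print(negation_scope(edges, words))
# ['no', 'genetic', 'alterations', 'the', 'IRF-4', 'promoter']
```

Collapsed prepositions such as "in" appear only in edge labels (`prep_in`), so they do not show up as scope tokens. If the barrier set is empty, the scope runs through the relative clause to "can account for the lack of IRF-4 expression". This is exactly the question the entailment line raises.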
NEGATOR, developed by Sabine Rosenberg:
1. trigger detection
2. linguistic scope determination
3. focus of negation detection
4. negation and modality interaction
Leader in two Shared Task competitions:
*Sem 2012 pilot task on negation focus (sole participant)
CLEF 2012 QA4MRE pilot task on interaction of negation and modality (ranks 1 and 2 of 6, with a lead of over 10%)
ModNegator for CLEF QA4MRE assembled from existing modules:
negation triggers from NEGATOR
modality triggers from Kilicoglu
scope from NEGATOR (auxiliary rules added)
Rank 1 with a wide margin (Conan Doyle data):
               narrow   greedy
macroaverage   .64      .62
microaverage   .71      .68
accuracy       .71      .67
Error case: scope barrier (relative clause)
Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.
Modal trigger: suggested
Modal scope: Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.
Negation trigger: no
Negation scope: Dr Gallo had initially suggested that AIDS was caused by HTLV-I, a virus that no one disputes he discovered.
NEGATOR: disputes: LABEL = NEGMOD
Gold standard: disputes: LABEL = NEG
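A toy labeling rule can reproduce this mismatch. The sketch below makes three hypothetical assumptions:

- A predicate is labeled NEG, MOD or NEGMOD according to which trigger scopes contain it.
- A scope runs from its trigger to the end of the sentence.
- The start of a relative clause can act as a barrier. Here it is hard-coded, whereas a parser would supply it.

Without the barrier, the modal scope of "suggested" reaches "disputes" and yields NEGMOD, as NEGATOR did. With the barrier, the label matches the gold NEG.

```python
# Error-case sentence, with punctuation split off by hand.
TOKENS = ("Dr Gallo had initially suggested that AIDS was caused by HTLV-I , "
          "a virus that no one disputes he discovered .").split()
END = len(TOKENS) - 1                    # scopes stop before the final period
MOD_TRIGGER = TOKENS.index("suggested")
NEG_TRIGGER = TOKENS.index("no")
# Start of the relative clause "that no one disputes he discovered".
# Hard-coded for the sketch; a parser would supply it via an rcmod edge.
REL_CLAUSE = TOKENS.index("virus") + 1


def scope(trigger, barriers=()):
    """Token indices after the trigger, cut at the first barrier that follows it."""
    stop = min((b for b in barriers if b > trigger), default=END)
    return set(range(trigger + 1, stop))


def label(i, neg_scope, mod_scope):
    """NEG, MOD, NEGMOD, or None, depending on which scopes contain token i."""
    tag = ("NEG" if i in neg_scope else "") + ("MOD" if i in mod_scope else "")
    return tag or None


target = TOKENS.index("disputes")
neg = scope(NEG_TRIGGER)
print(label(target, neg, scope(MOD_TRIGGER)))                         # NEGMOD
print(label(target, neg, scope(MOD_TRIGGER, barriers=[REL_CLAUSE])))  # NEG
```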
Speculative Language (aka Hedging)
Also we could not find any RAG-like sequences in the recently sequenced sea urchin lancelet hydra.
Caspases can also be activated with the aid of Apaf-1, which in turn appears to be regulated by cytochrome c and dATP.
Phenotypic differences are suggestive of distinct functions for some of these genes in regulating dendrite arborization.
Speculative Language Detection (Halil Kilicoglu)
• BioNLP 08, BioNLP 09, CoNLL 2011
• same system adapted for subsequent tasks
• based on triggers and parser dependencies
• also incorporates negation, modality, etc.
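The trigger half of such a system can be sketched as a weighted cue lexicon and applied to the hedging examples above. This is a simplified stand-in: the cue list and weights are hypothetical, and the system's use of parser dependencies is omitted.

```python
import re

# Hypothetical cue lexicon; the weights are illustrative, not CLaC's values.
HEDGE_CUES = {
    "may": 1.0, "might": 1.0, "possibly": 0.8,
    "appear": 0.7, "appears": 0.7, "suggest": 0.7, "suggests": 0.7,
    "suggestive": 0.7, "likely": 0.6,
}


def hedge_cues(sentence):
    """Return the cue words found in a sentence (lower-cased word match)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w in HEDGE_CUES]


def is_speculative(sentence, threshold=0.5):
    """Flag a sentence whose summed cue weight reaches the threshold."""
    return sum(HEDGE_CUES[w] for w in hedge_cues(sentence)) >= threshold


print(hedge_cues("Caspases can also be activated with the aid of Apaf-1, "
                 "which in turn appears to be regulated by cytochrome c and dATP."))
# ['appears']
```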
Embedding Predications (Halil Kilicoglu, 2012)
Unified account of semantic phenomena beyond categorical assertions
• core notion: semantic embedding
• categorization: comprehensive, domain-independent, consolidated
• embedding graph: compositional semantic interpretation
• genre-independent: news, molecular biology, shared tasks
Kilicoglu Processing Pipeline
[Figures: Syntactic Dependency Graph 1; Dependency Graph 2; Typed Combined Embedding Graph]
Sentiment Towards Vaccination
The incidence ☹ of pertussis ☹ decreased ☺ with the introduction of the diphtheria-tetanus-whole cell pertussis (DTwP) vaccination ☺ in children around the world (1), and a decrease ☺ in pertussis ☹ was also observed in Korea where the DTwP vaccination ☺ has been universally recommended ☺ for infants and children since 1954 (2).
However, pertussis ☹ began to rise ☹ in the 1990s in Europe and North America, especially in adolescents (1,3,4,5), and it has been also observed since the 2000s in Korea (2).
Summary: §1: ☺, §2: ☹
Sentiment Inferences
The incidence ☹ of pertussis decreased with the introduction of the diphtheria-tetanus-whole cell pertussis (DTwP) vaccination.
Baseline: count sentiment words, use majority vote: ☹
Lexical semantics + syntactic inferences:
NP: (The incidence of pertussis ☹) ☹ → pertussis ☹
Valence shifter verb: decreased(NP) = decreased(☹) = ☺
Bonus inference: decrease(DTwP, pertussis ☹) ☺ → DTwP ☺
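The inference steps above can be made concrete with toy polarities. In this minimal sketch, the polarity entries, the shifter list, and the rule that a negative word makes its whole NP negative are all assumptions for illustration. The baseline reproduces the majority vote ☹, and the valence-shifter rule yields ☺ for the same sentence.

```python
# Toy prior polarities and shifters (hypothetical entries, not a real lexicon).
PRIOR = {"incidence": -1, "pertussis": -1, "vaccination": 1}
SHIFTERS = {"decreased", "decrease", "reduced", "declined"}  # invert their argument


def baseline(words):
    """Majority vote over prior polarities of the words."""
    total = sum(PRIOR.get(w, 0) for w in words)
    return "pos" if total > 0 else "neg" if total < 0 else "neutral"


def np_polarity(np_words):
    """Assumed rule: an NP containing a negative word is negative."""
    polarities = [PRIOR.get(w, 0) for w in np_words]
    if -1 in polarities:
        return -1
    return 1 if 1 in polarities else 0


def predicate_polarity(verb, np_words):
    """decreased(NP): a valence shifter inverts the polarity of its argument."""
    p = np_polarity(np_words)
    return -p if verb in SHIFTERS else p


sentence = ("the incidence of pertussis decreased with the introduction "
            "of the dtwp vaccination").split()
print(baseline(sentence))                                                        # neg
print(predicate_polarity("decreased", ["the", "incidence", "of", "pertussis"]))  # 1
# Bonus inference: the agent of decrease(DTwP, pertussis) inherits the event's polarity.
dtwp_polarity = predicate_polarity("decrease", ["pertussis"])                    # 1
```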
Sentiment Analysis for Tweets (Canberk Özdemir)
SemEval 2015 Task 10B: rank 9 (of 40)
introduces a new, large semantic lexicon: Gezi
combines 5 sentiment lexica (AFINN smallest, Gezi largest)
uses linguistic scope for negation and modality (NEGATOR)
benefits from a 5-point sentiment scale (strong positive, positive, neutral, negative, strong negative)
Tweets with Figurative Language (Canberk Özdemir)
SemEval 2015 Task 11: rank 1 (of 35) with a wide margin
same system as for Task 10B
no special tailoring for figurative language apart from using training data for decision tree
linguistic notions at the moment equivalent to training for figurative language
Negation and Modality in ClacSentipipe
negation triggers from Rosenberg
modality triggers: modal auxiliaries
scope from NEGATOR
He is hurt. -2
Negation flips and dampens (* -0.5): He is not hurt. +1
Modality dampens (* 0.5): He may be hurt. -1
features: negated-negative, modalized-negative, …
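The arithmetic of these two rules is small enough to write out directly. The sketch uses the slide's multipliers: -0.5 for negation and 0.5 for modality. Two details are assumptions: how the rules combine when both apply, and any feature names beyond the two shown.

```python
def adjust(score, negated=False, modalized=False):
    """Slide rules: negation flips and dampens (* -0.5), modality dampens (* 0.5)."""
    if negated:
        score *= -0.5
    if modalized:
        score *= 0.5
    return score


def features(prior, negated=False, modalized=False):
    """Feature names in the style of 'negated-negative'; names beyond the
    slide's two examples follow an assumed pattern."""
    polarity = "negative" if prior < 0 else "positive" if prior > 0 else "neutral"
    names = []
    if negated:
        names.append(f"negated-{polarity}")
    if modalized:
        names.append(f"modalized-{polarity}")
    return names


print(adjust(-2))                  # -2    He is hurt.
print(adjust(-2, negated=True))    # 1.0   He is not hurt.
print(adjust(-2, modalized=True))  # -1.0  He may be hurt.
print(features(-2, negated=True))  # ['negated-negative']
```

Both rules are plain multiplications, so applying them in either order gives the same result when a clause is both negated and modalized.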
Sample Tweet Gold Annotations
Need car financing? Toyota of Hollywood has you covered! http://t.co/rMFV0qYNOK
Kobe Bryant is better than the 40th best player. I would say about 25th
@TV_Exposed: Every episode of Friends is coming to Netflix on January 1st http://t.co/OiVJzaTOh9 damn i want netflix heere tooo
Equalizer tomorrow, Alexander and the Terrible Horrible No Good Very Bad Day & Fury Sunday. #lastfreemovieweekend
Current Work at CLaC Labs
Extend the trigger-scope approach to:
✓ negation
modality
sentiment annotation
! modification (human monocytes)
! emotion annotation
! causal chain extraction
! vaccine avoidance argument detection in blogs
Explicit Negation
Noun Phrases
Sundries
Underappreciated Items
• numbers (IV, twice, 100,00, 100.00, 100,000)
• amounts (57%, 16Gb, 12ml, pH7, 7mph)
• locations
• person
• tense and aspect (this type of research has not been done/was not done/is not done/is not being done)
• modifier semantics (prenominal modifiers: long-term prospective studies; adverbials: virtually no risk)
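Two of these items can be illustrated briefly: amounts and ambiguous number strings. The sketch assumes a tiny, hypothetical unit list taken from the examples above. It also assumes only two separator conventions: English (100,000.5) and continental (100.000,5). Under those assumptions, "100,000" is genuinely ambiguous, while "100,00" and "100.00" each have one reading.

```python
import re

# Unit lists are a small hypothetical sample built from the slide's examples.
POSTFIX_UNIT = re.compile(r"(\d+(?:\.\d+)?)(%|Gb|ml|mph)")
PREFIX_UNIT = re.compile(r"(pH)(\d+(?:\.\d+)?)")


def parse_amount(token):
    """Split '57%', '16Gb', '12ml', '7mph' or 'pH7' into (value, unit)."""
    m = POSTFIX_UNIT.fullmatch(token)
    if m:
        return float(m.group(1)), m.group(2)
    m = PREFIX_UNIT.fullmatch(token)
    if m:
        return float(m.group(2)), m.group(1)
    return None


def number_readings(token):
    """All values a digit string can have under the English (100,000.5)
    and continental (100.000,5) separator conventions."""
    readings = set()
    for group_sep, decimal_sep in ((",", "."), (".", ",")):
        parts = token.split(decimal_sep)
        if len(parts) > 2:
            continue
        groups = parts[0].split(group_sep)
        if not all(g.isdigit() for g in groups):
            continue
        if any(len(g) != 3 for g in groups[1:]):  # thousands groups have 3 digits
            continue
        fraction = parts[1] if len(parts) == 2 else ""
        if len(parts) == 2 and not fraction.isdigit():
            continue
        readings.add(float("".join(groups) + ("." + fraction if fraction else "")))
    return readings


print(parse_amount("pH7"))               # (7.0, 'pH')
print(number_readings("100,00"))         # {100.0}
print(sorted(number_readings("100,000")))  # [100.0, 100000.0]
```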
Junk Language?
there is much information in ignored language
linguistic treatments are universal and can be adapted to domain-specific usage
a suite of general, language-oriented modules should be considered a form of preprocessing of the data, followed by domain-specific treatments
this can significantly improve the downstream specialized processing
Conclusion
linguistic principles form a solid baseline for modular, adaptable NLP modules
trigger-linguistic scope approach to speculative language, negation, and modality proved effective
parsing feasible, even for tweets, with preprocessing
extra-propositional parts of text prove effective in task-oriented evaluation
Headnoun, Base NP, MaxNP, PP
<MaxNP> <BaseNP> a 1993 <headnoun> survey </headnoun> </BaseNP>
<PP> of pediatricians and family practitioners </PP> </MaxNP>
overly simplistic heuristic: in <MaxNP> <BaseNP> the news in California </BaseNP> </MaxNP>
ellipsis, coordination, …
<MaxNP> <BaseNP> the health </BaseNP>
of <MaxNP> <BaseNP> vaccinated vs unvaccinated children </BaseNP> </MaxNP> </MaxNP>
Causal Triggers
Causality (Michelle Khalife)
• is pervasive in language
• conveys important information
• trigger lists exist for biomedical texts
• triggers require predicate-argument structure
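The need for predicate-argument structure shows up as soon as voice changes. In "AIDS was caused by HTLV-I", from the error case above, the effect is the grammatical subject. The sketch below assumes hand-written Stanford-style collapsed dependencies (`nsubj`, `dobj`, `nsubjpass`, `agent`, `prep_to`) and a hypothetical trigger list. It illustrates the point and is not the system described here.

```python
# Hypothetical trigger list; real biomedical trigger lists are much larger.
CAUSAL_TRIGGERS = {"cause", "causes", "caused", "lead", "leads", "led"}
CAUSE_RELS = ("nsubj", "agent")                  # active subject, passive "by"-phrase
EFFECT_RELS = ("dobj", "nsubjpass", "prep_to")   # object, passive subject, "to"-phrase


def causal_pairs(deps):
    """(cause, effect) pairs read off the arguments of causal triggers.

    `deps` holds (relation, head, dependent) triples with Stanford-style
    collapsed labels; heads are plain words, so at most one occurrence of
    each trigger word per sentence is assumed.
    """
    args = {}
    for rel, head, dep in deps:
        if head in CAUSAL_TRIGGERS:
            args.setdefault(head, {})[rel] = dep
    pairs = []
    for roles in args.values():
        cause = next((roles[r] for r in CAUSE_RELS if r in roles), None)
        effect = next((roles[r] for r in EFFECT_RELS if r in roles), None)
        if cause and effect:
            pairs.append((cause, effect))
    return pairs


# "AIDS was caused by HTLV-I": the passive puts the effect in subject position.
passive = [("nsubjpass", "caused", "AIDS"), ("auxpass", "caused", "was"),
           ("agent", "caused", "HTLV-I")]
print(causal_pairs(passive))  # [('HTLV-I', 'AIDS')]
```

A trigger list alone would find "caused" but could not tell which argument is the cause. The relation labels supply the direction.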