Finding Contradictions, Contrasts and Negations in Protein-Protein Interactions in the Biomedical Literature Using Text Mining
Farzaneh Sarafraz
COMP7000 Presentation
Text Mining Group
Supervised by Dr. Goran Nenadić
28 November 2008
Natural Language Processing, a.k.a. Computational Linguistics
Text
− Easily shared
− Primary information source
− Most up-to-date
− Unstructured
− No explicit data
Text Mining
IE, IR, DM
Text Mining in Biomedicine
Biomedical Scientific Literature
− >17M articles from >5K journals since the 1950s
− Adding 2000 every day
− Impossible for humans to manage
− Specific (rather peculiar) language
Protein-Protein Interactions
Example
"Our results indicate that gp120 from two different strains of HIV binds to a larger region of the CD4 protein than previously described."
Example
gp120 synonyms
− Transmembrane Glycoprotein
− Envelope Glycoprotein
− Surface Glycoprotein
− SU
− gp160
− Envelope Surface Glycoprotein gp160 Precursor
− gp41
− TM
− ENV
− HIV1 gp8
Example
"Binds" is almost the same as
− Interacts with
− Frictionates
− Associates with
− Activates
− Colocalizes with
− Cleaves
Example
CD4 can be expressed as
CD4+ T; T4(CD); CD4+ (T); CD4(+) T cell
CD4, T; CD4 (T); CD4(T); CD4 T-cell
T CD4; CD4(+)T; CD(4+) T; CD4(+) T-cell
CD4(+) T; CD4+T; CD4 T; CD4(+)T cell
CD4 T; CD4(+)T; CD4+ T cell; CD4+T-cell
T4+ (CD); CD4+T; CD4, T cell; CD4(+)T-cell
T4 (CD); T (CD4); CD4+ T-cell; CD4 T cell
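One way to cope with this much surface variation is to reduce every mention to a canonical key before matching. The sketch below is only an illustration, not the method used in this work: the names `CD4_T_KEYS`, `normalise` and `is_cd4_t_cell` are hypothetical, and the three keys were derived by hand from the variants listed on this slide.

```python
import re

# Hypothetical canonical keys, derived by hand from the variants on the slide.
CD4_T_KEYS = {"cd4t", "tcd4", "t4cd"}


def normalise(mention: str) -> str:
    """Lowercase, drop everything except letters and digits, strip 'cell(s)'."""
    key = re.sub(r"[^a-z0-9]", "", mention.lower())
    for suffix in ("cells", "cell"):
        if key.endswith(suffix):
            key = key[: -len(suffix)]
            break
    return key


def is_cd4_t_cell(mention: str) -> bool:
    """True if the mention collapses to one of the known CD4 T cell keys."""
    return normalise(mention) in CD4_T_KEYS


if __name__ == "__main__":
    for m in ["CD4(+) T-cell", "T4 (CD)", "CD(4+) T", "CD8+ T cell"]:
        print(m, "->", is_cd4_t_cell(m))
```

Collapsing punctuation this aggressively trades precision for recall, so a real system would need to check the keys against other entities that happen to normalise to the same string.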
Even after all this...
The chimpanzee-based CD4(81-92) peptide, however, which differs from the human peptide by a single amino acid substitution (E for G) at position 87, was considerably less potent than the human CD4(81-92)-based peptide congener to inhibit HIV-1-induced cell-cell fusion.
Contradiction and Contrasts
− Author A reports p; Author B reports ¬p
− We have p under conditions q, but we have ¬p under conditions q'
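The distinction between the two cases can be written down as a small data structure. This is a minimal sketch under stated assumptions: the `Claim` fields and the `compare` function are hypothetical, conditions are compared as plain strings, and the example claims are made up for illustration rather than taken from the literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    protein_a: str
    protein_b: str
    interaction: str   # e.g. "binds"
    holds: bool        # True for p, False for ¬p
    condition: str     # q; experimental condition as free text
    source: str        # who reported it


def compare(c1: Claim, c2: Claim) -> str:
    """Return 'contradiction', 'contrast' or 'none' for a pair of claims."""
    same_statement = (
        (c1.protein_a, c1.protein_b, c1.interaction)
        == (c2.protein_a, c2.protein_b, c2.interaction)
    )
    if not same_statement or c1.holds == c2.holds:
        return "none"
    # p vs ¬p under the same conditions is a contradiction;
    # p under q vs ¬p under q' is a contrast.
    return "contradiction" if c1.condition == c2.condition else "contrast"


if __name__ == "__main__":
    # Illustrative, invented claims.
    a = Claim("gp120", "CD4", "binds", True, "in vitro", "Author A")
    b = Claim("gp120", "CD4", "binds", False, "in vitro", "Author B")
    c = Claim("gp120", "CD4", "binds", False, "in vivo", "Author B")
    print(compare(a, b), compare(a, c))
```

Exact string equality on conditions is the weakest part of the sketch; deciding whether q and q' really differ is itself a text mining problem.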
Negations
Linguistic
− "Protein A does not interact with protein B."
− "We lack evidence that A interacts with B."
Biological
− "Protein A inhibits protein B."
− dephosphorylates / depolymerizes
− downregulates (vs. upregulates)
− etc.
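The two kinds of negation above can be told apart, very roughly, with cue lists. The following is a sketch only: the cue sets are small hand-picked assumptions seeded from the examples on this slide, the function name `negation_type` is hypothetical, and no claim is made that this is the pattern set used in this work.

```python
import re

# Assumed cue lists, seeded from the slide's examples; far from complete.
LINGUISTIC_CUES = {"not", "no", "lack", "lacks", "cannot", "without", "neither", "nor"}
BIOLOGICAL_STEMS = ("inhibit", "dephosphorylat", "depolymeri", "downregulat")


def negation_type(sentence: str) -> str:
    """Classify a sentence as 'linguistic', 'biological' or 'none'."""
    # Remove hyphens so "down-regulates" and "downregulates" match alike.
    text = sentence.lower().replace("-", "")
    tokens = re.findall(r"[a-z]+", text)
    if any(t in LINGUISTIC_CUES for t in tokens):
        return "linguistic"
    if any(t.startswith(BIOLOGICAL_STEMS) for t in tokens):
        return "biological"
    return "none"


if __name__ == "__main__":
    for s in [
        "Protein A does not interact with protein B.",
        "We lack evidence that A interacts with B.",
        "Protein A inhibits protein B.",
    ]:
        print(negation_type(s), "|", s)
```

A cue match says nothing about scope, so a sentence such as "A does not inhibit B" is labelled linguistic here even though both kinds of negation are present.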
Framework
HIV-1 and Human Protein-Protein interactions
− Manually over 7 years
− >3000 journal papers
− >5000 tuples
− Gold standard
Other negative reports
− Journal of Negative Results in BioMedicine
Other gold standards
Detecting Protein-Protein Interactions
Recognize gene/protein names
− State of the art ~ 87%
Identify gene/protein names
Detect the interaction and its qualities
− 70 "different" interactions in reference DB
Protein Name Identification
1500 human proteins
− State of the art ~ 87%
− Available tools ~ 15%
− Our method ~ 35%
20 HIV proteins
− No available tool
− Our method ~ 95%
Applications
Contradictions and Contrast
Other diseases
Negations
New HIV-1 literature
Achieved so far & plan for future
− Reproduce the HIV-1 interactions database
− Designed an interaction ontology
− Identify patterns of negation, contradiction, contrast
− Use the above data to increase the annotation accuracy
Evaluation
Widely used evaluation measures
− Precision, Recall, F-Score
− Sensitivity and Specificity
Benchmarks and datasets used in challenges
Manually annotated gold standards
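For reference, the measures named above follow directly from the counts of true/false positives and negatives against a gold standard. The sketch below uses the standard definitions; the function name `evaluate` and the example counts are illustrative assumptions, not results from this work.

```python
def evaluate(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard evaluation measures from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # same as sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(evaluate(tp=8, fp=2, fn=2, tn=8))
```

Recall and sensitivity are the same quantity under two names; the F-Score here is the balanced harmonic mean of precision and recall.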
Summary
Finding Contradictions, Contrasts and Negations in Protein-Protein Interactions in the Biomedical Literature Using Text Mining