Applied text mining
Lars Juhl Jensen
exponential growth
~40 seconds per paper
teach it specific tricks
information retrieval
named entity recognition
information extraction
text/data integration
medical text mining
information retrieval
find the relevant papers
user-specified query
“yeast AND cell cycle”
dynamic query expansion
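A minimal sketch of how query expansion might work for an AND-only query such as the one above. The synonym table and the function names `expand` and `matches` are hypothetical and chosen only for illustration; a real system would draw its synonyms from a curated lexicon.

```python
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "yeast": ["yeast", "saccharomyces cerevisiae", "s. cerevisiae"],
    "cell cycle": ["cell cycle", "mitosis", "mitotic"],
}

def expand(term):
    """Return the term together with any known synonyms (all lower-case)."""
    return SYNONYMS.get(term.lower(), [term.lower()])

def matches(query, text):
    """Evaluate an AND-only query: every term, or one of its synonyms, must occur."""
    terms = [t.strip().strip('"“”') for t in re.split(r"\s+AND\s+", query)]
    lowered = text.lower()
    return all(any(s in lowered for s in expand(t)) for t in terms)
```

With this table, `matches("yeast AND cell cycle", "Mitotic exit in Saccharomyces cerevisiae")` succeeds even though neither query term appears literally in the text.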
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming
step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation
and degradation
no tool will find that
named entity recognition
identify the concepts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming
step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation
and degradation
comprehensive lexicon
cyclin dependent kinase 1
orthographic variation
flexible matching
upper- and lower-case
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
prefixes and suffixes
Pafilis et al., PLOS ONE, 2013
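The flexible matching of case, spaces and hyphens can be sketched as a normalization step applied to both the lexicon and the text. This is an illustrative sketch, not the tagger described in the cited paper; the `LEXICON` contents and the identifier `"CDK1"` are assumptions, and prefix and suffix handling is left out.

```python
import re

def normalize(name):
    """Lower-case a name and drop spaces and hyphens."""
    return re.sub(r"[\s\-]+", "", name.lower())

# Hypothetical lexicon keyed by normalized name; the identifier is illustrative.
LEXICON = {normalize("cyclin dependent kinase 1"): "CDK1"}

def lookup(mention):
    """Return the identifier for a mention, or None if it is not in the lexicon."""
    return LEXICON.get(normalize(mention))
```

Both `lookup("cyclin dependent kinase 1")` and `lookup("Cyclin-dependent kinase 1")` resolve to the same entry, because the two spellings normalize to the same key.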
manually annotated corpus
the pragmatic way
augmented browsing
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming
step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation
and degradation
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
reflect.ws
information extraction
formalize the facts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming
step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation
and degradation
NLP
Natural Language Processing
part-of-speech tagging
what you learned in school
pronoun pronoun verb preposition noun
multiword detection
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Saric et al., Proceedings of ACL, 2004
extract stated facts
the pragmatic way
guilt by association
within paragraphs
undirected associations
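Guilt by association within paragraphs can be sketched as counting how often two recognized entities are mentioned together. This is a minimal sketch that assumes named entity recognition has already produced one collection of entity names per paragraph; the function name `cooccurrence_counts` is hypothetical, and no scoring or normalization of the counts is attempted.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(paragraphs):
    """Count unordered entity pairs mentioned in the same paragraph.

    `paragraphs` is an iterable of entity-name collections, one per paragraph.
    Pairs are stored as alphabetically sorted tuples, so (a, b) and (b, a)
    are the same undirected association.
    """
    counts = Counter()
    for entities in paragraphs:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts
```

Because each pair is sorted before counting, the result carries no direction: it records that two entities are associated, not which one acts on the other.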
text/data integration
protein associations
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
STRING + 300k chemicals
Kuhn et al., Nucleic Acids Research, 2014
stitch-db.org
subcellular localization
Binder et al., Database, 2014
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org Santos et al., submitted, 2015
disease–gene associations
diseases.jensenlab.org Frankild et al., Methods, 2015
curated knowledge
Letunic & Bork, Trends in Biochemical Sciences, 2008
experimental data
computational predictions
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
different formats
different identifiers
variable quality
common identifiers
score calibration
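One generic way to put scores from different sources on a comparable scale is to bin the raw scores and measure, per bin, the fraction of pairs confirmed by a gold standard. The slides do not say how the calibration is actually done, so this is only a sketch under that assumption; `calibrate`, `gold_standard` and `bin_width` are hypothetical names.

```python
def calibrate(scored_pairs, gold_standard, bin_width=1.0):
    """Map each raw-score bin to the fraction of its pairs found in a gold standard.

    scored_pairs: iterable of (pair, raw_score) tuples.
    gold_standard: set of pairs assumed to be true associations.
    """
    totals, hits = {}, {}
    for pair, score in scored_pairs:
        b = int(score // bin_width)  # index of the bin this score falls into
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (pair in gold_standard)
    return {b: hits[b] / totals[b] for b in totals}
```

The returned fractions can be read as rough reliability estimates, which makes a raw score from one evidence type comparable with a raw score from another.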
why so many resources?
Swiss army knife syndrome
EMBO Practical Course Computational Biology: Genomes to Systems
Puerto Varas, 3-9 April 2014
Thanks for your attention!