Proteomics, Systems Biology and Knowledge-Mining in Drug and
Biomarker Discovery
Mark Boguski
December 5, 2007
2
3
PAM-250
4
The practice of “proteomics” extends back to the early 1970s
1925-1983
Dayhoff, M.O. and Eck, R.V. (1970) MASSPEC: a computer program for complete sequence analysis of large proteins from mass spectrometry data of a single sample. Computers in Biology and Medicine 1:5-28
5
The practice of “proteomics” extends back to the early 1970s
1925-1983
Dayhoff, M.O. and Eck, R.V. (1970) MASSPEC: a computer program for complete sequence analysis of large proteins from mass spectrometry data of a single sample. Computers in Biology and Medicine 1:5-28
“This new method should make feasible various experiments, for example, with single organisms.”
6
SCIENTIFIC AMERICAN
7
The Scope and Challenge of Systems Biology
[Figure: approaches (omics, complex cell systems assays, modeling) placed along a scale axis. Length scale in meters, from 10^-9 to 1: molecules, pathways, cells, tissues, humans. Time scale in seconds: 10^-6, 10^2, 10^4, 10^5, 10^8.]
Adapted from: Butcher, E.C. et al. (2004) Systems biology in drug discovery. Nature Biotechnology 22, 1253-1259
8
Systems Biology: What are the deliverables?
Conceptual model
Mathematical model
Computer model (excerpt shown on the slide):
function z=power(x,y),z=x^y,endfunction
function z=root(x,y),z=y^(1/x),endfunction
function xdot=f(t,x)
// compartment_compartment id: compartment
compartment_compartment=3e-12;
// k1f_v1 id: k1f reactionID: v1
k1f_v1=0.003;
SB Markup Language (excerpt shown on the slide):
<?xml version="1.0" encoding="UTF-8" ?>
<sbml xmlns="http://www.sbml.org/sbml/level2" metaid="metaid_0000001" level="2" version="1">
<model metaid="metaid_0000002" name="Kholodenko1999_EGFRsignaling">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p align="left">
<font face="Arial, Helvetica, sans-serif
Validation Data
Kholodenko et al. Targets of EGFR in Tumor Cells JBC 274:30169, 1999
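The step from a mathematical model to a computer model can be illustrated with a minimal Python sketch. It assumes a single reversible binding reaction (ligand + receptor <-> complex) with mass-action kinetics. Only the forward rate constant 0.003 is taken from the slide excerpt; the reverse constant, the starting amounts, and all function names are hypothetical, and this is not the Kholodenko model itself.

```python
# Minimal sketch: one reversible mass-action reaction, L + R <-> C.
# K1F = 0.003 is the value shown in the slide excerpt; K1R and the
# starting amounts are hypothetical, chosen only for illustration.

K1F = 0.003  # forward rate constant (from the slide excerpt)
K1R = 0.06   # reverse rate constant (hypothetical)


def derivatives(state):
    """Right-hand side of the ODE system for (ligand, receptor, complex)."""
    ligand, receptor, cplx = state
    rate = K1F * ligand * receptor - K1R * cplx  # net forward flux
    return (-rate, -rate, rate)


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = derivatives(state)
    k2 = derivatives(shifted(state, k1, dt / 2))
    k3 = derivatives(shifted(state, k2, dt / 2))
    k4 = derivatives(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial, dt=0.01, t_end=60.0):
    """Integrate from t=0 to t_end and return the final state."""
    state = tuple(initial)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
    return state


# Hypothetical starting amounts: ligand, free receptor, complex.
final_state = simulate((680.0, 100.0, 0.0))
print(final_state)
```

A real deliverable would encode the full reaction network (for example in SBML) and compare simulated trajectories against validation data; the sketch only shows the shape of that computation.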
9
The Process of a Systems Biology Analysis
Butcher et al. (2004) Nature Biotechnology 22, 1253 - 1259
10
Monitoring and modeling network activity via proteomics measurements
[Figure: “omics” experiments feeding mathematical models. Time-course panels for control and drug conditions, with cytoplasmic (cyto) and nuclear (nuc) readouts plotted against time.]
11
Reverse Protein Arrays: probing pathways with antibodies
[Diagram labels: crude cell lysates (Exp. A, Exp. B, Exp. C, Exp. D, ...), sample dilution, spotting, assay with antibodies (Ab1 to Ab6), analysis]
Key features of arrays:
• Scalable in terms of samples and analytes
• Small sample volumes (0.5 nl per spot)
• Large degree of automation throughout the process
12
Reverse arrays: sample requirements & capacity
1 sample
4 dilutions in duplicate
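The layout arithmetic can be checked with a short Python sketch. The 4 dilutions in duplicate come from this slide and the 0.5 nl spot volume from the key-features slide; the sample count of 10 and the function names are hypothetical.

```python
# Sketch of the spot arithmetic for a reverse array layout.
# 4 dilutions in duplicate and 0.5 nl per spot come from the slides;
# the number of samples used below is hypothetical.

DILUTIONS_PER_SAMPLE = 4
REPLICATES_PER_DILUTION = 2
SPOT_VOLUME_NL = 0.5


def spots_per_sample():
    """Number of spots one sample occupies on a single array."""
    return DILUTIONS_PER_SAMPLE * REPLICATES_PER_DILUTION


def deposited_volume_nl(n_samples):
    """Total volume deposited on one array for n_samples samples."""
    return n_samples * spots_per_sample() * SPOT_VOLUME_NL


print(spots_per_sample())       # 8
print(deposited_volume_nl(10))  # 40.0
```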
13
Level of detection: How much protein is needed?
14
Planar Waveguide Principle - for High Sensitivity Fluorescence Microarray Detection
free label
Imaging of surface-confined fluorescence
excitation of bound label
CCD camera
microarray on chip
15
Kinetics of ERK phosphorylation in T-cell signaling
16
Cell lines, disease models
- Cmpd (or RNAi) / + Cmpd (or RNAi) … incubate …
Cell lysate
“Reverse” arrays
+ Antibodies to key nodes in pathways
Proteome & pathway annotations
Disease “signature” & Systems Response Profiles
Pathway model
[Figure: mTOR signaling pathway diagram. Node labels include growth factor receptors, IRS1, PI3K, PDK, PKB, TSC1, TSC2, Rheb, 14-3-3, LKB1, MO25, STRAD, AMPK α/β/γ (P172), branched-chain amino acids, mTOR (S2448, T2446), Raptor, Rictor, GβL, FKBP12, p70S6K, 4E-BP1, eIF4E, eIF3, eIF4B, translation initiation, Rho, PKCα, Ras, Raf, Mek 1/2, Erk 1/2, HIF-1α, and VEGF; “P” marks indicate phosphorylation.]
17
Customized Signaling Network Database
Pathway nodes where antibodies are available
(Only for illustration)
Mouse-over shows Ab specificity
Hyperlinks to Ab web reports
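One way to picture such a database is a small record structure per pathway node. The Python sketch below is an assumption about how the slide's features (antibody availability, specificity shown on mouse-over, hyperlink to a report) could be represented; the class names, antibody identifier, and URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Antibody:
    name: str         # hypothetical identifier
    specificity: str  # text a mouse-over could display
    report_url: str   # hyperlink target for the antibody report


@dataclass
class PathwayNode:
    protein: str
    antibodies: list = field(default_factory=list)

    @property
    def has_antibody(self):
        return bool(self.antibodies)


# Hypothetical entries; protein names are taken from the pathway slide.
network = {
    "mTOR": PathwayNode(
        "mTOR",
        [Antibody("Ab-001", "phospho-S2448", "https://example.org/ab/Ab-001")],
    ),
    "Raptor": PathwayNode("Raptor"),
}


def nodes_with_antibodies(net):
    """Return the names of nodes that can be probed on an array."""
    return sorted(name for name, node in net.items() if node.has_antibody)


print(nodes_with_antibodies(network))  # ['mTOR']
```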
18
The Process of a Systems Biology Analysis
Butcher et al. (2004) Nature Biotechnology 22, 1253 - 1259
19
Curating the literature
20
Mining the “Bibliome”
Seventeen Years of Growth: NCBI Data and User Services
[Chart, 1989-2006: GenBank base pairs (millions) and users per weekday, plotted by year.]
Growth of NCBI Data and User Services
[Chart, 1989-2006: GenBank base pairs (millions) and users per weekday, plotted by year and annotated with NCBI services: BLAST, Entrez, GenBank at NCBI, dbEST, 3D Structure, Network Entrez, WWW, dbSTS, BankIt, Genomes, Taxonomy, OMIM, GeneMap, Cn3D, UniGene, PubMed, PSI-BLAST, VAST, ePCR, Microbial Genomes, PHI-BLAST, CGAP, Human Genome, LinkOut, LocusLink, RefSeq, dbSNP, PubMed Central, BLINK, MapViewer, GEO, GeneRIFs, dbMHC, BookShelf, Human Genome-Transcripts alignments, WGS, HLA Haplotypes, Human Genome-TPA, Entrez Genes, Mouse Composite Genome, Gnomon, PubChem, Trace Archive, CCDS, Cancer Chromosomes, Environmental Samples, Public Access, Influenza Sequences, GenSAT, GeneTests, Whole Genome Assoc]
Growth of Sequence Data and Shift in Data Types
[Figure: composition of sequence data by type (standard GenBank entries, EST, HTG, whole genome sequences) at 1994, 1997, 2001, and 2006.]
GenBank: 191 million bases
ESTs: 966 million bases
High Throughput Genomes: 12 billion bases
Shotgun Sequencing: 142 billion bases (not including 1.1 trillion trace bases)
24
Dickman S (2003) The challenges of searching the scientific literature. PLoS Biol 1(2): e48
25
Text-mining adapted for Biomedical Discovery
Sources:
MEDLINE/Embase: 6,000,000 articles
Human Genome: 24,000 genes
Disease Area: ~10-100 disease entities
CAS/IDDB: 10,000 patents
Steps: computed associations, filtering and ranking, evaluation by human experts
Feedback loop with different filters and/or ranking criteria
User interfaces
Outputs: drug targets, biomarkers
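The "computed associations" and "filtering and ranking" steps can be sketched in Python as simple co-occurrence counting. This is an illustrative assumption, not the method used in the talk: the toy abstracts, term lists, threshold, and function names are all hypothetical, and naive substring matching stands in for real entity recognition.

```python
from collections import Counter
from itertools import product


def compute_associations(abstracts, genes, diseases):
    """Count abstracts in which a gene and a disease are both mentioned."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_diseases = [d for d in diseases if d.lower() in lowered]
        counts.update(product(found_genes, found_diseases))
    return counts


def filter_and_rank(counts, min_count=2):
    """Keep pairs seen at least min_count times, most frequent first."""
    kept = [(pair, n) for pair, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus, for illustration only.
abstracts = [
    "EGFR signaling is altered in lung cancer.",
    "mTOR and EGFR inhibitors in lung cancer models.",
    "mTOR activity in diabetes.",
]
genes = ["EGFR", "mTOR"]
diseases = ["lung cancer", "diabetes"]

ranked = filter_and_rank(compute_associations(abstracts, genes, diseases))
print(ranked)  # [(('EGFR', 'lung cancer'), 2)]
```

Changing `min_count` or the sort key corresponds to the feedback loop on the slide: experts review the ranked list, then rerun with different filters or ranking criteria.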
26
Text-mining adapted for Biomedical Discovery
Sources:
MEDLINE/Embase: 6,000,000 articles
Human Genome: 24,000 genes
Disease Area: ~10-100 disease entities
CAS/IDDB: 10,000 patents
Steps: computed associations, filtering and ranking, evaluation by human experts
Feedback loop with different filters and/or ranking criteria
User interfaces
Ultralinks
Outputs: drug targets, biomarkers
The Ultralink – an expert system for contextual hyperlinking in knowledge management
OR
Beyond Google® and PubMed
28
Connecting Knowledge Corpora
• Indexing of large heterogeneous data collections (databases, full texts) to enable semantic expansion
• Information retrieval and extraction, entity recognition, semantic enrichment
• Knowledge Map (for navigating the conceptual network)
• Terminology Hub (thesauri and ontologies)
• Ontology-associated rules
29
Examples of entities that constitute our terminologies
• Chemical entities – IUPAC names, trivial names, trade names, compound codes…
• Biological entities – targets, genes/proteins, receptors, ligands, modes and mechanisms of action…
• Diseases, Indications, Side Effects, Contraindications
• Institutions, Affiliations, People
• Geographic locations
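A terminology of this kind can be pictured as a synonym table that maps each surface form to a canonical concept and an entity type. The Python sketch below is a hypothetical illustration of that idea and of semantic expansion (looking up every synonym of a concept); the table contents and function names are assumptions, not the system described in the talk.

```python
# Hypothetical terminology table: surface form -> (canonical name, entity type).
TERMINOLOGY = {
    "acetylsalicylic acid": ("aspirin", "chemical"),
    "aspirin": ("aspirin", "chemical"),
    "egfr": ("EGFR", "gene/protein"),
    "epidermal growth factor receptor": ("EGFR", "gene/protein"),
}


def normalize(term):
    """Map a surface form to (canonical name, entity type), or None."""
    return TERMINOLOGY.get(term.strip().lower())


def expand(term):
    """Semantic expansion: all known surface forms of the same concept."""
    entry = normalize(term)
    if entry is None:
        return []
    canonical = entry[0]
    return sorted(s for s, (c, _) in TERMINOLOGY.items() if c == canonical)


print(normalize("Acetylsalicylic acid"))  # ('aspirin', 'chemical')
print(expand("EGFR"))  # ['egfr', 'epidermal growth factor receptor']
```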
30
The Ultralink can be called from Internet Explorer
Internet Explorer Integration (plug-in): Web Page -> Tagged Document
1. User requests analysis
2. Sends the document for analysis
3. Gets back tagged parts
4. Injection of specific HTML tags
GPS Lexical Analysis Server Tools (Web Service, WSDL): Lexical Extract, Zoning, Tagging, DocStructures, Meta-Rules, Terminology
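The request, tag, and inject steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the server-side analysis is replaced by a local regular-expression lookup, the term list, function names, and sample text are hypothetical, and the colors follow the deck's color-coding example (yellow for gene names, tan for institutions).

```python
import html
import re

# Hypothetical terminology: surface form -> concept type.
TERMS = {"EGFR": "gene", "mTOR": "gene", "Example University": "institution"}
# Colors follow the deck's example: yellow = gene name, tan = institution.
COLORS = {"gene": "yellow", "institution": "tan"}

# Longest terms first so multi-word names win over shorter overlaps.
_PATTERN = re.compile(
    "|".join(
        r"\b" + re.escape(term) + r"\b"
        for term in sorted(TERMS, key=len, reverse=True)
    )
)

_TEMPLATE = '<span class="{kind}" style="background:{color}" title="{kind}">{text}</span>'


def tag_document(text):
    """Stand-in for steps 2-3: return (start, end, concept type) spans."""
    return [(m.start(), m.end(), TERMS[m.group(0)]) for m in _PATTERN.finditer(text)]


def inject_tags(text, spans):
    """Step 4: wrap each tagged span in HTML, escaping all other text."""
    out, cursor = [], 0
    for start, end, kind in spans:
        out.append(html.escape(text[cursor:start]))
        out.append(
            _TEMPLATE.format(
                kind=kind, color=COLORS[kind], text=html.escape(text[start:end])
            )
        )
        cursor = end
    out.append(html.escape(text[cursor:]))
    return "".join(out)


sample = "EGFR & mTOR at Example University"
tagged = inject_tags(sample, tag_document(sample))
print(tagged)
```

Keeping tagging and injection as separate functions mirrors the plug-in/server split on the slide: the same tagged spans could be rendered into a web page or into another document type.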
31
Annotations of records in PubMed
Activate UltraLink
32
Annotations of any web page
33
“Mouse-over”
“Click”
Color coding according to concept type, e.g., Yellow = Gene Name; Tan = Institution
34
BLAST Interface
35
More Ultralink Examples
36
Ultralink technology is integrated with MS Office
Internet Explorer Integration (plug-in): Office Document -> Tagged Document
1. User requests analysis
2. Sends the document for analysis
3. Gets back tagged parts
4. Injection of specific HTML tags
GPS Lexical Analysis Server Tools (Web Service, WSDL): Lexical Extract, Zoning, Tagging, DocStructures, Meta-Rules, Terminology
37
Annotations on full text in a Word document
38
Conclusions (for Microsoft)
• Biology is inherently messy, phenomenological and contingent
• Much progress has been made in information integration, much less in standardization
• Software tools and database systems will still have to deal with the complexities and multiple dimensions of biological systems for the foreseeable future
39
Should we be optimistic?
40
Should we be optimistic?
“... [models], insofar as they represent informational patterns abstracted from their instantiation in a biological substrate, can never fully capture the embodied actuality, unless they are as prolix and noisy as the body itself.”
N.K. Hayles, How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics (Univ. of Chicago Press, 1999)
41
Should we be optimistic?
Yes -- Systems Biology approaches will lead to more rigorous definitions of biological entities and relationships
and to better organization of our knowledge.
42