Engineering Systems Biology: Lots of Questions...
Bahrad A. Sokhansanj, PhD
Molecular Health Engineering Laboratory
School of Biomedical Engineering, Science & Health Systems, Drexel University
ECE690 Biological Signal Processing II
March 31, 2008
03/31/2008 ECE690 Bio-Signal Processing (Sokhansanj) 2
Engineering Quantitative Biology
[Figure: linking Building Models (a system of coupled ODEs, dP_i/dt written in terms of P_1, P_2, P_3 and rate constants μ) and Quantitative Measurement]
Molecular Health Engineering Laboratory
Problem: Genetic / Disease-Related Variation in Cellular Regulation
Solution: Integrate quantitative biomeasurement and modeling
Understanding Biological Regulation: The Central Dogma
[Figure: Genes (DNA) → Message (RNA) → Proteins → Function / Environment, with Regulation and Other Cells]
A coordinated network of complex biochemical processes involving RNA, DNA, proteins, and other chemicals.
Central Dogma
Biomolecular Networks are Complicated
[Figure: network connecting a stimulus, e.g. Reactive Oxygen Species (ROS; oxidative stress), to the outcome: Cell Death? Survival?]
Growth Signaling Network
The λ Phage Biological “Circuit”: A Role for Electrical Engineering?
The Landscape of “Post-Genome” Biology
Epidemiological Data
Comparative Genomics/Proteomics
Microarrays & DNA chips
Proteomics (MS & 2DGE)
Quantitative blood & tissue analysis
[Figure: example mass spectra (x-axis 1000 to 4000, with labeled peaks)]
Multiscale Dynamic Imaging
Microarray Experiments
Microarrays measure the relative change in mRNA concentration (gene expression) under a change in conditions.
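[Figure: two-color microarray protocol. cDNA probes on a glass microscope slide (amino propyl silane coating); two samples (“targets”), each mRNA → cDNA, labeled with red and green dye respectively; mix & hybridize the cDNA targets to the probes; then data “massage”]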
Microarrays Produce Ugly Data
• Microarray fabrication is inconsistent: experiments have poor repeatability
  – size, shape, and alignment of spots vary from array to array
  – defects in the slide and different washing protocols result in intra- and inter-slide variations in background
• Results are highly sensitive to image analysis
  – spot recognition, intensity quantification and normalization, background subtraction
• DNA chips are more consistent, but still not perfect, and they cost much more
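A minimal sketch of the background subtraction and normalization steps listed above, assuming raw foreground and local-background intensities per spot from image analysis: subtract background, take the log2 ratio of red to green, then median-center so that the typical spot has a log ratio of zero. The function `normalized_log_ratios`, the clipping `floor`, and global median centering are illustrative assumptions, not the laboratory's pipeline; intensity-dependent normalization is also common.

```python
import numpy as np

def normalized_log_ratios(red, green, red_bg, green_bg, floor=1.0):
    """Background-subtracted, median-centred log2(red/green) per spot.

    Assumption: inputs are raw foreground/background intensities from image
    analysis; corrected signals below `floor` are clipped so log2 stays finite.
    """
    r = np.maximum(np.asarray(red, float) - np.asarray(red_bg, float), floor)
    g = np.maximum(np.asarray(green, float) - np.asarray(green_bg, float), floor)
    m = np.log2(r / g)
    # Global median normalization: assumes most genes do not change, so the
    # typical log ratio should be 0 (corrects unequal dye labeling/scanning).
    return m - np.median(m)

# Hypothetical toy array: green is globally 2x brighter (a dye bias) and
# spot index 2 is truly induced 4-fold in the red sample.
red = np.array([1100.0, 2100.0, 8100.0, 600.0, 1600.0])
green = np.array([2100.0, 4100.0, 4100.0, 1100.0, 3100.0])
red_bg = np.full(5, 100.0)
green_bg = np.full(5, 100.0)
print(np.round(normalized_log_ratios(red, green, red_bg, green_bg), 3))
```

Median centering removes the uniform dye bias, and the one induced spot comes out at a log2 ratio of 2 (a 4-fold change) while the others sit at 0.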
Operon Signals in Expression Data
[Figure: Log Expression (−3 to 3) plotted against ORF (0 to 200) for the series 2/1, 4/1 and 4/2]
Example of proteomic data from 2D gels
Metabolic gene pathways of Y. pestis
Low Ca
Metabolomic Data Types: Popular Platforms
¹H Nuclear Magnetic Resonance Spectroscopy (NMR)
Liquid Chromatography–Mass Spectrometry (LC-MS)
Metabolomic Data Types: Problem
• Metabolomic datasets (e.g. NMR, MS-based) are large, open systems
• Physical interpretation of the data is challenging
Transcriptomics – relatively straightforward interpretation
[Figure: LC-MS data plotted as time vs. m/z with relative abundance, comparing DB/+ and DB/DB samples]
Molecular Signatures – Genome, Proteome, Metabolome, Glycome, etc.
• New technologies allow large-scale, parallel measurement of cell state:
  – transcription (mRNA expression; gene chips, RT-PCR)
  – translation (protein expression; gels, MS, protein chips)
  – protein modification (gels, MS)
  – protein-protein interaction (2-hybrid, protein chips, MS)
  – metabolites (MS, NMR)
  – carbohydrates and glycosylation (MS, ?)
  – (also large-scale phenotypic changes)
• At first order, we can do either or both of the following:
  – identify groups of cells/tissues/populations that share common patterns
  – detect patterns of genes/proteins/metabolites/etc. that correlate with previously identified phenotypic groups
Many Ways to Identify Signatures
• Identifying major “components” of variation (potentially something that has to be removed from data, such as a fundamental difference between sampled groups)
  – singular value decomposition (SVD), principal component analysis (PCA), etc.
Singular Value Decomposition
http://public.lanl.gov/mewall/kluwer2002.html
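A minimal sketch of SVD on a genes × conditions expression matrix, using synthetic oscillating profiles (not the Spellman data) with hypothetical sizes of 38 genes and 18 time points. Each gene is centered before `numpy.linalg.svd`; the rows of `vt` are the dominant expression patterns across conditions ("eigengenes"), and `frac` is the share of total variance each component explains. Centering genes (rows) is an assumption; other preprocessing is common.

```python
import numpy as np

def svd_components(expr):
    """SVD of a genes x conditions matrix after centring each gene (row).

    Returns U (gene loadings), s (singular values), Vt (rows are patterns
    across conditions) and the fraction of variance per component.
    """
    x = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    frac = s**2 / np.sum(s**2)
    return u, s, vt, frac

rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 18, endpoint=False)  # 18 hypothetical time points
phases = rng.uniform(0, 2 * np.pi, size=38)        # 38 hypothetical periodic genes
# Each synthetic gene is a phase-shifted sine plus small measurement noise
expr = np.sin(t[None, :] + phases[:, None]) + 0.1 * rng.normal(size=(38, 18))

u, s, vt, frac = svd_components(expr)
print(np.round(frac[:3], 3))
```

Because every synthetic gene is a phase-shifted sine, two components (a sine-like and a cosine-like pattern) carry nearly all the variance. A component representing unwanted variation could be removed by zeroing its singular value and reconstructing `u @ np.diag(s) @ vt`.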
Complex Regulation Drives Yeast Cell Cycle
Yeast cyclins are proteins responsible for regulating cell cycle transitions. Cyclin gene (mRNA) expression data are taken from Spellman et al., Mol. Biol. Cell, 9:3273, 1998.
http://cyberia.cfdrc.com/datab/Applications/cell_tissue_bio/cellcycle/cellcycle.html
Singular Value Decomposition (Yeast Cell Cycle Microarray Data)
(based on 38 genes, elu expression dataset)
Many Ways to Identify Signatures
• Identifying major “components” of variation (potentially something that has to be removed from data, such as a fundamental difference between sampled groups)
  – singular value decomposition (SVD), principal component analysis (PCA), etc.
• Finding groups within data
  – clustering
  – self-organizing maps
  – support vector machines
Clustering (k-means)
http://rana.lbl.gov/FuzzyK/images/figure3.html
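A minimal sketch of standard (hard) k-means, i.e. Lloyd's algorithm, on expression profiles: assign each gene to its nearest centroid, move each centroid to the mean of its members, and repeat until assignments stop changing. The farthest-point initialization and the synthetic "induced"/"repressed" data are assumptions for illustration. The fuzzy variant named in the linked FuzzyK page instead lets a gene belong partially to several clusters, which this sketch does not do.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of x (e.g. genes x conditions).

    Initialization (assumption): a farthest-point heuristic, a deterministic
    cousin of k-means++, so well-separated clusters get separate seeds.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = ((x[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2)
        centroids.append(x[d.min(axis=1).argmax()])  # farthest from existing seeds
    centroids = np.array(centroids)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its member genes
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(2)
induced = rng.normal(2.0, 0.3, size=(20, 5))     # hypothetical induced genes
repressed = rng.normal(-2.0, 0.3, size=(20, 5))  # hypothetical repressed genes
labels, centroids = kmeans(np.vstack([induced, repressed]), k=2)
print(labels)
```

k must be chosen in advance, and results can depend on initialization, which is one reason fuzzy and other variants are used for gene expression data.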
Many Ways to Identify Signatures
• Identifying major “components” of variation (potentially something that has to be removed from data, such as a fundamental difference between sampled groups)
  – singular value decomposition (SVD), principal component analysis (PCA), etc.
• Finding groups within data
  – clustering
  – self-organizing maps
  – support vector machines
• Separating known groups
  – univariate methods (e.g. B-tests, t-tests on each gene, ANOVA on each gene)
  – horrible “capitalization on chance” problems
  – linear discriminant analysis / canonical variate analysis
    • these methods can be generalized for underdetermined data, though the relative magnitudes of variables become significant in that case (but that filters out potentially noisy data), OR you get capitalization on chance by using stepwise methods
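The “capitalization on chance” problem with one test per gene can be shown with a short simulation. This is a sketch under stated assumptions: 1000 hypothetical genes measured in two groups of 30 samples with no real difference, a Welch t statistic per gene, and a normal approximation for the two-sided p-value (an exact test would use the t distribution). Roughly 5% of genes look “significant” at p < 0.05 purely by chance, while a Bonferroni threshold of 0.05/1000 removes nearly all of them.

```python
import math
import numpy as np

def per_gene_welch_t(a, b):
    """Welch two-sample t statistic for every gene (row); a, b: genes x samples."""
    va, vb = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    se = np.sqrt(va / a.shape[1] + vb / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def approx_p_two_sided(t):
    """Two-sided p-value from a normal approximation to the t distribution.

    Assumption: ~30 samples per group, where this approximation is rough but
    adequate for illustration.
    """
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

rng = np.random.default_rng(0)
n_genes, n_per_group = 1000, 30
# Null data: by construction, NO gene differs between the two groups
group_a = rng.normal(size=(n_genes, n_per_group))
group_b = rng.normal(size=(n_genes, n_per_group))

p = approx_p_two_sided(per_gene_welch_t(group_a, group_b))
alpha = 0.05
naive_hits = int(np.sum(p < alpha))                 # false positives from chance alone
bonferroni_hits = int(np.sum(p < alpha / n_genes))  # family-wise error control
print(naive_hits, bonferroni_hits)
```

Every gene flagged by the naive threshold here is a false positive, which is why univariate screening produces long gene lists unless the statistics are corrected for the number of tests.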
Linear Separation – Group Classification
Nonlinear Classification? (Kernel methods)
Canonical Variate Analysis (Linear Discriminant Analysis)
Find optimal linear combinations of variables that maximize inter-group differences while minimizing intra-group differences.
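For two groups, canonical variate analysis reduces to Fisher's linear discriminant: the direction w that maximizes between-group separation relative to within-group scatter is proportional to S_w^{-1}(m_a − m_b), where m_a and m_b are the group means and S_w is the pooled within-group scatter matrix. A minimal numpy sketch on synthetic, strongly correlated two-variable data (the function name and data are hypothetical):

```python
import numpy as np

def fisher_lda_direction(xa, xb):
    """Two-group Fisher linear discriminant direction (unit length).

    w is proportional to Sw^-1 (mean_a - mean_b), with Sw the pooled
    within-group scatter matrix. xa, xb: samples x variables.
    """
    ma, mb = xa.mean(axis=0), xb.mean(axis=0)
    sw = (xa - ma).T @ (xa - ma) + (xb - mb).T @ (xb - mb)
    w = np.linalg.solve(sw, ma - mb)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])  # strongly correlated variables
xa = rng.multivariate_normal([0.0, 0.0], cov, size=200)
xb = rng.multivariate_normal([1.0, 0.0], cov, size=200)

w = fisher_lda_direction(xa, xb)
# Classify by projecting onto w and thresholding at the midpoint of the group means
threshold = 0.5 * (xa.mean(axis=0) + xb.mean(axis=0)) @ w
acc = (np.mean(xa @ w > threshold) + np.mean(xb @ w <= threshold)) / 2
print(np.round(w, 3), round(float(acc), 3))
```

The groups differ only in the first variable, yet the best direction also weights the second because the two are correlated, and the projection separates the groups better than either variable alone. With more variables than samples, as with microarrays, S_w is singular and `np.linalg.solve` fails; that is the underdetermined case, commonly handled by regularizing S_w (for example, adding a small multiple of the identity) or by selecting variables first.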
What Do We Get from Signatures?
• Pattern for discrimination between groups (responders, non-responders, different genetic populations, etc.)
  – therapeutic design
  – diagnostics
• Lists of genes
  – which genes appear to be the most significant in determining the difference between groups, or in causing the formation of distinct patterns within data?
    • you get long lists when you “capitalize on chance” using univariate methods
    • shorter lists from multivariate methods or when you use “honest” statistics modified for variable selection
Model-Driven Experimental Design
[Figure: interplay of COMPUTATION and EXPERIMENT: Genome Sequence (DNA), Homology Analysis, Competing Hypothetical Gene Networks, Design Optimal Experiment(s), Gene Knockout / Overexpression, Gene Expression Microarray and/or Protein Expression (MS / 2DGE)]
Modeling at Different Scales
[Figure: modeling approaches arranged by simulation time scale (fsec, µsec, seconds, min, hrs, days) and biological detail (atoms, single molecules, single gene, single cell, many cells): quantum chemistry (reactive mechanisms, active site chemistry); molecular dynamics (binding constants, structural effects of protein mutations); homology-based structure prediction; finite state system simulation; stochastic simulation of 10-100 gene networks; metabolic networks; system of ODEs; evolution of population statistics]
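Stochastic simulation of small gene networks is commonly done with Gillespie's stochastic simulation algorithm (SSA). As a minimal sketch, assuming the simplest possible network, one gene whose mRNA is transcribed at a constant rate `k_tx` and degraded at rate `k_deg` per molecule (both hypothetical parameters), the algorithm draws an exponential waiting time from the total propensity and then picks which reaction fires:

```python
import numpy as np

def gillespie_birth_death(k_tx, k_deg, t_end, seed=0):
    """Gillespie SSA for one gene's mRNA count m.

    Reactions (assumed for illustration): 0 -> m at rate k_tx (transcription),
    m -> 0 at rate k_deg * m (degradation). Returns event times and counts.
    """
    rng = np.random.default_rng(seed)
    t, m = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        a_tx, a_deg = k_tx, k_deg * m          # reaction propensities
        a0 = a_tx + a_deg
        t += rng.exponential(1.0 / a0)         # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity
        m += 1 if rng.random() < a_tx / a0 else -1
        times.append(t)
        counts.append(m)
    return np.array(times), np.array(counts)

def time_average(times, counts, t_end):
    """Time-weighted mean of the piecewise-constant trajectory."""
    durations = np.diff(np.append(times, t_end))
    return float(np.sum(counts * durations) / t_end)

times, counts = gillespie_birth_death(k_tx=10.0, k_deg=1.0, t_end=2000.0)
print(time_average(times, counts, 2000.0))
```

The corresponding ODE, dm/dt = k_tx − k_deg·m, predicts the same mean k_tx/k_deg = 10 but none of the molecule-number fluctuations; larger networks add more reactions and propensities to the same loop.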
Oxidative Stress & Human Health
[Figure: diseases linked to OXIDATIVE STRESS, grouped by organ]
Brain: Trauma, Stroke, Alzheimer’s Disease
Lung: COPD, Asthma, ARDS, Hyperoxia
Joints: Rheumatoid Arthritis
Heart: Angioplasty, Keshan disease (selenium deficiency)
Skin: Burn, Dermatitis, Psoriasis
Kidney: Renal graft, Glomerulonephritis
Eye: Degenerative retinal damage, Cataractogenesis
GI: Ischemic Bowel, Endotoxin Liver Injury
Vessels: Vasospasm, Atherosclerosis
Multi-organ: Radiation, Aging, Cancer, Inflammatory-Immune injury, Ischemia-Reflow, Diabetes
http://greengenes.llnl.gov/repair/html/overview.html
DNA Damage: Oxidative Stress, Adduct Formation, Chromosome Break
Oxidative Stress: Low Dose Radiation, Endogenous Metabolism, Environmental Toxins, Hypoxia, Ischemia, Trauma, Neurodegeneration
DNA Repair Mechanisms (in E. coli)
http://www.web-books.com/MoBio/Free/Ch7G.htm
Base Excision Repair (Single Strand Break Repair); Nucleotide Excision Repair; Mismatch Repair
DNA Base Excision Repair (Molecular Health Engineering)
Input:
  – Environment: ionizing radiation, sunlight, pollution
  – Lifestyle: obesity, smoking, alcohol
  – Biology: inflammation, trauma, ischemia, aging, metabolism
DNA Base Modifications: 8-oxoguanine, uracil, thymine glycol, 5-hydroxyuracil, 3-methyladenine, abasic sites, etc.
Base Excision Repair Pathways (Mathematical Equations): repair states LESION, ABASIC, 5’ flap-gap, 3’ flap-gap, 5’ nicked, Gapped, Nicked, LPR, flap, REPAIRED
Output: Persistent DNA Damage; Transcriptional Defects; Mutations; Cell Death
Disease: Cancer, Aging
U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
Table 2 Amino acid substitution variants identified in DNA repair and repair-related genes
Gene  Exon  Codon  Common   Variant  Allele     Mouse    cDNA sequence
                   residue  residue  frequency  residue  (5'→3')
APE1  3     51     Gln      His      0.03       Gln      GAT CA(G/C) AAAAC
APE1  3     64     Ile      Val      0.01       Ile      TCAAG (A/G)TC TGC
APE1  5     148    Asp      Glu      0.33       Glu      GGC GA(T/G)GAGGA
APE1  5     241    Gly      Arg      0.01       Gly      GCTTC (G/A)GGGAA
FEN1  No variants
LIG1  3     24     Ala      Val      0.01       Thr      GGAG G(C/T)A TCCA
LIG1  4     62     Arg      Trp      0.01       Gln      CGGCC (C/T)GG GTC
LIG1  9     249    Gly      Glu      0.01       Gly      GCCA G(G/A)GGCTC
LIG1  10    267    Asn      Ser      0.02       Asn      TTAC A(A/G)TCCTG
LIG1  13    369    Val      Ile      0.01       Ile      AGTCC (G/A)TC CGG
LIG1  13    409    Arg      His      0.01       Cys      GTTC C(G/A)C GACA
LIG1  16    480    Met      Val      0.01       Val      CAGCC (A/G)TG GTG
LIG1  20    614    Thr      Ile      0.01       Thr      GGTC A(C/T)A TCCT
LIG1  22    673    Glu      Asp      0.01       Gln      CGT GA(G/T)CCCCT
LIG1  22    677    Arg      Leu      0.01       Arg      TTCC C(G/T)G CGCC
LIG3  18    780    Arg      His      0.03       Cys      GTCC C(G/A)C AAGG
LIG3  19    811    Lys      Thr      0.01       Lys      TGCA A(A/C)GCCTT
LIG3  21    899    Pro      Ser      0.01       Thr      AGAAC (C/T)CT GCG
POLB  1     8      Gln      Arg      0.01       Gln      GCCG C(A/G)G GAGA
POLB  7     137    Arg      Gln      0.006      Arg      TCAG C(G/A)AATTG
POLB  12    242    Pro      Arg      0.005      Pro      GCTT C(C/G)C AGTA
POLD1 1     19     Arg      His      0.12       Arg      GGCC C(G/A)T GGGG
POLD1 1     30     Arg      Trp      0.006      Ser      CACCT (C/T)GG CCA
POLD1 3     119    Arg      His      0.15       Arg      ATCC C(G/A)C GGCT
POLD1 4     173    Ser      Asn      0.05       Ser      CATC A(G/A)CCGGG
POLD1 4     177    Arg      His      0.003      Arg      CAGT C(G/A)CGGGG
POLD1 19    849    Arg      His      0.011      Arg      ACTG C(G/A)CCGCC
POLD1 26    1086   Arg      Gln      0.01       Arg      GGTG C(G/A)GAAGG
Mohrenweiser HW, Xi T, Vazquez-Matias J, Jones IM. Identification of 127 amino acid substitution variants in screening 37 DNA repair genes in humans. Cancer Epidemiol. Biomarkers & Prev., 11: 1054-1064, 2002.
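The cDNA column of Table 2 encodes each variant as (common/variant) inside a short sequence context. Assuming, as the rows appear to show, that the affected codon is the first three bases of the whitespace-separated token containing the parentheses, a short sketch can translate both alleles with the standard genetic code and check them against the Common and Variant residue columns. The parsing rule is inferred from the table layout, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in order TTT, TTC, TTA, ..., GGG
AA1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE = {"F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stop", "C": "Cys",
         "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
         "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
         "D": "Asp", "E": "Glu", "G": "Gly"}

def translate(codon):
    """Three-letter amino acid for a DNA codon such as 'GAT'."""
    i, j, k = (BASES.index(b) for b in codon)
    return THREE[AA1[16 * i + 4 * j + k]]

def variant_codons(context):
    """Common and variant codons from a context like 'GGC GA(T/G)GAGGA'.

    Assumption: the codon is the first three positions of the token holding
    '(X/Y)', where the parenthesized pair counts as one position.
    """
    token = next(t for t in context.split() if "(" in t)
    positions, idx = [], 0
    while idx < len(token) and len(positions) < 3:
        if token[idx] == "(":
            close = token.index(")", idx)
            common, variant = token[idx + 1:close].split("/")
            positions.append((common, variant))
            idx = close + 1
        else:
            positions.append((token[idx], token[idx]))
            idx += 1
    return "".join(p[0] for p in positions), "".join(p[1] for p in positions)

def residue_change(context):
    common, variant = variant_codons(context)
    return translate(common), translate(variant)

rows = [  # (gene, codon, common residue, variant residue, cDNA context) from Table 2
    ("APE1", 51, "Gln", "His", "GAT CA(G/C) AAAAC"),
    ("APE1", 148, "Asp", "Glu", "GGC GA(T/G)GAGGA"),
    ("LIG1", 24, "Ala", "Val", "GGAG G(C/T)A TCCA"),
    ("POLD1", 119, "Arg", "His", "ATCC C(G/A)C GGCT"),
]
for gene, codon, common, variant, ctx in rows:
    got = residue_change(ctx)
    print(gene, codon, got, "matches" if got == (common, variant) else "MISMATCH")
```

A check like this is a cheap way to catch transcription errors when tabulated variants are copied into a model or database.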
Molecular Modeling of Amino Acid Variants (Ape1)
[Figure: Ape1 structure with variant sites R237A, L104R and E126D]
Hadi, M. Z., Coleman, M. A., Fidelis, K., Mohrenweiser, H. W. and Wilson, D. M. III, Nucleic Acids Res., 28, 3871-3879, 2000.
Predictive BER System Model
• Differential equations for each enzymatic activity: kcat, KM and protein concentrations taken from experimental data
• Based on physical measurements of the cell: assume well-mixed proteins, but with kcat/KM adjusted for slowed diffusion in the nucleus
• Model is consistent with experimental mechanistic data (e.g. predominance of short-patch BER and coordination between proteins)
Sokhansanj et al. NAR 2002
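A toy version of this kind of model, offered as a sketch rather than the published model: damage moves through a hypothetical linear chain of repair intermediates, each step catalysed by one enzyme with Michaelis-Menten kinetics v = kcat·E·S/(KM + S). All parameter values are invented for illustration, and a variant's reduced activity is modeled simply as a lower effective enzyme level E. The function reports the time to repair 90% of an initial damage load, analogous to a "time to repair" measure for variants.

```python
import numpy as np

def repair_time(kcat, km, enzyme, x0=1.0, fraction=0.9, dt=0.01, t_max=1000.0):
    """Time for `fraction` of an initial lesion load x0 to reach 'repaired'.

    Hypothetical model: lesion -> intermediate 1 -> ... -> repaired, with
    step i catalysed by enzyme i: v_i = kcat_i * E_i * S_i / (KM_i + S_i).
    Integrated with forward Euler (adequate for these small, smooth rates).
    """
    kcat, km, enzyme = (np.asarray(a, dtype=float) for a in (kcat, km, enzyme))
    s = np.zeros(len(kcat))
    s[0] = x0                          # all damage starts as unprocessed lesions
    repaired, t = 0.0, 0.0
    while repaired < fraction * x0 and t < t_max:
        v = kcat * enzyme * s / (km + s)   # flux out of each state
        ds = -v
        ds[1:] += v[:-1]                   # flux into the next state
        s += dt * ds
        repaired += dt * v[-1]             # last step produces repaired DNA
        t += dt
    return t

kcat = [1.0, 1.0, 1.0, 1.0]        # hypothetical turnover numbers (per unit time)
km = [0.5, 0.5, 0.5, 0.5]          # hypothetical Michaelis constants
wild_type = [1.0, 1.0, 1.0, 1.0]   # relative enzyme levels
variant = [1.0, 0.1, 1.0, 1.0]     # step-2 enzyme at 10% of wild-type activity

t_wt = repair_time(kcat, km, wild_type)
t_var = repair_time(kcat, km, variant)
print(f"% increase in time to repair: {100 * (t_var - t_wt) / t_wt:.0f}%")
```

Because the steps are sequential, slowing any one enzyme delays everything downstream; how strongly it does so depends on how close that step already is to being rate-limiting.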
Pathway Impact of Variants
Percentage increase in steady-state damage (for continuous formation of damage) and repair time (for an initial amount of damage to be cleared) given sub-functional, potentially non-lethal variants
Protein (activity)    Variant       % of wild-type    % increase in         % increase in
                                    enzyme activity   steady-state damage   time to repair
Ogg1 (excision)       S236C         63%               3%                    9%
Ogg1 (excision)       Hypothetical  10%               29%                   341%
Ape1 (5'-incision)    D148E         95%               0%                    0%
Ape1 (5'-incision)    R237A         35%               2%                    1%
Pol β (gap-filling)   Hypothetical  50%               4%                    2%
Pol β (gap-filling)   Hypothetical  10%               13%                   7%
Pol β (5'-dRP lyase)  Hypothetical  50%               4%                    1%
Pol β (5'-dRP lyase)  Hypothetical  10%               32%                   56%
Lig1                  Hypothetical  50%               21%                   34%
Sokhansanj and Wilson CEBP 2006
Acknowledgments
• Students
  – Andrew Atkins
  – He Zhao
  – Chris Abdullah
  – Krista Szymborski
  – Suman Datta, MS (Merrimack)
  – Geoff Gipson, PhD (with Drs. Sue Connor and Kay Tatsuoka of GSK)
• Website: http://www.pages.drexel.edu/~bas44
• Email: [email protected]
• Drexel Biosciences
  – E. Gardner
  – A. Saunders
  – M. Howett
  – M. Lechner
• Collaborations
  – D. M. Wilson, III (NIH)
  – X. Hu (Drexel IST)
  – G. Rose (Drexel ECE)
  – K. Pourrezaie (Drexel Biomed)
  – H. Nilsen (Oslo)