Expresso and Chips: Studying Drought Stress in Plants with cDNA Microarrays
Lenwood S. Heath
Department of Computer Science
Virginia Tech, VA 24061
Expresso and Chips Fordham University May 6, 2003
Outline
• Expresso Team
• Drought Stress in Plants
• Microarray Technology
• Expresso System
• Biological Results
• Networks in Biology
• Future
EXPRESSO TEAM
VT
Ron Sederoff
Lenny Heath
Naren Ramakrishnan
Layne Watson
Cecilia Vasquez-Robinet
Shrinivasrao Mane
Allan Sioson
Maulik Shukla
Harsha Rajasimha
Jonathan Watkinson
Boris Chevone Ruth Grene
Andrew McElrone
Catarina Moura
Duke
NCSU
Grand Goal
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought and other oxidative stresses
Questions Currently Addressed in the Grene Lab
1. Big picture: What makes a plant successfully acclimate to drought stress?
2. Specifically: Which changes in gene expression are associated with physiological acclimation to drought stress?
3. Goal: Using Expresso and the smarts of several computer scientists, can we construct, or amend, pathways depicting the perception of drought stress, and successive events which culminate in acclimation?
4. Future Work: Which changes in the metabolite population are associated with acclimation?
Long term objective of drought experiments in Expresso
Develop explanatory and predictive models of phenomena occurring within plant cells in response to drought using cDNA microarrays and metabolomics.
Gene Expression
Stress perception
Metabolic acclimatory responses
Protective responses - LEAs, antioxidants
Responses to Environmental Signals
[Figure: timelines of water potential (bars) versus days for two experiments. Experiment 1: Cycles of Mild Drought Stress; Experiment 2: Cycles of Severe Drought Stress. Each cycle (I, II, III, and a Cycle IV) alternates a dry-down phase (water withheld) with a recovery phase (water given); axis ticks at 0, -2, -10, and -15. Symbols mark PS (photosynthesis) measurements and needle harvests.]
How the microarray process works (courtesy J.M. Trent)
Flow of Procedures
Hypotheses
Select cDNAs
PCR
Extract RNA
Replication and Randomization
Reverse Transcription and Fluorescent Labeling
Robotic Printing
Hybridization
Identify Spots
Intensities
Statistics
Clustering
Data Mining, ILP
CS and Biologists
Biologists
CS
Confirm with RT-PCR
Experiment
PS, water potential
Key Steps in cDNA Microarrays
• Probe generation and microarray design
  – What to put on the chip?
  – How to amplify desired genetic material?
  – Where should selected probes be placed?
• Target preparation and hybridization
  – How to isolate samples from control and treated tissues?
  – How to ensure suitable conditions for hybridization?
• Data generation and analysis
  – What methods are available for image processing?
  – How to accommodate errors in downstream analysis?
  – How to validate results from microarray studies?
Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
• Integration of design and procedures
• Integration of image analysis tools and statistical analysis
• Data mining using inductive logic programming (ILP)
• Closing the loop
• Integrating models
Probe Selection
• Biologists provide keywords
• Keywords used to search Arabidopsis database at TIGR
• Arabidopsis proteins used to BLAST against pine EST database
  – Cut-off value of 10e-4
  – Select ESTs close to 3’ end of Arabidopsis protein (without compromising match)
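The cut-off step above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the cut-off is a BLAST E-value threshold and that hits have already been parsed into (query, subject, E-value) records; the `Hit` type, the `select_hits` helper, and the EST names are hypothetical, and the preference for ESTs near the 3’ end is not modeled.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str     # Arabidopsis protein used as the BLAST query
    subject: str   # pine EST that was hit (hypothetical names below)
    evalue: float  # BLAST E-value; smaller means a stronger match


E_VALUE_CUTOFF = 10e-4  # written as on the slide; numerically equal to 1e-3


def select_hits(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, List[Hit]]:
    """Keep hits at or below the cutoff, grouped by query and sorted best-first."""
    selected: Dict[str, List[Hit]] = {}
    for hit in hits:
        if hit.evalue <= cutoff:
            selected.setdefault(hit.query, []).append(hit)
    for query_hits in selected.values():
        query_hits.sort(key=lambda h: h.evalue)
    return selected


hits = [
    Hit("At3g60320", "EST_a", 1e-150),
    Hit("At3g60320", "EST_b", 1e-200),
    Hit("At3g60320", "EST_c", 0.5),   # weak match, dropped
    Hit("At1g02110", "EST_d", 2e-3),  # just above the cutoff, dropped
]
best = select_hits(hits)
for query, query_hits in best.items():
    print(query, [h.subject for h in query_hits])
```

Queries with no surviving hit simply do not appear in the result, which makes it easy to spot keywords that found no pine counterpart.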
Example of cDNA Selection: bZIPs
Arabidopsis gene  Annotation                                Pine EST  Clone
At1g58110         bZIP family transcription factor          BI397695  NXPV_104_B12_F
At2g21230         bZIP family transcription factor          AW290027  NXNV009G09F
At3g51960         bZIP family transcription factor          BF049843  NXCI_111_F10_F
At3g56660         bZIP family transcription factor          BF778575  NXSI_088_C07_F
At3g60320         bZIP protein                              AW042749  ST24H11
At3g60320         bZIP protein                              BG318985  NXPV_022_C01_F
At1g52320         bZIP protein, putative                    BQ198053  NXLV124_H06_F
At2g31370         bZIP transcription factor (POSF21)        BG833004  NXPV_084_G10_F
At2g12900         bZIP transcription factor family protein  BM133964  NXLV_014_G03_F
At5g06960         bZIP transcription factor, OBF5           BM428294  NXRV_012_A06_F
At1g02110         bZIP-like protein                         AW697487  ST61B05

• At3g60320 e-210
• At2g21230 e-190
• At3g51960 e-190
• At3g56660 e-185
• At3g60320 e-150
• At1g58110 e-134
• At5g06960 e-100

• At3g60320 e-200
• At1g23600 12
• At3g43920 34

• At3g60320 e-1
• At2g37640 15
• At4g28990 28
Arabidopsis gene At3g60320
Best hit pine contig
Pine ESTs
Elements of Array Design
• Precise tracking of clones from NCSU archive to deposition on the slide
• Spiking controls:
– Orient layout of spots
– Generate standard curves
– Normalize laser focus and intensity between channels
• Replication of deposits
• Printing by more than one pin
• Placement at different positions on the slide
cDNA libraries at NCSU
Juvenile and normal wood
96 Well Archive Plates (VT)
Addition of blanks, and spiking controls
96 Well PCR Plates
96 Well Storage Plates
Cleaning
384 Well Printing Plates
Transfer 4 to 1
[Figure: printing plates deposited onto the microarray slide; each subarray is a 12 x 24 grid of deposits, with subarray labels 1, 4, 13, and 16 shown.]
Hybridization
• Reciprocal labelings
• Modified loop design (Kerr and Churchill, 2001)
[Figure: loop design with nodes C1, C2, C3, T1, T2, and T3]
Image Capture and Analysis
• Image capture on ScanArray 5000
  – Model laser and photomultiplier tube
  – Model inconsistencies in slide and spot
• Image analysis
  – Currently using ScanArray Express
  – Incorporate into Expresso
Wolfinger Statistical Approach
• Assumption: Biological phenomena are in terms of multiplicative effects [Kerr, Churchill, 2001]
• Two Stage Analysis Method [Wolfinger, et al., 2001]
  – Normalization Step
    • ANOVA Mixed Model as the Normalization Model
    • Removes the Global Effects from Array, Dye, Pin, Treatment, etc.
  – Gene Treatment Interaction Estimation
    • ANOVA Mixed Model as the Gene Model
    • Multiple Comparisons per Gene
Analysis: The Wolfinger Model
• Two-phase analysis to remove global effects and estimate the interaction between gene and treatment

ANOVA normalization model:
$$ y = \mu + T + A + D + P + AP + \varepsilon $$

Gene model:
$$ \varepsilon = G + GT + GA + GD + GS(A) + \gamma $$

– $y$ is the log of the intensity value of a specific spot on a specific array
– $\mu$ accounts for the overall mean of values in a specific comparison
– $T$, $A$, $D$, $P$, and $AP$ are constant and represent variation in different factors (treatment, array, dye, pin, and the array-pin interaction)
– $\varepsilon$ accounts for the residual from the ANOVA model
– $G$ is the overall mean of the residual for each gene in a comparison, and $GT$ is the overall mean of the residual in treatment or control
– $\gamma$ is the remaining error of the gene model
– A t-test is used to test whether $GT$ differs between treated and control
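The two-stage idea can be illustrated with a deliberately simplified Python sketch. It is not the mixed-model fit of Wolfinger et al.: as a stated assumption, stage 1 only subtracts each array's mean log intensity (removing the overall mean and the array effect), and stage 2 compares treated and control residuals per gene with a Welch t statistic. The gene names and intensity values are hypothetical toy data.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def normalize(records):
    """Stage 1 (simplified): subtract each array's mean log intensity."""
    by_array = defaultdict(list)
    for _gene, array, _treatment, y in records:
        by_array[array].append(y)
    array_mean = {a: mean(ys) for a, ys in by_array.items()}
    return [(g, a, t, y - array_mean[a]) for g, a, t, y in records]


def gene_effects(residuals):
    """Stage 2 (simplified): per gene, treated-minus-control mean residual and t statistic."""
    groups = defaultdict(lambda: defaultdict(list))
    for gene, _array, treatment, r in residuals:
        groups[gene][treatment].append(r)
    results = {}
    for gene, by_treatment in groups.items():
        control, treated = by_treatment["control"], by_treatment["treated"]
        diff = mean(treated) - mean(control)
        se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
        results[gene] = (diff, diff / se if se > 0 else float("nan"))
    return results


# (gene, array, treatment, log intensity); array 2 is brighter overall.
records = [
    ("g1", 1, "control", 8.0), ("g1", 1, "treated", 9.1),
    ("g2", 1, "control", 8.2), ("g2", 1, "treated", 8.1),
    ("g1", 2, "control", 9.0), ("g1", 2, "treated", 9.9),
    ("g2", 2, "control", 9.0), ("g2", 2, "treated", 9.3),
]
effects = gene_effects(normalize(records))
for gene, (diff, t_stat) in sorted(effects.items()):
    print(gene, round(diff, 3), round(t_stat, 2))
```

In this toy data g1 shows a clear treatment effect while g2 does not; a real analysis would fit the dye, pin, and spot terms as well and correct for multiple comparisons.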
[Figure: plot on a log scale with 2-fold and 1.4-fold levels marked]
Analysis: Data mining by redescription (ILP)
• Based on a collection of 15 relational databases implemented using Postgres
  – Experimental conditions
  – cDNA details
  – Physiological measurements
  – Gene expression levels
Inductive Logic Programming
• A more expressive way to mine patterns than attribute-based clustering
• Traditional clustering (SOMs, agglomerative, etc.)
  – Clusters are difficult to interpret
  – Clusters may not correspond to biological knowledge
  – Difficult to incorporate a priori information
• ILP
  – Mines only clusters that are “describable” in terms of prior knowledge, e.g., functional categories
How ILP is used in Expresso
• Infers rules relating gene expression levels to categories, or to other expression levels, without explicit direction
• Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
• Interpretation:
“If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
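The 95.8% figure follows directly from the cover counts reported for Rule 142. A small Python check, where the function name `rule_confidence` is only an illustrative choice:

```python
def rule_confidence(pos_cover: int, neg_cover: int) -> float:
    """Fraction of covered examples for which the rule's head also holds."""
    return pos_cover / (pos_cover + neg_cover)


confidence = rule_confidence(pos_cover=69, neg_cover=3)
print(f"{confidence:.1%}")  # 69 / 72
```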
Another example
• This one relates expression level to functional categories
level(A,moist_vs_mild,positive) :- category(A, transport_protein).
• What ILP needs as input
  – Training data
  – Genes placed in functional categories (can be a many-many relationship)
  – Expression levels, physiological data (can be multi-dimensional)
• What ILP produces as output
  – Rules “redescribing” sets of genes defined using one facet in terms of another (it finds sets automatically!)
How ILP works
• Searches through every possible subset of genes that can be redescribed, from one facet to another
• Uses clever pruning strategies to pick out the best redescriptions (rules)
• Evaluates promising rules in terms of
  – Support: how many genes are being considered in the rule?
  – Confidence: of the genes that satisfy the body, how many also satisfy the head?
• Arranges rules in terms of support, confidence, or other metric
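The evaluation and ranking steps can be sketched as follows. This is an illustration under stated assumptions, not the ILP engine used in Expresso: it skips the search and pruning entirely, takes "support" to mean the number of clones satisfying the rule body, and uses hypothetical clones, categories, and expression levels.

```python
# Toy facts (hypothetical clones and values, for illustration only).
category = {
    "c1": "transport_protein", "c2": "transport_protein",
    "c3": "transport_protein", "c4": "kinase", "c5": "kinase",
}
level = {
    ("c1", "moist_vs_mild"): "positive", ("c1", "moist_vs_severe"): "negative",
    ("c2", "moist_vs_mild"): "positive", ("c2", "moist_vs_severe"): "unchanged",
    ("c3", "moist_vs_mild"): "negative", ("c3", "moist_vs_severe"): "negative",
    ("c4", "moist_vs_mild"): "positive", ("c4", "moist_vs_severe"): "negative",
    ("c5", "moist_vs_mild"): "unchanged", ("c5", "moist_vs_severe"): "negative",
}
clones = sorted(category)


def evaluate(body, head):
    """Return (support, confidence) for the rule `head :- body`."""
    covered = [c for c in clones if body(c)]
    support = len(covered)
    satisfied = sum(1 for c in covered if head(c))
    confidence = satisfied / support if support else 0.0
    return support, confidence


# Each rule is a (body, head) pair of predicates over a clone.
rules = {
    "mild_positive_if_transport": (
        lambda c: category[c] == "transport_protein",
        lambda c: level[(c, "moist_vs_mild")] == "positive",
    ),
    "severe_not_positive_if_mild_positive": (
        lambda c: level[(c, "moist_vs_mild")] == "positive",
        lambda c: level[(c, "moist_vs_severe")] != "positive",
    ),
}
scores = {name: evaluate(body, head) for name, (body, head) in rules.items()}

# Rank by confidence first, then by support.
ranked = sorted(scores, key=lambda name: (scores[name][1], scores[name][0]), reverse=True)
for name in ranked:
    print(name, scores[name])
```

Sorting on a (confidence, support) tuple is one possible ordering; swapping the tuple elements ranks by support first.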
Current Work on Models
• Populate library of models for various stages
  – biophysics (PCR, hybridization)
  – molecular biology (sequence selection)
  – robotics (pipetting and transfers)
  – statistics (error propagation and assessment of treatment effects)
  – surrogate models (all stages)
• Configure suitable sequences of models
  – “run” or “optimize”
• Example scenarios
  – “perform end-to-end validation of gene expression”
  – “design a chip that hybridizes to cDNAs from closely related species”
  – “where should I sample next for improving data mining results?”
Why is this problem difficult?
• Model-based optimization of compositional codes
  – sequential refinement and optimization infeasible!
  – models are of various fidelities
  – errors compound further into the design cycle!
• Current approaches
  – “hand tuning” or “word of mouth” protocols
  – lack of understanding of functional relationships
  – do not harness existing biological knowledge
• Need to judiciously
  – configure virtual experiments to give realistic estimates
  – minimize cost of additional data collection
  – maximize information content per experiment
An example of Expresso modeling
• Capture PCR reaction dynamics
  – to model gene quantification computationally
  – e.g., a Markov model
• Factors
  – reaction temperature
  – rate coefficients
  – number of reaction cycles
  – activation energy for nucleotide addition
• Optimize PCR model to pose
  – “how many RNA molecules were there at the start of the system?”
  – leads to full-scale physics-based validation of microarray experiments
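One way to picture such a model is a simple Markov chain on the copy number, sketched below in Python. This is an assumption-laden toy, not the Expresso PCR model: temperature, rate coefficients, and activation energy are all collapsed into a single per-cycle duplication probability (`efficiency`), and the starting count is recovered by inverting the expected growth N_c = N_0 (1 + efficiency)^c.

```python
import random


def simulate_pcr(n0: int, cycles: int, efficiency: float, rng: random.Random) -> int:
    """Each cycle, every molecule is duplicated independently with probability `efficiency`."""
    n = n0
    for _ in range(cycles):
        n += sum(1 for _ in range(n) if rng.random() < efficiency)
    return n


def estimate_initial_copies(n_final: float, cycles: int, efficiency: float) -> float:
    """Invert the expected growth N_c = N_0 * (1 + efficiency) ** cycles."""
    return n_final / (1 + efficiency) ** cycles


rng = random.Random(0)  # fixed seed so the run is repeatable
n_final = simulate_pcr(n0=100, cycles=10, efficiency=0.9, rng=rng)
n0_hat = estimate_initial_copies(n_final, cycles=10, efficiency=0.9)
print(n_final, round(n0_hat, 1))
```

The estimate is only as good as the assumed efficiency, which is the kind of error compounding the previous slide warns about.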
Closing-the-loop in Data Mining
Redesign probe set to clarify functional patterns
  – discrete optimization problem
    • minimizing cross-hybridization
    • maximizing specificity
[Figure repeated from earlier in the talk: cycles of mild (Experiment 1) and severe (Experiment 2) drought stress, showing water potential (bars) versus days, with PS (photosynthesis) measurements and needle harvests marked.]
                   Net Photosynthesis          Significant Gene
                   (µmol CO2 m-2 s-1)          Expression
Condition  Cycle   Control    Stressed         Positive    Negative
Mild       1       4.28       2.48             133         94
           2       3.54       3.82             213         159
           3       4.75       3.28             62          90
Severe     1       3.67       0.88             145         144
           2       3.00       0.19             162         156
           3       2.90       0.77             135         53
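As a reading aid, the stressed-to-control photosynthesis ratio and the total number of significant genes can be computed directly from the table. The short Python sketch below uses only the values shown above; the variable names are illustrative.

```python
# (condition, cycle, control PS, stressed PS, positive genes, negative genes)
rows = [
    ("Mild", 1, 4.28, 2.48, 133, 94),
    ("Mild", 2, 3.54, 3.82, 213, 159),
    ("Mild", 3, 4.75, 3.28, 62, 90),
    ("Severe", 1, 3.67, 0.88, 145, 144),
    ("Severe", 2, 3.00, 0.19, 162, 156),
    ("Severe", 3, 2.90, 0.77, 135, 53),
]

ratios = {}
totals = {}
for condition, cycle, control, stressed, positive, negative in rows:
    ratios[(condition, cycle)] = stressed / control   # below 1 means photosynthesis dropped
    totals[(condition, cycle)] = positive + negative  # all significant changes

for key in ratios:
    print(key, round(ratios[key], 2), totals[key])
```

Under severe stress the ratio stays well below 1 in every cycle, while in the second mild cycle the stressed plants slightly exceed the controls.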
[Figure: panels A and B showing positive and negative changes in expression]
Expression changes by clone (slide columns: Mild cycles 1, 2, 3 and Severe cycles 1, 2, 3; marks are listed in slide order, and their column positions did not survive extraction)
Pathway groups on the slide: Aromatic Amino Acid, Phenyl-propanoid, Lignin, Flavonoids

Clone ID       Annotation                       Changes
NXCI_047_C05   DAHP synthase                    + - -
NXCI_071_C01   3-dehydroquinate synthase        + + -
NXCI_117_D08   3-dehydroquinate dehydratase
NXNV_185_H02   Shikimate dehydrogenase
NXCI_034_B01   Shikimate kinase
               EPSP synthase
NXCI_163_G07   Chorismate synthase              + - +
NXSI_051_F10   Chorismate synthase
NXCI_016_F11   Chorismate mutase                +
               Prephenate aminotransferase
               Arogenate dehydratase
               Arogenate dehydrogenase
NXCI_093_H05   PAL                              -
NXSI_118_A03   Cinnamate 4-hydroxylase
NXCI_087_F07   Cinnamate 4-hydroxylase
NXCI_045_B07   Cinnamate 4-hydroxylase          + +
12 E05         Caffeoyl O-methyl transferase    +
NXSI_055_H08   Caffeoyl O-methyl transferase
NXSI_130_F05   Caffeoyl O-methyl transferase
02 B03         Cinnamyl alcohol dehydrogenase   -
NXNV_162_F07   Cinnamyl alcohol dehydrogenase   -
NXCI_165_H04   Cinnamoyl CoA reductase          -
34 F04         Cinnamoyl CoA reductase
NXNV_044_G05   Laccase                          -
NXSI_127_C02   Laccase                          + -
NXNV_136_F10   Laccase                          - + +
NXCI_005_C10   Laccase                          -
NXCI_018_F10   Pinoresinol reductase
               Chalcone synthase
NXCI_098_F10   Chalcone/Flavone isomerase       + +
07 H08         Chalcone/Flavone isomerase       + + +
NXNV_127_E04   Isoflavone reductase
NXNV_127_F01   Isoflavone reductase
NXCI_002_E07   Isoflavone reductase             + + -
NXSI_063_D01   Naringenin-2-oxo dioxygenase     + + +
28 B11         Naringenin-2-oxo dioxygenase     + +
13 H06         Leucoanthocyanidin reductase     +
Glycolytic Pathway, Citric Acid Cycle, and Related Metabolic Processes
Carbon Metabolism
Responses to Environmental Signals
ROS Response
Network of Munnik and Meijer
Network of Shinozaki and Yamaguchi-Shinozaki
Mathematical Models for Biological Networks
• Partial differential equations
• Boolean networks
• Bayesian networks
• Logic programs
• Neural networks
• Petri nets
• Fuzzy cognitive maps
• Weak or none (ad hoc)
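To make one of these formalisms concrete, here is a minimal Boolean network sketch in Python. The three nodes and their update rules are hypothetical and chosen only to illustrate synchronous updating; they are not a model taken from the talk.

```python
def step(state):
    """Synchronous update: every node's next value is computed from the current state."""
    return {
        "drought": state["drought"],         # external input, held fixed
        "signal": state["drought"],          # perception follows the input
        "protective_gene": state["signal"],  # expression follows the signal
    }


def run(state, steps):
    """Return the list of states visited, starting with the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


start = {"drought": True, "signal": False, "protective_gene": False}
trajectory = run(start, steps=3)
for t, state in enumerate(trajectory):
    print(t, state)
```

The gene switches on two steps after the stress appears, and the network then sits at a fixed point.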
What a Node Might Represent
• Chemical Reaction
• Molecules: proteins (enzymes and others), DNA, RNA, organic molecules, water, etc.
• Cellular components: membranes, chromosomes, nucleus, ribosomes, etc.
• Processes: metabolism, environmental sensing
• Environmental Condition
• Time or Stage
What an Edge Might Represent
• Transformation in a Chemical Reaction: Substrate to product
• Catalytic Relationship: Enzyme to substrate or reaction
• Protein/Protein Interaction
• Signal Transduction
• Regulation of Transcription
• Regulation of Translation
• Activation and Deactivation
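The node and edge vocabularies on these two slides map naturally onto a small typed graph. The Python sketch below is one possible encoding, not the Expresso data model; the class names and the `Network` helper methods are assumptions made for illustration, and the example uses PAL acting on phenylalanine to give cinnamate.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class NodeKind(Enum):
    CHEMICAL_REACTION = auto()
    MOLECULE = auto()
    CELLULAR_COMPONENT = auto()
    PROCESS = auto()
    ENVIRONMENTAL_CONDITION = auto()
    TIME_OR_STAGE = auto()


class EdgeKind(Enum):
    TRANSFORMATION = auto()
    CATALYSIS = auto()
    PROTEIN_INTERACTION = auto()
    SIGNAL_TRANSDUCTION = auto()
    TRANSCRIPTION_REGULATION = auto()
    TRANSLATION_REGULATION = auto()
    ACTIVATION = auto()
    DEACTIVATION = auto()


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    kind: EdgeKind


@dataclass
class Network:
    edges: List[Edge] = field(default_factory=list)

    def connect(self, source: Node, target: Node, kind: EdgeKind) -> None:
        self.edges.append(Edge(source, target, kind))

    def targets(self, source: Node, kind: Optional[EdgeKind] = None) -> List[Node]:
        """Nodes reachable from `source` in one step, optionally filtered by edge kind."""
        return [e.target for e in self.edges
                if e.source == source and (kind is None or e.kind == kind)]


phenylalanine = Node("phenylalanine", NodeKind.MOLECULE)
cinnamate = Node("cinnamate", NodeKind.MOLECULE)
pal = Node("PAL", NodeKind.MOLECULE)

network = Network()
network.connect(phenylalanine, cinnamate, EdgeKind.TRANSFORMATION)  # substrate to product
network.connect(pal, phenylalanine, EdgeKind.CATALYSIS)             # enzyme to substrate
print([n.name for n in network.targets(pal)])
```

Keeping the kind on each node and edge lets one graph mix metabolic, regulatory, and signaling relationships while still allowing queries restricted to a single edge type.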
Ongoing Expresso Work
• Increase model library coverage
  – New biophysics models of hybridization and spotting
• A heterologous chip
  – Pinus taeda (Loblolly Pine)
  – Picea abies (Norway Spruce)
• Multimodal networks
  – Represent and manipulate biological networks
  – Incorporate into Expresso and biologists’ work
Uncertainty in Networks
• Missing biological data is a fact of life
• As a consequence, a network can be lacking in some details, biologically wrong, or even self-contradictory
• Ability to reason computationally with uncertainty and with probabilities is essential
• Uncertainty can suggest hypotheses that can be tested experimentally to refine a network
Reconciling Networks
Multimodal Networks
• Nodes and edges have flexible semantics to represent:
  – Time
  – Uncertainty
  – Cellular decision making; process regulation
  – Cell topology and compartmentalization
  – Rate constants, etc.
• Hierarchical
Using Multimodal Networks
• Help biologists find new biological knowledge
• Visualize and explore
• Generate hypotheses and experiments
• Predict regulatory phenomena
• Predict responses to stress
• Incorporate into Expresso as part of closing the loop
Supported by:
Next Generation Software
Information Technology Research
NSF