towards modeling the gene regulation problem from static...
TRANSCRIPT
SCI2S Group
Departamento de Ciencias de la Computación e
Inteligencia Artificial
Escuela Técnica Superior de Ingeniería Informática
Universidad de Granada, Granada, Spain
Towards Modeling the Gene Regulation Problem
From static to dynamic … from bacterial
to human models
Groisman LabMicrobiology Department
Howard Hughes Medical Institute
Washington University School of
Medicine
St. Louis, Missouri USA
Cobb LabCellular Injury and Adaptation
Laboratory
Washington University in St.
Louis
St. Louis, Missouri USA
Computational Biology Group
Departamento de Ciencias de la
Computación
Facultad de Ciencias Exactas y Naturales
Universidad de Buenos Aires, Buenos
Aires, Argentina
AN EXPERIMENTAL-ORIENTED APPROACH TO
BIOINFORMATICS
Genotype
PhenotypeSydney Brenner
Science 287: 2173
“The analysis of genome sequences gives us a comprehensive
protein parts list…But there is one important piece of information
that is almost totally missing: the sequence information that
specifies when and where and for how long a gene is turned on or
off.”
SALMONELLA: A GRAM-NEGATIVE PATHOGEN
WITH A VARIED LIFESTYLE
Signal
Effectors
Regulator
Sensor
Response
SIGNAL TRANSDUCTION CASCADE
BY TWO-COMPONENT REGULATORY SYSTEMS
mgtB
Mg2+ transport
mgtA
Mg2+ transport
PhoQ��
PhoP
low Mg2+
-PO3
THE PHOP/PHOQ REGULATORY SYSTEM
IS ACTIVATED IN LOW Mg2+ ENVIRONMENTS
mgtB
Mg2+ transport
mgtA
Mg2+ transport
PhoQ��
PhoP
low Mg2+
-PO3
LOW Mg2+ INDUCES PMRA-ACTIVATED GENES
VIA THE PHOP/PHOQ-ACTIVATED PMRD PROTEIN
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
Fe3+ REPRESSES pmrD EXPRESSION
VIA THE PMRA/PMRB SYSTEM
high Fe3+
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
-10
mig-14
-10-35 +1PhoP-box
ATTGTTTATTTACTGTTTAGCGCGCGCGCTTGACAATTTTTATAAT
TRANSCRIPTIONAL REGULATION:
RECRUITING AND COOPERATING
THE PMRD PROTEINS OF SALMONELLA AND E. COLI
EXHIBIT UNUSUALLY LOW AMINO ACID IDENTITY
PmrD
PhoQ��
PhoP -PO3
PmrB��
PmrA -PO3
85.4% 93.3% 84.0% 90.5%55.3%
(the median amino acid identity between Salmonella and E. coli proteins is 90%)
-10+1
-35RcsB-box
PmrA-box
-10+1
-35RcsB-box
PhoP-box
PmrA-box
Salmonella
E. coli
THE ugd PROMOTER OF SALMONELLA,
BUT NOT OF E. COLI, HARBORS A PHOP BOX
THE E. COLI PMRA/PMRB SYSTEM
RESPONDS TO Fe3+ BUT NOT TO LOW Mg2+
high Fe3+
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
THE SALMONELLA PMRA/PMRB SYSTEM
RESPONDS TO Fe3+ AND LOW Mg2+
high Fe3+
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
-10+1
PhoP-boxPmrA-box
Salmonella
-10+1
PhoP-box
E. coli
THE pmrD PROMOTER OF SALMONELLA,
BUT NOT OF E. COLI, HARBORS A PMRA BOX
phoPphoQ
mgtA
Mg2+ transport
PhoQ
PhoP
PmrB
PmrA
SEQUENTIAL ACTIVATION OF THE PHOP REGULON
PmrD
mig-14 pmrD
pbgP
mgtCB pagP
pmrC pmrA pmrB
LPS modificationvirulence
PmrAPhoP
DNA
AT DIFFERENT TIMES,ADD CROSSLINKER,
LYSE CELLS, ANDFRAGMENT DNA
IMMUNOPRECIPITATION
ANTI-PhoP ANTI-PmrA
CROSS-LINKING REVERSAL AND DNA
PURIFICATION
CHROMATIN IMMUNOPRECIPITATION (ChIP) STUDY
OF PHOP- AND PMRA-REGULATED PROMOTERS
high Mg2+ low Mg2+
PCR SPECIFIC FORPhoP- OR PmrA-
REGULATED PROMOTERS
HMg2+
phoP
mgtA
pmrD
mgtC
pagP
pmrC
pbgP
promoters
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
0 5 10 20 30 60 90-time (min)L
ORDERED BINDING OF THE PHOP AND PMRA
PROTEINS TO THEIR TARGET PROMOTERS
-5 0 5 10 20 30 60 90
mgtC
pcgL
pagC
mig-14
pagP
pmrD
slyB
phoP
mgtA
rstA
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
TEMPORAL ORDER OF BINDING
ANATOMY OF BACTERIAL PROMOTERS
-35 -10+1
Class I promoter
-35 -10+1Class II promoter
-10+1
THE PHOP PROTEIN CONTROLS TRANSCRIPTION
FROM DIFFERENT CLASSES OF PROMOTERS
mgtAPhoP-box
-10+1
PhoP-boxmgtC
-35
-10+1
PhoP-boxPmrA-boxpmrD
ugd-10
+1-35RcsB-
boxPhoP-box
PmrA-box
. Quality of the binding site for the transcription factor
. Distance and orientation of the binding site relative to the
RNA polymerase binding site
. Quality of the binding site for RNA polymerase
. Presence of binding sites for other regulatory proteins
FACTORS DETERMINING TRANSCRIPTION
FROM A BACTERIAL PROMOTER
THE PHOP/PHOQ SYSTEM REGULATES EXPRESSION
OF ACID pH RESISTANCE GENES IN E. COLI
yhiWYhiW
low Mg2+
PhoQ��
PhoP -PO3
gadA
yhiE YhiE
hdeA
Gene Promoter Scan (GPS)
•Identifying regulatory profiles that summarize the behavior of the PhoP regulon
•Understanding the subjacent biological properties of the regulatory profiles
•Uncovering interactions with other regulatory systems and proteins
OUR ULTIMATE GOAL IS TO DEFINE THE
MOLECULAR LOGIC OF PHOP REGULATION BY …
REGULATORY FEATURES CONSIDERED INCLUDE …
TFS location & length
Genechip
PhoP-boxGene strand
NOT
PhoP-boxTFS-boxes
High down- regulated
Upnregulated
Low down-regulated
Direct repeat PhoP-motif
First-repeat damaged motif
Second-repeat damaged motif
etc…
Binding sites-learning methods
D
Clustering/Proto-typing methods
E. coli
Salmonella
Others
d
bc
a
etc…
PhoP-boxSigma 70
D
Expression
&
Expression
Sigma 70 sequences
e
Expression
0 20 40 60 80 100 120 140 160 180 2000
5
10
15
20
25
f
g
-200 -150 -100 -50 0 50 1000
10
20
30
40
50
60
70
80
90
-100 -90 -80 -70 -60 -50 -40 -30 -200
5
10
15
20
25
30
35
40
Promoter-learning methods
PhoP-box
Expression&
PhoP-boxPhoP-box
Sigma 70
D &
Orientation&TAtaaT
TTGaca
+1
DD/ORPhoP-box
Sigma 70D
wt1 wt2 wt3 wt4 PhoP-1 PhoP-2 PhoP-3 PhoP-4 0
0.5
1
1.5
2
2.5
3x 10
4
0.5
1
1.5
2
2.5
3x 10 4
wt1 wt2 wt3 wt4 PhoP-1 PhoP-2 PhoP-3 PhoP-4 0 0
0.5
1
1.5
2
2.5x 10
4
wt1 wt2 wt3 wt4 PhoP-1 PhoP-2 PhoP-3 PhoP-4
0.83 0.6 0.0
0.5
1
1.5
2
2.5
3x 10 4
wt1 wt2 wt3 wt4 PhoP-1 PhoP-2 PhoP-3 PhoP-4 0
Similarity to Average (rank)0 200 400 600 800 1000
0.2
0.4
0.6
0.8
1
1.2
1.4
Similarity to Average (rank)0 200 400 600 800 1000
0.2
0.4
0.6
0.8
1
1.2
1.4
Similarity to Average (rank)0 200 400 600 800 1000
0.2
0.4
0.6
0.8
1
GENE EXPRESSION FEATURES
T TGTTGATAATT TGTTTATM4
AT
ATGGTTTAC
TAATAAG
TGTTTAAM3
A TGTTTATGATTTGTCGTTTAAM2
T TGTTTACGTTC TATTGTATM1
TGTTTA_ _ _ _ _ TGTTTA
#5 TTGTTTATGATTTGTTTAA | 0 6 3 1 0 21 25 25 25 25 25 5 4 1 1 6 21C | 3 0 0 3 1 3 25 25 25 25 25 1 0 2 0 0 2G | 6 19 2 0 4 0 25 25 25 25 25 9 19 0 4 4 2T | 16 0 20 21 20 1 25 25 25 25 25 10 2 22 20 15 0
BINDING SITES SUBMOTIFS
mig-14-10PhoP-box -10-35 +1-10-35 +1-10-35 +1
PhoP-box
mig-14
-10-35 +1+1PhoP-box
PROMOTER FEATURES: RNA POLYMERASE, CLASS
AND LOCATION
1.0
0.8
0.5
0.0
-200 -150 -100 -50 0 50 100
Distance from binding sites to
+1
-100 -90 -80 -70 -60 -50 -40 -30 -200
5
10
15
20
25
30
35
40
Distance from binding sites to
+1
-200 -150 -100 -50 0 50 1000
10
20
30
40
50
60
70
80
90
Distance from binding sites to
+1
PROMOTER CONDITION: ACTIVATED/REPRESSED
AND DISTANCE RELATIONSHIPS
-100 -90 -80 -70 -60 -50 -40 -30
-200
5
10
15
20
25
30
35
40
- 100 - 90 - 80 - 70 - 60 - 50 - 40 - 30 - 20
0
-100 -90 -80 -70 -60 -50 -40 -30 -20
0
5
10
15
20
25
30
35
40
0.83 0.6 0.0
1.0
0.8
0.5
distance to the binding site
+1Binding-site -10-35 +1
-100 -90 -80 -70 -60 -50 -40 -30
- 100 - 90 - 80 - 70 - 60 - 50 - 40 - 30 - 20
-100 -90 -80 -70 -60 -50 -40 -30
0
5
10
15
20
25
30
35
40
0
5
10
15
20
25
30
35
40
CLOSE, MEDIUM AND REMOTE PROMOTERS
RNAPalpha
CRP
RNAPalpha
CRP
RNAPalpha
CRP
I II
RNAPalpha
CRP
III
BINDING SITE ORIENTATION AND POSSIBLE
TOPOLOGIES
mig-14-10PhoP-box -10-35 +1Fis-box MalT-box
Distances among transcription factor binding sites
0 20 40 60
80 100 120 140 160 180 2000
5
10
15
20
25
TRANSCRIPTION FACTOR INTERACTIONS
Gene X
Y-binding siteClass II Promoter
Class I Promoter
Gene Z
Y-binding site
Class II Promoter
W-binding site
high
expression
remote
close
close
medium distance
both tandem motifs
first tandem motif
both tandem motifs
A CONCEPTUAL CLUSTERING APROACH FOR
DISCOVERY OF REGULATORY PROFILES
ADVANTAGES OF CONCEPTUAL CLUSTERING
(Michalsky, Chesseman, Cook, Ruspini, etc.)
•Hypothesis generation – data exploration – model fitting – finding true topologies
•Deal with structural data
•Iterative and incremental search of interesting repetitive substructures
•Attributes and relations belong to more than one substructure
•Attributes and relations are flexible
•Ability to deal with missing values
•DISCOVER DESCRIBE -> PREDICT
direccionclose... clo... me... me... rem... remot... exp1-phop, ... exp2-gad...exp3-pyr, S c...
TGGTATTCACGAAAAGTTTATGTgttTAGTGACGTTCTGTTTAagtACATAGTTAGGCGCTGTTTAACTACATAGTTAGGCGCTGTTTAACTACATATTGCTCCACTGTTTATATcgcTATTGCCGTTTTGTTTAtccAGATATATAACGTCGGTTTATAATGTTATTTTTAGCCGGTTTAAATCACTATTGATGTTTGGTTAAGATctgTATTGACGATTGGTTAAtgtTGATATTTCGTTGAAGTTAATGATGCTATTTACAAGCTGATAACAActtTATTGAGGTTGTATTGAtaacttTATTGAGGTTGTATTGAtaaTGCCGTTGATAAAGAGTTTATCTcgcCATTGATAAACTGTTTAacatttCGTTTAGGTTTTGTTTAagtCAGCGTATAGCTTATGTTTATAACAGCGTATAGCTTATGTTTATAAtacGGTTTACTTTCTGGTTAaaaTCTGGTTTATCGTTGGTTTAGTTtctGGTTTATCGTTGGTTTAattGCTGGTTTATTTAATGTTTACCCtctGGTTTATTAACTGTTTAtccATTTGTTTAAGGAATGATTAATTCAGTTTTTAAAAACTGTTTAAAGCATTGTTTAGGTTTTGTTTAAGTCACTGTTTATATTTTGTTTAGTACATTGTTTAGGGTTTGTTTAATTtatTATTTACGGTGTGTTTAaactatTATTTACGGTGTGTTTAaacctaCAGTTACTCCTGGTTTAagttctCGTTTAGAAAAGATTTAtggcttCGTTTAAGATTGGTTAAttacttCGTTTAAGATTGGTTAAttaCATTAATTATGCAAAATTTATGGACCTGATGAAAACTTGTTTAGAAtcaTGATTATAGATTGCTTAttataaTGTTTCCTTATATTTTAaattcaTGTTTAAACACGCTTTAtttgttTGTTTAGATACGGTTTActtaaaTGTTTAAGCCCGGTTTAataaaaTGTTTAGCTTGTATTTAatgctgTGTTTAGAGAGAATTTAcatTTCTGTATATGTCATGTTGATGGTTCTGTATATGTCATGTTGATGGTTCTGTATATGTCATGTTGATGGctcTGTTTATAGTTTGTTAAgatctcTGTTTATAGTTTGTTAAgatcctCGTTTAACAATGGTTGAggaTTTTGTTTATAATTGGTTGATCCttcTGTTTAAGTTTGTTTGAtatACTTGTTTAGAAACGATTGATAGACTTGTTTAGAAACGATTGATAGGCTGGTTGAGCATTTGTTGAAAAggcGGTTGAGGGTTCGTTGAaaaggcGGTTGAGGGTTCGTTGAaaaTTTTGTTGATATTTCGTTGAAGTTTTTGTTGATATTTCGTTGAAGTcgaTGTTGTTAAACAGTTTAtcaATTTGTTGTTTCATTGTTAAAAAATTTGTTGTTTCATTGTTAAAAAtcgGGTTTATATGTTTTTAAcatcacGGTTGAGCAACTATTTActt
Ara...C... D... Fr... IH... M... Ox...YgiX-... a ctiva ted_repre ssed (clus ter)
P1
P2
P3
P4
E1
E2
E3
I1
I2
I3
I4
I5
yqjEyhdGygiW yaiNpyrIgadB gadAb3100dpsybjXslyBybjXmgtCompXpipDpagCpmrDyrbLhdeDyeaFnmpCybjXrstAyiaGpagPybjXybcUybjXyhiWphoPb2833pagChilApagDpagPmgtApgtEhdeDompXompXyrbLrstAhdeAslyBslyBmgtAudgb1826pmrDrstAphoPtrs5_8rstAybjXnmpCyobGpipDompTompTugtLmgtCybjXvirKvirKmig-14mgtCmgtCyeaFslyBb1825yiaGyhiWyhiE
pipD yhiWmgtC pagC mgtC mgtC virK virK pipD pagP pagD pagP mig-14 ugtL mgtC pagC yhiW phoP phoP yrbL rstA rstA rstA rstA yrbL ybjX ybjX ybjX slyB slyB pgtE mgtA mgtA yqjE ybjX ybjX ybjX ybjX b1826 b2833 slyB ompX ompX ompX yobG ompT ybcU slyB b3100 ompT hilA pmrD b1825 hdeD yiaG yhiE dps pmrD gadA yiaG gadB hdeD hdeA ygiW udg trs5_8 yhdG yaiN pyrI yeaF nmpC nmpC yeaF
yqjE yhdG ygiW yaiN pyrI gadB gadA b3100 dps slyB mgtC b1825 b1826 ompT pmrD b2833 nmpC ybjX ybjX ybjX yhiE pipD pipD pmrD virK yrbL yhiW yhiW ybjX mgtA mgtA phoP phoP yeaF trs5_8 yrbL ompT ybcU pagC pagC yobG rstA slyB slyB yeaF rstA pgtE mgtC mgtC ybjX udg mig-14 pagD hdeD hdeA hdeD pagP pagP nmpC slyB mgtC rstA rstA ompX ompX ompX ybjXybjX virK yiaG yiaG hilA ugtL
yqjE rstA rstA rstAb3100 pyrI gadB gadA ompX ompX ompX ygiW dps yhdG b1825 yaiN b1826 pagC pagC virK virK mig-14 yiaG yiaG rstA b2833 slyB slyB yhiE mgtC mgtC mgtC mgtC ybjX ybjX ybjX pgtE mgtA nmpC yhiW yhiW pipD pipD pmrD pagD ybjX ybjX ybjX ybjX slyB slyB ompT ompT ybcU hilA ugtL nmpC phoP phoP yobG pagP pagP hdeD hdeD hdeA mgtA trs5_8 yeaF yeaF yrbL udg pmrD
pyrIb3100gadAyhdGyaiNdpsygiWgadByqjEyeaFnmpChilAyhiWmgtCpagCyeaFpipDnmpCudgybjXybjXybcUmig-14ybjXybjXybjXybjXyhiWvirKvirKmgtCyhiEphoPpagPmgtAybjXpmrDyrbLyobGyiaGyiaGpgtEmgtCpagDmgtCslyBslyBphoPompXompTslyBmgtAyrbLrstApipDrstApmrDpagCompXrstApagPhdeDslyBb1826hdeArstAb2833hdeDompTb1825ompXugtLtrs5_8
M1
M2
M3
M4
M5
Expression Binding-site motifs Interactions Orientation Activated/Repressed
A1
A2
A3
A4
yqjEyhdGygiWyaiNpyrIgadBgadAb3100dpspagCpipDpagCugtLpipDhilAyhiWmgtCmgtCyhiWvirKmig-14yeaFybjXybjXybjXybjXybjXtrs5_8slyBompThdeAompTyrbLpagPyrbLmgtCyobGyiaGyiaGudgpgtEpagDvirKmgtCyhiEphoPpagPmgtApmrDyeaFybjXybjXybcUslyBslyBslyBrstArstArstArstAphoPompXompXompXnmpCnmpCmgtAhdeDhdeDpmrDb2833b1825
O1
O2
O3
THE BUILDING BLOCK PROFILES OF PHOP-
REGULATED GENES
Sigma 70 Promoters
THE COMBINED PROFILES OF PHOP-REGULATED
GENES
Learn features of profiles
Combine profiles
Predict and adapt profiles
Group Prototype Search
Group Prototype Search
Group Prototype Search
Evaluating profiles
Gene Promoter Scan: ALGORITHM
Small number of clusters
Large coverage
Good generality
Minimal overlapBetween clusters
More distinct clusters
Better defined concepts
Big cluster descriptions
More features
More inferential power
AN EVALUATION OF THE GENERATED COMBINED
PROFILES
Class I Promoter
medium medium
close
Class I Promoter
close
Class II Promoter
remote
Class I PromoterClass II Promoter
Class II Promoter
mgtC
Class II Promoterremote
close
Class I Promoterclose
Class II Promoter
remote
Class I PromoterClass II Promoter
Class I Promoter
remote
Oposite
mig-14
PhoP-binding site
PhoP-binding site
Oposite
f1 f2 f3 f4
f1,f2 f3,f4
f1,f2,f3 f1,f2,f4 f1,f3,f4 f2,f3,f4
f1,f3,f3,f4
PI=0.4SI=0.2
PI=0.2SI=0.4
PI=0.1SI=0.1
GPS: A CONCEPTUAL CLUSTERING METHOD
P0 P1 P2 P3
O0
O1
O2
promoters (cluster)
0.0002
47E-12
0.97 0.19
0.66 0.97
0.007
mig-14 pagC yeaF slyB yhiW virK ybjX mgtC
P0-O0
b3100 pyrIdps yaiNgadA ygiWgadB yhdGyqjE
P1-O2
b2833
ybjX
mgtA
phoP
ybcU
yeaF
yrbL
rstA
hdeD
ompX
rstA
slyB
yiaG
P1-O1
ybjX ybjXyhiW pipDhdeA pagCybjX pagChilA
P3-O1
slyB
yhiW
yeaF
mgtC
ybjX
mig-14
pagC
virK
mgtC
P3- O2
b1825yhiE
yhiWyiaGvirKmgtC
promoters (cluster)
orientation (cluster)
P2-O1
ompT
ompT
trs5_8
pipD
ybjX
ugtL
P2-O2
b1826
nmpC
pmrD
rstA
rstA
phoP
yobG
ybjX
pmrD
slyB
mgtA
yrbL
pagD
pgtE
udg
mgtC
nmpC
ompX
pagP
O2
O1
O0
P3
P2
P1
P0
9.04E-11 … 0.9729
Color by probability of interaction
0 … 29
Size by records count Line with by count
GPS EVALUATION: PROBABILITY OF
INTERSECTION
0.35 0.49 0.95 0.85 0.75 0.69
remoteclass I
close class II
close medium medium class I class II class I
remoteclass II
RNA Polymerase Features
00.050.10.150.20.250.30.350.40.450.50.550.60.650.70.750.80.850.90.95
remoteclass I
close class II
close medium medium class I class II class I
remoteclass II
slyB
ybjX
yeaF
mig-14
virK
yhiW
mgtC
mgtC
pagC
0
0.050.10.150.20.250.30.350.40.450.50.550.60.650.70.750.80.850.90.95
1
GPS EVALUATION: SIMILARITY OF
INTERSECTION
•Class II
•PhoP-binding sites consensus TGTTTANNNNNTGTTTA
•Direct binding motif repeat
THE PUBLISHED PHOP-REGULATED PROMOTER
(Yamamoto et al., 2002; Minagawa et al., 2003)
OUR ANALYSIS RECOVERED THE TYPICAL PHOP-
REGULATED PROMOTER
High Expression
High ExpressionClass II Promoter
mgtA Directorientation
good close
PhoP-binding site
phoP
Directboth
repeats
Class II Promoter
PhoP-binding siteboth repeats
PhoP-boxphoP
mgtA
+1
-10
-10
+1PhoP-box
mgtC
mig-14-10
-10 +1PhoP-box
OmpR-box PhoP-box -10-10-10 -35 +1+1+1
PhoP-box -10-10-35 +1+1Fis-box
pagC-10 +1PhoP-boxPhoP-box -10-10-35 +1+1OmpR-LPR-
IHF
OUR ANALYSIS DISCOVERED ATIPYCAL PHOP-
REGULATED PROMOTERS
Class II Promoter
orientationvery goodmedium
PhoP-binding site
Opposite
Class II Promotervery goodmedium
bad close
Class I Promoter
bad close
Class II Promoter
Class II Promotergoodremote
Class I Promoter
goodremote
phoPphoQ
mgtA
Mg2+ transport
PhoQ
PhoP
PmrB
PmrA
SEQUENTIAL ACTIVATION OF THE PHOP REGULON
pmrC pmrA pmrB
pbgP
LPS modificationvirulence
mgtCB pagP
PmrD
mig-14 pmrD
PROMOTER –EXPRESSION –ORIENTATION
PROFILES
E1
E1
P3
O1
Close class I
Medium class II
Mediumclass I
Remoteclass II
Remote.class I
Close class II
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
E1 E2 E30
0.10.2
0.3
0.40.50.6
0.7
0.80.9
1
mgtCybjXmig-14slyBpagCyhiWvirK
PhoP-binding site
T TGTTTA TAATT TGTTGA T
THE MINIMAL PROFILES THAT DESCRIBES THE
SEQUENTIAL ACTIVATION OF THE PHOP REGULON
A GT
CGTTTA TGATTT TGTTTA A
mgtA
phoP
T TATTGTA CGTTCTGTTTA T
pmrD
A TGTTTA AGAA
TACT ATGGTTTA AT
mig-14mgtCB pagP
Class II Promoter
mig-14
Class II Promoter
Oposite
Class II Promotergood first repeat
Class I Promoter
good remote
orientation goodmedium
pagC
Direct
LRP-binding sitePhoP-binding site
how affinity
high affinity
close
close
OmpR-binding site
IHF-binding site
low affinity
OpositeClass I Promoter
distance
good both repeats
orientationhigh affinity
Opposite
Class II Promoter
distance
Repressed
close
PhoP-binding site
high affinity
Activated
Activated
Activated
distance
distance
goodmedium
orientation
Oposite
orientation
goodmedium
goodmedium
poorclose
Activated
good first repeat
high affinity
OmpR-binding site close
Class II Promoter
Class I Promoter
Activated
distance
distancePhoP-binding site
Activated
excellentmedium
goodmedium
good remote
distance
distance
distancedistance
PhoP-binding site
orientation
low affinityhigh affinity
mgtCclose
distanceActivatedActivated
distance
Activated
Class I Promoter
Class I Promoter
orientation
high affinity
PhoP-binding sitedistance
Repressed
Class II Promotergoodclose
distance
close
good first repeat
Fis-binding site
THE MAXIMAL PROFILES THAT DEFINES THE
SEQUENTIAL ACTIVATION OF PHOP
Imperfect second PhoP-box tandem
Opposite orientation
Medium distance promoters
Activated
THE PHOP/PHOQ SYSTEM REGULATES EXPRESSION
OF ACID pH RESISTANCE GENES IN E. COLI
yhiWYhiW
low Mg2+
PhoQ��
PhoP -PO3
gadA
yhiE YhiE
hdeA
EXPRESSION AND EXPRESSION-PROMOTER
PROFILES
Expression E1 E2 E3
E1 E2) E300.10.20.30.40.50.60.70.80.91
E1
E2
E3
00.10.20.30.40.50.60.70.80.9
1
00.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Close class I
Medium class II
Mediumclass I
Remoteclass II
Remote.class I
Close class II
b1825yhiEyhiWyiaG
dpsgadAgadBygiW
PromoterP0 P1 P2 P3
E1
E2
E3
Expression-Promoter
E1 E2 E3
00.10.20.30.40.50.60.70.80.9
1
Promoter
Motif
Interactions
-0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
AraC
.
ArcA ArgR CRPCytR
DeoR
DnaA FIS FN
RFru
R Fur
GlpR IHF Lrp MalT MelR NarLOmpR OxyR RcsU Rha
S
PmrA-YgiX YhiW
-0.050
0.050.1
0.150.2
0.250.3
0.350.4
0.45
AraC
. ArcA ArgR CRP
CytR DeoR DnaA FIS FNRFru
R Fur GlpR IHF Lrp MalT MelR NarLOmpR OxyR RcsU RhaS
PmrA-YgiX YhiW
-0.4
-0.3
-0.2
-0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
AraC
.
ArcA ArgR CRPCytR
DeoR
DnaA FIS
FNR
FruR Fu
rGlpR IH
F Lrp MalTMelR
NarLOmpR
OxyR
RcsU
RhaS
PmrA-Y
giX YhiW
0
0.10.2
0.30.4
0.5
0.6
0.7
0.8
Close class I
Medium class II
Mediumclass I
Remoteclass II
Remote.class I
Close class II
00.10.20.30.40.50.60.70.8
Close class I
Medium class II
Mediumclass I
Remoteclass II
Remote.class I
Close class II
Close class I
Medium class II
Mediumclass I
Remoteclass II
Remote.class I
Close class II
0
0.1
0.2
0.3
0.4
0.50.6
0.7
0.8
hilA hdeAnmpC hdeBpagP hdeDybjX
slyB phoPybcUyhiWmgtA
slyB yhiE
MOTIFS-PROMOTER-INTERACTION PROFILE
THE PROFILE THAT DEFINES THE PHOP/PHOQ
REGULATION OF ACID pH RESISTANCE GENES
Class II Promoter
very goodmedium
PhoP-binding site Class II Promotervery goodmedium
bad close
Class I Promoter
bad close
Class II Promoter
Class II Promotergoodremote
Class I Promoter
goodremotegood
closeClass II Promoter
Class I Promoter
Class II Promoter
Class I Promoter
Class II Promoter
bad medium
bad medium
bad medium
bad medium
gene
Low Expression
YhiE
YhiW
PhoP -PO3
yhiE
yhiW
YhiE
YhiW
gadA
yhiE
yhiW
hdeA
YhiE
YhiW
PhoP -PO3
yhiE
yhiW
ADVANTAGES OF CONCEPTUAL CLUSTERING
(Michalsky, Chesseman, Cook, Ruspini, etc.)
•Hypothesis generation – data exploration – model fitting – finding true topologies
•Deal with structural data
•Iterative and incremental search of interesting repetitive substructures
•Attributes and relations belong to more than one substructure
•Attributes and relations are flexible
•Ability to deal with missing values
•DISCOVER DESCRIBE -> PREDICT
Conceptual Clustering: Substructure Discovery
Input Problem Input Database
R1
C1
T1
S1
T2
S2
T3 T4
S3 S4
Compressed Database
S1 C1
R1
S1 S1S1
Recognize repetitive relations among attributes that describe
objects in structural databases (substructure discovery)
Model fittingHypothesis generation
and testingFinding true topologies
Data exploration Group predictions Data reduction
Substructure
object
object
triangle
squareshape
shape
on
Substructure Discovery and Evaluation Strategy by MOGA CC
Root
MOGA CC: Local and Global Evaluation
TTTTTGTCATAAGAAAAAATATCAAACAAACTT
O2
O1
Hybrid Promoter Analysis Methodology (HPAM)
Time Delayed Neural Network (TDDN)+
Multi-Objective Scatter Search genetic algorithm (MOSS)
T TGTTGATAATT TGTTTATM4
AT
ATGGTTTAC
TAATAAG
TGTTTAAM3
A TGTTTATGATTTGTCGTTTAAM2
T TGTTTACGTTC TATTGTATM1
TGTTTA_ _ _ _ _ TGTTTA
#5 TTGTTTATGATTTGTTTAA | 0 6 3 1 0 21 25 25 25 25 25 5 4 1 1 6 21C | 3 0 0 3 1 3 25 25 25 25 25 1 0 2 0 0 2G | 6 19 2 0 4 0 25 25 25 25 25 9 19 0 4 4 2T | 16 0 20 21 20 1 25 25 25 25 25 10 2 22 20 15 0
BINDING SITES SUBMOTIFS
-10-10-35 +1
PROMOTER FEATURES: RNA POLYMERASE, CLASS
AND LOCATION
capa de entrada capas ocultas
capa de salida
“pesoscorrespondientes”(valen lo mismo)
neuronas “time shifted”(corridas en el tiempo)
reconocen la misma subsecuencia de la
secuencia de entrada.
TDNN: A WEIGHT SHARING NEURAL NETWORK
MOSS: A MULTIOBJECTIVE GENETIC FUZZY SYSTEM
Gene Expression Networks Iterative Explorer (GENIE)
Promoter binding dynamics reveals architecture
of small bacterial regulatory networks
high Fe3+
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
-10PhoP-boxPmrA-box
-10PhoP-box
-10PmrA-box
-10PhoP-boxphoP
mgtA
pmrD
pmrA
GENERATING GENETIC CIRCUITS BY GPS
NAVIGATION
PmrB
P
PmrAPmrA
pbgP
PhoQ
P
PhoPPhoP
low Mg2+ high Fe3+
pmrDPmrD
THE USE OF CONCEPTUAL CLUSTERING FINDINGS
AS A HYPOTESIS GENERATOR
GENE PROMOTER ACTIVITY AS BUILDING BLOCK
INTERACTIONS
(Bower and Bolouri, 2001)
KINETIC MODEL OF TWO PROTEINS P AND Q
BINDING TO DNA
HMg2+
phoP
mgtA
pmrD
mgtC
pagP
pmrC
pbgP
promoters
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
IPtotal
0 5 10 20 30 60 90-time (min)L
ORDERED BINDING OF THE PHOP AND PMRA
PROTEINS TO THEIR TARGET PROMOTERS
SEQUENTIAL ACTIVATION OF THE PHOP REGULON
phoPphoQ
mgtA
Mg2+ transport
PhoQ
PhoP
PmrB
PmrA
pmrC pmrA pmrB
pbgP
LPS modificationvirulence
mgtCB pagP
PmrD
mig-14 pmrD
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
Rela
tive
IP
phoP
-5 0 5 10 20 30 60 90 -5 0 5 10 20 30 60 90
mgtA rstA
0
0.5
1.0
1.5
2.0
2.5
-5 0 5 10 20 30 60 90 (min) - IP
- input
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
-10 10 30 50 70 90Time (min)
pmrD
-5 0 5 10 20 30 60 90
0
0.2
0.4
0.6
0.8
1.0
1.2
Rela
tive
IP
-10 10 30 50 70 90Time (min)
slyB
-5 0 5 10 20 30 60 90
mig-14
-5 0 5 10 20 30 60 90 (min) - IP
- input
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
0
0.5
1.0
1.5
2.0
2.5
-10 10 30 50 70 90Time (min)
-5 0 5 10 20 30 60 90
mgtC pagP
-5 0 5 10 20 30 60 90
pagC
-5 0 5 10 20 30 60 90 (min)
- IP
- input
0
0.2
0.4
0.6
0.8
1.0
1.2
Rela
tive
IP
-10 10 30 50 70 90Time (min)
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
pcgL
-5 0 5 10 20 30 60 90
0
0.2
0.4
0.6
0.8
1.0
1.2
-10 10 30 50 70 90Time (min)
Rela
tive
IP
ORDERED PHOP BINDING AND TRANSIENT
OCCUPANCY OF PROMOTERS
0
0.5
1
1.5
2
2.5 rst
AmgtAphoPslyBpmrDpagPmig-14pagCpcgLmgtC
-5 0 5 10 20 30 60 90
Time
-5 0 5 10 20 30 60 90
mgtC
pcgL
pagC
mig-14
pagP
pmrD
slyB
phoP
mgtA
rstA
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
TEMPORAL ORDER OF BINDING
ij
ijiij ktA
ktAtX
)(1)(
)(
PROMOTER ACTIVITY AND PREDICTIONS FROM
MECHAELIS-MENTEN KINETIC MODEL
rate of the unrepressed promoter ithe effective affinity of the activator (concentration at half maximal activation) ikthe effective activation concentration in experiment j for all promoters )(tAj
0 50
0.5
1
phoP0 5
0
0.5
1
1.5
mgtA0 5
0
1
2
3
rstA
0 50
0.5
1
pmrD
0 50
0.5
1
slyB
0 50
0.2
0.4
mgtC
0 50
0.5
1
pagP
0 50
0.2
0.4
pagC
0 50
0.1
0.2
pcgL
0 50
0.5
1
1.5
mig-14
TEMPORAL ORDER OF BINDING WITH LEARNED
ACTIVATION CONCENTRATION
Promoters
rstA phoP pmrD mig-14 pcgL mgtA slyB pagP pagC mgtC
0
0.5
1
1.5
2
2.5
EFECTIVE PROMOTER RATE: ORDERED RESULTS
0
5
10
15
20
25
Time
-5 0 5 10 20 30 60 90
0
5
10
15
20
25
-5 0 5 10 20 30 60 90
EFFECTS OF THE ACTIVATION CONCENTRATION
0
0.5
1
1.5
2
2.5
Promoters
rstA phoP pmrD mig-14 pcgL mgtA slyB pagP pagC mgtC
0 50
0.5
1
phoP0 5
0
0.5
1
1.5
mgtA0 5
0
1
2
3
rstA
0 50
0.5
1
pmrD0 5
0
0.5
1
slyB
0 50
0.5
1
1.5
mig-14
0 50
0.2
0.4
mgtC
0 50
0.5
1
pagP
0 50
0.2
0.4
pagC0 5
0
0.1
0.2
pcgL
TEMPORAL ORDER OF BINDING WITH EXPERT-BASED
CONCENTRATION
pagK0 5
0
0.5
1
PhoP-box
PhoP-box -10-35
PmrA-box -10
+1
pmrD
mgtC
PhoP-box -10 mgtAtreR
+1
+1
-10 -35
TEMPORAL ORDER OF BINDING: PREDICTION BASED
ON REGULATORY FEATURES
Fe3+ REPRESSES pmrD EXPRESSION
VIA THE PMRA/PMRB SYSTEM
high Fe3+
pmrDPmrD
low Mg2+
PhoQ��
PhoP -PO3
PmrB��
-PO3PmrA
pbgP
LPS modification
)3(0
)2(0
)4(0
)1(
_
0
][][][
][][
][][_
_
_
PHOPPHOP
phopphopPHOPphop
HPHOPT
HphopT
dtPHOPd
HphopT
PHOPKPHOP
HT
dtphopd
phopPHOPphopPHOP
phopPHOP
Decaimiento espontáneo
(3)PHOPphop
Activación
Traducción
Decaimiento espontáneo
(1)
(2)
(4)
MATEMATHICAL INTERPRETATION OF BUILDING
BLOCKS
COMPOSING BUILDING BLOCKS INTO GENE
NETWORKS
Funcionalidad Mg2+ Fe3+ mgta pmrd pbgp
1 1 1 1 0 12 1 0 1 1 13 0 1 0 0 14 0 0 0 0 0
Entrada Salida
0
0,2
0,4
0,6
0,8
1
1,2
0 50 100 150 200 250 300
Tiempo (mins)
Con
cent
raci
ón
pmrd
mgta
pbgp
CONSTRAINT-BASED LEARNING ALGORITHM
0
0,2
0,4
0,6
0,8
1
1,2
0 50 100 150 200 250 300
Tiempo (mins)
Con
cent
raci
ónPHOQPHOQ-ACTphop_phoqPHOPPHOP-PpmrdmgtaE-Mg2PMRBPMRB-ACTpmra_pmrbPMRAPMRA-PpbgpE-Fe3PMRDPMRA-P_PMRD
Funcionalidad Mg2+ Fe3+ mgta pmrd pbgp # Parámetros Proporción Probabilidad
1 1 1 1 0 1 68 0,000186349 0,8813570122 1 0 1 1 1 68 0,001092771 0,9045841133 0 1 0 0 1 68 0,000735495 0,8993325074 0 0 0 0 0 68 0,469708768 0,988949126
Entrada Salida
SIMULATIONS AND EVALUATION
0
0. 2
0. 4
0. 6
0 2 4 6 8 10 12
alpha_PM RA -P
0
0. 2
0. 4
0. 6
0 2 4 6 8 10 12
alpha_PM RA -P_PM RD
0. 2934
0. 2936
0. 2938
0. 294
0. 2942
0. 2944
0 20 40 60 80 100 120
H_mgta
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_pbgp
0
0. 1
0. 2
0. 3
0. 4
0. 5
0 20 40 60 80 100 120
H_PHOP
0. 293
0. 294
0. 295
0. 296
0. 297
0. 298
0 20 40 60 80 100 120
H_phop_phoq
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_PHOP-P
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_PHOQ
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_PHOQ-A CT
0
0. 1
0. 2
0. 3
0. 4
0. 5
0 20 40 60 80 100 120
H_PM RA
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_pmra_pmrb
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_PM RA-P
0
0. 2
0. 4
0. 6
0 20 40 60 80 100 120
H_PM RA -P_PM RD
0. 285
0. 29
0. 295
0. 3
0 2 0 4 0 60 80 1 00 1 20
H_PM RB
0
0. 1
0. 2
0. 3
0. 4
0 20 40 60 80 100 12 0
H_PM RB -A CT
PARAMETER SENSITIVITY
Simulaciones
0
0,2
0,4
0,6
0,8
1
1,2
0 10 20 30 40 50 60 70 80 90 100
Tiempo (mins)
Expr
esió
n
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
PREDICTIONS
Parámetros Modificados1 2 3 4 5 6 7 8H_PHOP-P 10 10 10 10 10 10 10 10H_mgta 20 20 20 20 20 20 20 20nu_mgta 1 1 1 1 1 5 5 5K_mgta 0,01 0,025 0,05 0,075 0,1 0,01 0,025 0,05
Parámetros Modificados9 10 11 12 13 14 15H_PHOP-P 10 10 10 10 10 10 10H_mgta 20 20 20 20 20 20 20nu_mgta 5 5 10 10 10 10 10K_mgta 0,075 0,1 0,01 0,025 0,05 0,075 0,1
Simulación
Binding relativo de PhoP al DNA de distintos genes
-0,2
0
0,2
0,4
0,6
0,8
1
1,2
-20 0 20 40 60 80 100
Tiempo (mins)
Bind
ing
phoPmgtArstApmrDslyBmig-14mgtCpagPpagKpagCpcgL
-5 0 5 10 20 30 60 90
mgtC
pcgL
pagC
mig-14
pagP
pmrD
slyB
phoP
mgtA
rstA
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
EXPERIMENTAL VALIDATION
Correlación de Pearson
Simulación phoP mgtA rstA pmrD slyB mig-14 mgtC pagP pagK pagC pcgL1 0,619287092 0,857949781 0,339818737 0,829000283 0,264923353 0,482460587 0,277169074 0,556744029 0,5991426 0,180180349 0,3209791932 0,745769371 0,906526323 0,502007318 0,901528946 0,424242626 0,628335893 0,419452365 0,671094889 0,722414443 0,332324322 0,470990363 0,843852314 0,915138454 0,648375986 0,936323412 0,581156672 0,751659807 0,552810422 0,762873798 0,822053179 0,482872977 0,6086787584 0,886153114 0,898741294 0,725794354 0,936364865 0,671873682 0,811663401 0,625347639 0,803557401 0,867101681 0,569665865 0,6820788125 0,813971572 0,842706965 0,657161056 0,882303929 0,641101063 0,746960693 0,600671959 0,782555306 0,834827405 0,54888883 0,6499702796 0,484578386 0,784076325 0,182657971 0,735239947 0,119355869 0,334879341 0,142821565 0,436909899 0,470369998 0,042201854 0,1771080887 0,858977561 0,961379585 0,648559205 0,971032198 0,499422627 0,75927577 0,509215514 0,747911953 0,80498073 0,413593112 0,5830255818 0,843524871 0,499762374 0,946114321 0,626705145 0,922203925 0,92202348 0,895723654 0,835517001 0,868634095 0,900230693 0,9324808999 0,550703753 0,117924047 0,749545426 0,247775955 0,917814195 0,650777545 0,732942357 0,530287422 0,591071935 0,850412566 0,733320591
10 0,286592176 -0,150739857 0,545795459 -0,027344699 0,799582889 0,394582616 0,512804486 0,253380729 0,325408172 0,687623039 0,50931898811 0,468974213 0,772242027 0,167076499 0,722511526 0,107396064 0,319675582 0,132281945 0,426383442 0,458211028 0,031994059 0,16522581612 0,913010303 0,975698548 0,713560888 0,981413357 0,533762236 0,804706746 0,509096392 0,715888847 0,79386547 0,42468576 0,58693351213 0,801054535 0,429836349 0,915624862 0,563254584 0,913774241 0,894472406 0,919408323 0,83067834 0,842771907 0,917072159 0,93035610614 0,425722417 -0,007911581 0,639678643 0,113528727 0,886942645 0,524659984 0,647662009 0,409550397 0,485339495 0,810487256 0,63651177515 0,122722329 -0,291314081 0,403220158 -0,17977406 0,698132005 0,225771766 0,349213006 0,071990352 0,155319882 0,556525278 0,349257421
Max Val 0,913010303 0,975698548 0,946114321 0,981413357 0,922203925 0,92202348 0,919408323 0,835517001 0,868634095 0,917072159 0,932480899Simulación 12 12 8 12 8 8 13 8 8 13 82 Mejor 4 7 13 7 9 13 8 13 4 8 13
CORRELATION BETWEEN EXPERIMENTAL
VALIDATION AND PREDICTIONS
phoPphoQ
mgtA
Mg2+ transport
PhoQ
PhoP
PmrB
PmrA
SEQUENTIAL ACTIVATION OF THE PHOP REGULON
pmrC pmrA pmrB
pbgP
LPS modificationvirulence
mgtCB pagP
PmrD
mig-14 pmrD
INGENEUE ENVIRONMENT
Microarray Integrated Analysis (MIA)
The Raw Material Available for the Analysis Consist of …
Gene expression (GeneChips) derived from longitudinal blood expression profiles of human volunteers treated with intravenous endotoxin compared to placebo
8 patients, four treated, patients 1-4, and four control, patients 5-8
From each patient there are 6 samples taken at different time points: at hour 0, hour 2, 4, 6, 9, 24. For patient 6, hours 4 and 6 are missing
… identify molecular pathways that provide insight into the host response over time to systemic inflammatory insults, as part of a Large-scale Collaborative Research Project sponsored by the National Institute of General Medical Sciences (www.gluegrant.org).
Our Ultimate Goal is to …
… by Building Genetic Networks Using Reverse Engineering Techniques
Hierarchical approach to model the dynamics of genetic networks by doing reverse engineering
Need of gene expression preprocessing by clustering genes into profiles
Difficulties of typical statistical methods in detecting significant changes between treated and control gene profiles
Revisit the reverse engineering approach by using data mining tools
Gene Expression Patterns Detected by Clustering Methods
First Approach to Obtain Dynamic Patterns: Boolean Networks
expression
time
B
C A B
B A or C
C (A and B) or (B and C) (A and C)A
First Approach to Obtain Dynamic Patterns: Boolean Networks
time
expression
t0 t1 t0 t1 t0 t1
A B
B A or C
C (A and B) or (B and C) (A and C)
Time t Time t+1A B C A B C
0 0 0 0 0 10 0 1 0 1 00 1 0 1 0 01 0 0 1 1 11 0 0 0 1 01 0 1 0 1 11 1 0 1 1 11 1 1 1 1 1
Static Vs. Dynamic Clustering
Clustering
0011 -> A1
1100 -> A2
0101 -> B1
1010 -> B2
A
B
t0
t1
Inferencia de relaciones temporales
The relationship between B(t0) and B(t1) can be captured by grouping B1 and B2 as a negative correlation. However, relationships among A(t0), B(t0) and A(t1) could not be determined in the same way because they are derived from a temporal behavior.
Static clustering
A(t0) Or B(t0) = A(t1)
0 0 0
0 1 1
1 0 1
A(t1) = A(t0) OR B(t0)
Not B(t0) = B(t1)
0 1
1 0
0 1
B(t1) = NOT B(t0)
A
OR
B
NotDynamic models
t0 t1
001 011
010 101
Dynamic Patterns Learned by Doing Reverse Engineering
D
B
E
C
A
113
28
Not
56[1
99]
184[71]
Not
98
99
226
78
49[206]
14 135
248
Not
195
Not
2[253]
126[129]Not 64
[191]Not 32[223]
16[239]
8[247]
4[251]
160
[95]
144
132
[123
]
63 [192
]
130
224
Not
143
96
Not
79
Not
48 [207
]24
[231
]
136
68
93[162]
34 [221
]
46
22 116
11
Not 186[69]
Not 110[145]
17[238] Not 18
[237]9
[246]
Not 115[140]
12
13
6
198
Not 70 35[220]
3[2
52]
131[124]
65
193
1[2
54]
NotNot
10 122 66Not 94 175[80]
138
[117
]
5[2
50]
Not
194
Not
33 47 [208
]
OR
Not
21[234]
Not
Not
236 0[255]
128[127]
OR
Not
Not
OR
OR
OR
OR
OR
20[235]
Not
A More General Approach: Hierarchical Modeling
Preprocess gene expression by clustering genes into prototypes
Build Boolean circuits based on prototypes
Recognize differential patterns between experiments and control
Interpret differences
Select interesting genes
Go deep into continuous modeling
Boolean Modeling: Pattern Preprocessing
OBJECTIVE: Find Genes with Different Treatment-Control Profiles, e.g.
TREATMENT CONTROL TREATMENT
TREATMENT
TREATMENT
PAT5 PAT6 PAT7 PAT8PAT1 PAT2 PAT3 PAT4
h0 h2 h4 h6 h9 h24
h0 h2 h4 h6 h9 h24
CONTROL CONTROLTREATMENT
Representation Issues…
TREATMENT
CONTROL
A cluster profile from treatment might result in several different profiles in control and vice versa.
Variability in Microarray Data...
TOTAL VARIABILITY: Biological Variability (main interest) + Systematic Variability (technical) + Random Variability
The statistical methods for microarray data analysis try to extract the biological variability and set aside the systematic and random variability.
Steps in the Statistical Analysis
SCALING: conversion of fluorescence values to numeric values in an established rank. We use Li-Wong scaling model in the whole analysis process.
NORMALIZATION: The process of removing systematic variability(detection of outliers).
FILTERING: preliminary elimination of genes that show no expression change at all between treatment and control.
DIFFERENTIAL ANALYSIS: Select a set of genes that are differentially expressed from treatment to control (show biological variability). This analysis will be performed over treatment / control and also over time. Since we work with data over time series, it is interesting to notice the behaviour of a gene over time. Accordingly we will analyse the data over both treatment and time.
TREATMENT CONTROL TREATMENT CONTROL TREATMENT CONTROL
Summarizing...
Yes/No
Number of Genes Selected by each Method
table 1
Results: ANOVA over Treatment, 2895 GenesTREATMENT CONTROL
Results: ANOVA over Time, 1226 GenesTREATMENT CONTROL
Results: ANOVA over Time/Treatment, 2715 Genes
TREATMENT CONTROL
Results: 7217 Genes Selected as the Union set of All Previous Methods
TREATMENT CONTROL CONTROL
Results: Clustering
TREATMENT
CONTROL
From the 7217 genes 5367 do not show an explicit differential profile expressed from control to treatment. Clustering has been applied and these clusters are not further studied, at least at this point.
5367 Genes
Summarizing Results, 1850 Genes Selected as Differentially Expressed from All Methods
TREATMENT TREATMENT CONTROL
Method Contribution to the Final Gene Selection
The percentage shown between parenthesis is the proportion of genes from the original method selection prior to applying any clustering (table 1) that stay in this final selection.
table 2
Genes Found by Li-Wong Filtered + t- test Comparisons and not by SAM
TREATMENT CONTROL
All the gene patterns found by any of the Li-Wong variants have been also found by ANOVA over treatment, some particular genes do not coincide though, but a small number.
Genes Found by Li-Wong with Time-Blocks
TREATMENT CONTROL
This type of profile shows very differentiated gene behavior in a patient with respect to all the others. These differences are between patients in the same experimental group, which can be caused by gender, age, conditions of the experiment... or other more interesting causes. Finding these patterns gives extra information about the experiment and should be taken into account.
Genes Found by SAM and ANOVA over Time/Treatment but not by Li-Wong
TREATMENT CONTROL
Genes Found by ANOVA over Treatment and by Li-Wong but Not by SAM
TREATMENT CONTROL
Genes Not Found by ANOVA over Time
TREATMENT CONTROL
Genes Found by ANOVA over Time and over Time/Treatment but Not by ANOVA over
Treatment
TREATMENT CONTROL
Genes Found by ANOVA over Treatment but Not by ANOVA over Time
TREATMENT CONTROL
Genes Found by ANOVA over Treatment but Not by any Other Combination of Methods
TREATMENT CONTROL
Coincidence Statistics: Pairwise Coverage of Examples
table 3
ColumnColumnRow
#)(#
Cell values correspond to percentage
High values -> most genes from row method are included in those from column method
Low values -> few genes from row method are included in those from column method
Which ones are the Best Fitting Methods? The Problem of Longitudinal Data
Clustering Visualization and Decision Making
Dependent Longitudinal Data: Repeated Measures ANOVA
Data Interactions: Relationship Between Gene Expression Changes and Other Features
Profile Search by Using Similarity to Prototypes
Clustering Visualization and Decision Making
TREATMENT
CONTROL
From the 7217 genes 5367 do not show an explicit differential profile expressed from control to treatment. Clustering has been applied and these clusters are not further studied, at least at this point.
5367 Genes
Dependent Longitudinal Data: Repeated Measures ANOVA
In repeated measures design ...
sample members are measured on several occasions or trials
each trial represents the measurement of the same characteristic under a different condition
... but in multivariate design ...
sample members are measured on several occasions, or trials
each trial represents the measurement of a different characteristic
Data Interactions: Relationship Between Gene Expression Changes and …
Gene Ontology (Molecular functions and biological processes)
Binding sites and CIS-features
Physical locations
Maps of biological pathways
expression1 2
0
2
4
6
8
transporter translation regulator
transcription regulator structural molecule
signal transducer obsolete molecular
chaperone
catalytic
binding
Molecular Function Expression
expression1 2
0
1
2
3
4 physiological
obsolete
biological development
cellular
Biological Process
Data Interactions: Gene Ontology
Associations of Functions and Processes
transporter
catalytic
chaperone
chaperonegene X chaperone
gene Y
Combined Associations of Functions and Processes
transporter
catalytic
chaperone
gene X
catalytic
chaperone
transporter
gene Y
Going Back to the Dynamic Modeling Problem
Preprocess gene expression by clustering genes into prototypes
Build Boolean circuits based on prototypes
Recognize differential patterns between experiments and control
Interpret differences
Select interesting genes
Go deep into continuous modeling
Recognize Differential Patterns between Experiment and Control Representations
D
B
E
C
A
113
28
Not
56[1
99]
184[71]
Not
98
99
226
78
49[206]
14 135
248
Not
195
Not
2[253]
126[129]Not 64
[191]Not 32[223]
16[239]
8[247]
4[251]
160
[95]
144
132
[123
]
63 [192
]
130
224
Not
143
96
Not
79
Not
48 [207
]24
[231
]
136
68
93[162]
34 [221
]
46
22 116
11
Not 186[69]
Not 110[145]
17[238] Not 18
[237]9
[246]
Not 115[140]
12
13
6
198
Not 70 35[220]
3[2
52]
131[124]
65
193
1[2
54]
NotNot
10 122 66Not 94 175[80]
138
[117
]
5[2
50]
Not
194
Not
33 47 [208
]
OR
Not
21[234]
Not
Not
236 0[255]
128[127]
OR
Not
Not
OR
OR
OR
OR
OR
20[235]
Not
Recognize Differential Patterns between Experiment and Control Representations
Not
2 32[223]64126
[129]16
[239]8
[247]4
[251]Not
66
Not
134
AN
D
12 [243
]23
1
Not
48
Not
96
192
194Not
144
200
AN
D
119
[136
]
Not
196
Not
9878 [177
]
88
Not
216
Not
211
147
Not
100
Not
50 [205
]10
2
Not
179
30
225
Not
112Not
240
199[56]
242
[13]
114
198Not
70 Not 220
101
7236 [219
]
146
[109
]
189
[246
]
195
6
3 131
130
65
193
227
184[71]
Not
135
7
Not
224
115
94 [161
]
Not
33 [222
]
208
Not
80
40[215]
168[87] 84
212
213
85Not
148[107]
74
75OR
165 210[45] 82 214Not
20
10
138
69
197
AND
139
244
Not
49
93
Not
209
81
Not
Not68
160
[95]
226 241
1[254]
0[255]
128[127]
229
OR
OR Not
113
28
Not
56 [199
]
184[71]
Not
98
99
226
78
49[206]
14 135
248
Not
195
Not
2[253]
126[129]Not 64
[191]Not 32 [223]
16[239]
8[247]
4[251]
160
[95]
144
132
[123
]
63 [192
]
130
224
Not
143
Not
96
Not
79
Not
48 [207
]24 [231
]
136
68
93[162]
34 [221
]
46 Not 104[151] 180 90 Not 82
210 Not 22 116
11
Not 186[69]
Not 110[145]
17[238] Not 72
[183]36
[219]18
[237]9
[246]
Not 115[140]
12
AN
D13
6
198
Not 70 35[220]
3[2
52]
131[124] 65
193
1[2
54]
AND NotNot
10 122 66Not 94 175[80]
40[215]
20[235]
138
[117
]
5[2
50]
Not
194
Not
33 47 [208
]
Not
87 [168
]
Not
OR
107 Not
202
101
77205
NotNot
AN
D
Not
85[170]
Not
42[213]
21[234]
NotAND
AN
D
Not
204 102
Not
76
38
166
Not
211
147
236
73
201
Not
0[255]
128[127]
OR
Not
Find architectural differential sub-circuits
Reverse gene mapping
TREATMENT CONTROL
Find Architectural Differences as Sub-circuits, e.g., members of the nerve growth factor (NGF) and
neurotrophic factor (GDNF) families
GDNF
40921_at 35453_at
2448
NGF 32
18OR4
OR
Preserve Relationships Among Genes by Using Temporal Clustering
31648_at31982_at32025_at32368_at32617_at32643_at32775_r_at33018_at33742_f_at33854_at34133_at35085_r_at36246_at37356_r_at37532_at37881_at39760_at40534_at40987_at41119_f_at41480_at41703_r_at41767_r_at853_at
41703_r_at
31648_at37881_at
41480_at
35618_at781_at
32[223]
16[239]
8[247] 18 9
[246]OR4[251]
1013_at2092_s_at31421_at32388_at32743_at32982_at34687_at35241_at35371_at35662_at36800_at36820_r_at36903_at37378_r_at37481_at37884_f_at38137_at38886_i_at39626_s_at39851_at40449_at41012_r_at
33496_at36760_at
41685_at
39806_at
1599_at31377_r_at32184_at32367_at32489_at32683_at33276_at34198_at34453_at34554_at34666_at35878_at36269_at36868_at36999_at37120_at37534_at38430_at38684_at39063_at41260_at746_at863_g_at
1895_at
32090_at35752_s_at
1629_s_at2002_s_at31653_at31810_g_at32666_at32999_at33103_s_at35203_at35245_at35258_f_at35405_at35698_at35973_at36214_at37489_s_at37895_at38405_at39663_at39878_at40041_at40086_at40964_at41228_r_at
35405_at37895_at
31810_g_at40041_at
32200_at40678_at41669_at
36895_at38038_at41784_at
37073_at37812_at39532_at40726_at
32[223]
16[239]
8[247] 18 9
[246]OR4[251]
35453_at
1560_g_at34757_at34997_r_at35373_at36911_at36931_at37671_at37804_at38096_f_at39015_f_at39513_r_at40921_at
31648_at37881_at
39806_at 41703_r_at
31345_at34790_at38829_r_at41401_at969_s_at
41685_at
1527_s_at1915_s_at31409_at31723_at32140_at32568_at32571_at33601_at33686_at33756_at34470_at35072_at35139_at35495_at36013_at36448_at36525_at36693_at37215_at37480_at37572_at37701_at37989_at38575_at38848_at39289_at39732_at40453_s_at40601_at40735_at754_s_at
31810_g_at40041_at
32090_at35752_s_at
35405_at37895_at
31329_at31454_f_at33548_f_at34382_at34973_at35034_at35268_at35904_at36835_at36976_at37351_at37863_at38463_s_at38642_at39034_at949_s_at
[AFFX-HUMRGE/M10098_5_at]
35618_at781_at
33496_at36760_at
1895_at
1262_s_at238_at32635_at32769_at34249_at34297_at34720_at34807_at35171_at35546_at38001_at38134_at38958_at41701_at
41480_at
32200_at40678_at41669_at
GDNF NGF
32 16 8 4
41685_at
39806_at
1895_at
32090_at35752_s_at
35405_at37895_at
31810_g_at40041_at
32200_at40678_at41669_at
41480_at
35618_at781_at
1
2
3
45
32 16 8 4
41685_at
39806_at
1895_at
32090_at35752_s_at
35405_at37895_at
31810_g_at40041_at
32200_at40678_at41669_at
41480_at
35618_at781_at
1
2
3
45
GPS, for Gene Promoter Scan
HPAM, for Hybrid Promoter Analysis Methodology
GENIE, for Gene Expression Networks Iterative Explorer
MIA, for Microarray Integrated Analysis
http://gps-tools.wustl.edu.
SUMMARIZING …
“…organisms of the most different sorts are
constructed from the very same battery of
genes. The diversity of life forms results from
small changes in the regulatory systems that
govern expression of these genes.”
François JacobIn Of flies, mice and men