Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms
Jan Komorowski and Astrid Lägreid
J. Komorowski and A. Lägreid
Joint work with
• Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik
Selected Challenges in Gene-expression Analysis
• Function similarity corresponds to expression similarity, but:
  – Functionally correlated genes may be expression-wise dissimilar (e.g. anti-coregulated)
  – Genes usually have multiple functions
  – Measurements may be approximate and contradictory
• Can we obtain clusters of biologically related genes?
• Can we build models that classify unknown genes into functional classes, that are human-legible, and that handle approximate and often contradictory data?
• How can we re-use biological knowledge?
Data
• Data material
  – Serum-starved fibroblasts, 8,613 genes
    • Added serum to medium at time = 0
    • Used starved fibroblasts as reference
    • Measured gene activity at various time points
  – 493 genes found to be differentially expressed
• Results
  – 278 genes known (3 repeats)
  – 212 genes unknown (uncharacterized)
  – 211 genes given hypothetical function with 88% quality
Fibroblast - serum response
[Figure: timeline from quiescent, non-proliferating cells to proliferating cells after serum addition, with time marks at 4, 8 and 24; samples are taken for microarray analysis]
Processes
[Figure: timeline from quiescent, non-proliferating cells to proliferating cells (time marks at 4, 8 and 24), annotated with the processes protein synthesis, lipid synthesis, stress response, cell motility, cell cycle re-entry, organelle biogenesis and transcription]
Dynamic processes
[Figure: timeline from quiescent, non-proliferating cells to proliferating cells (time marks at 4, 8 and 24), showing primary, secondary and tertiary responses and the gene classes immediate early, delayed immediate early, intermediate and late]
Protein appears after the transcript
[Figure: timeline from quiescent, non-proliferating cells to proliferating cells (time marks at 4, 8 and 24), with primary, secondary and tertiary responses]
Protein dynamics are not always similar to transcript dynamics
[Figure: gene, transcript and protein over time (time marks at 4, 8 and 24)]
Molecular mechanisms of transcriptional response
[Figure: diagram with the labels serum = signal; immediate early response genes; immediate early response factors; delayed immediate early response genes; secondary transcription factors; intermediate/late response genes; effectors = cellular response]
The dynamics of cellular processes
[Figure: timeline from quiescent, non-proliferating cells to proliferating cells (time marks at 1, 4, 8 and 24), annotated with the processes protein synthesis; DNA synthesis; energy metabolism; cell motility; stress response; cell adhesion; lipid synthesis; cell cycle regulation; cell proliferation, negative regulation. Some labels appear at more than one time point.]
Gene  0HR    15MIN  30MIN  1HR    2HR    4HR    6HR    8HR    12HR   16HR   20HR   24HR   Process
g1    0.00   -0.47  -3.32  -0.81  0.11   -0.60  -1.36  -1.03  -1.84  -1.00  -0.60  -0.94  Unknown
g2    0.00   0.66   0.07   0.20   0.29   -0.89  -0.45  -0.29  -0.29  -0.15  -0.45  -0.42  Transport and defense response
g3    0.00   0.14   -0.04  0.00   -0.15  -0.58  -0.30  -0.18  -0.38  -0.49  -0.81  -1.12  Cell cycle control
g4    0.00   -0.04  0.00   -0.23  -0.25  -0.47  -0.60  -0.56  -1.09  -0.71  -0.76  -0.62  Positive control of cell proliferation
g5    0.00   0.28   0.37   0.11   -0.17  -0.18  -0.60  -0.23  -0.58  -0.79  -0.29  -0.74  Positive control of cell proliferation
...   ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
[Figure: ontology fragment under Process with the classes Transport (g2), Defense response (g2), Cell cycle control (g3) and Positive control of cell proliferation (g4, g5)]
0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation)
Methodology
1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets
4. The function of unknown genes is predicted using the rules
[Figure: expression plot, y-axis from -2 to 1.5, x-axis time from 0 to 24]
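Step 3 relies on rough sets. As a minimal sketch of the underlying notions only (not the rule-induction algorithm used by the authors or by ROSETTA), the following computes indiscernibility classes and the lower and upper approximations of a functional class on a toy decision table. The gene names, feature values and class membership are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects that have identical values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())

def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # every indiscernible object is in the class
            lower |= cls
        if cls & target:    # at least one indiscernible object is in the class
            upper |= cls
    return lower, upper

# Hypothetical decision table: template matched on two subintervals.
table = {
    "gA": {"0-4": "Increasing", "6-10": "Decreasing"},
    "gB": {"0-4": "Increasing", "6-10": "Decreasing"},
    "gC": {"0-4": "Constant", "6-10": "Increasing"},
    "gD": {"0-4": "Constant", "6-10": "Decreasing"},
}
# Hypothetical annotation: genes belonging to one functional class.
in_class = {"gA", "gC"}

lower, upper = approximations(table, ["0-4", "6-10"], in_class)
print(sorted(lower))  # certain members
print(sorted(upper))  # possible members
```

Here gA and gB look identical but only gA is annotated to the class, so both fall in the boundary region (upper minus lower). This is the kind of contradictory data that rough sets are meant to handle.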
Gene Ontology
Gene function: Cellular compartment, Process, Function
Cell growth and maintenance
Metabolism
Energy pathways, Nucleotide and nucleic acid metabolism
DNA metabolism
Transcription, DNA packaging, DNA repair, Mutagenesis
Intracellular protein traffic, Ion homeostasis, Transport
Lipid metabolism
Protein metabolism and modification, Amino-acid and derivative metabolism, Protein targeting
Cell death, Cell motility, Stress response, Organelle organization and biogenesis, Oncogenesis, Cell proliferation, Cell cycle
Cell communication
Cell adhesion, Signal transduction
Cell surface receptor linked signal transduction, Intracellular signalling cascade
Developmental processes
Physiological processes
Blood coagulation, Circulation
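One plausible way to use a hierarchy like this when mining functional classes is to let a gene annotated to a specific term also count as a member of every broader class above it. The sketch below assumes a simplified single-parent hierarchy (the real Gene Ontology is a directed acyclic graph in which a term can have several parents). The parent links and gene annotations are illustrative assumptions, not taken from the slides.

```python
def ancestors(term, parent_of):
    """Return all ancestors of `term`, nearest first (single-parent sketch)."""
    result = []
    while term in parent_of:
        term = parent_of[term]
        result.append(term)
    return result

def propagate(annotations, parent_of):
    """Map each class to the genes annotated to it or to any term below it."""
    members = {}
    for gene, terms in annotations.items():
        for term in terms:
            for cls in [term] + ancestors(term, parent_of):
                members.setdefault(cls, set()).add(gene)
    return members

# Assumed parent links, for illustration only.
parent_of = {
    "DNA repair": "DNA metabolism",
    "DNA metabolism": "Nucleotide and nucleic acid metabolism",
    "Nucleotide and nucleic acid metabolism": "Metabolism",
    "Lipid metabolism": "Metabolism",
}
# Hypothetical gene annotations.
annotations = {"gX": ["DNA repair"], "gY": ["Lipid metabolism"]}

members = propagate(annotations, parent_of)
print(sorted(members["Metabolism"]))
print(sorted(members["DNA metabolism"]))
```

Choosing classes at a moderately general level in this way gives each class enough annotated genes to learn from.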
Biological processes from GO
• Energy pathways
• DNA metabolism
• Amino acid and derivative metabolism
• Protein targeting
• Lipid metabolism
• Transport
• Ion homeostasis
• Intracellular traffic
• Cell death
• Cell motility
• Stress response
• Organelle organization and biogenesis
• Oncogenesis
• Cell cycle
• Cell adhesion
• Cell surface receptor linked signal transduction
• Intracellular signaling cascade
• Developmental processes
• Blood coagulation
• Circulation
Hierarchical Clustering of the Fibroblast Data
It’s not a cluster!
Gene Ontology vs. Clusters found by Iyer et al.
Template-based feature synthesis
• Gene expression time series data
• All possible subintervals in the time series
• Templates: Increasing, Decreasing, Constant
• MATCH: each template is matched against each subinterval
• Result: groups containing genes matching the same templates over the same subinterval
• 12 measurement points give 55 possible intervals spanning more than 2 points
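The count of 55 follows from choosing a start and an end point among 12 measurements (66 pairs) and discarding the 11 pairs of adjacent points. The sketch below enumerates these subintervals and matches each against three templates. The template definitions here are deliberately simplified assumptions (one hypothetical `threshold` on overall change and spread), not the definitions used in the study. The profile is gene g3 from the data table.

```python
from itertools import combinations

def subintervals(n_points, min_points=3):
    """All (start, end) index pairs spanning at least `min_points` measurements."""
    return [(i, j) for i, j in combinations(range(n_points), 2)
            if j - i + 1 >= min_points]

def match_template(segment, threshold=0.3):
    """Simplified, assumed templates: returns a template name or None."""
    change = segment[-1] - segment[0]
    spread = max(segment) - min(segment)
    if spread <= threshold:
        return "Constant"
    if change >= threshold:
        return "Increasing"
    if change <= -threshold:
        return "Decreasing"
    return None

def extract_features(profile):
    """Map each subinterval (start, end) to the template it matches, if any."""
    features = {}
    for i, j in subintervals(len(profile)):
        template = match_template(profile[i:j + 1])
        if template is not None:
            features[(i, j)] = template
    return features

# Profile of g3 over the 12 time points (0HR ... 24HR).
g3 = [0.00, 0.14, -0.04, 0.00, -0.15, -0.58,
      -0.30, -0.18, -0.38, -0.49, -0.81, -1.12]

features = extract_features(g3)
print(len(subintervals(12)))   # 55
print(features[(0, 3)])        # Constant
print(features[(7, 11)])       # Decreasing
```

Genes that share the same template over the same subinterval then form the groups described above.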
Examples of template definitions
[Figure: an increasing template and a constant template drawn over measurement points between 2HR and 12HR, annotated with thresholds (MIN. 0.6, MAX 0.2, MIN. 0.1, MIN. 0.2, MIN. 0, MEAN) and y-axis marks at 0.5 and 1.0]
Rule example 1
Rule: 0 - 4(Constant) AND 0 - 10(Increasing) => GO(protein metabolism and modification) OR GO(mesoderm development) OR GO(protein biosynthesis)
Covered genes: M35296, J02783, D13748, X05130, X60957, D13748, U90918 (unknown)
[Figure: expression plot, y-axis from -1 to 3, x-axis time from 0 to 24]
Rule example 2
Rule: 0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation) OR GO(cell-cell signaling) OR GO(intracellular signaling cascade) OR GO(oncogenesis)
Covered genes: Y07909, X58377, U66468, X58377, X85106, Y07909
[Figure: expression plot, y-axis from -2 to 1.5, x-axis time from 0 to 24]
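A rule of this form is a conjunction of (subinterval, template) conditions with a disjunction of GO classes as its conclusion. The following sketch shows one way such a rule could be represented and tested against a gene. The `Rule` type, the `covers` helper and the two gene feature sets are hypothetical; only the rule itself is copied from rule example 2.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    conditions: tuple  # ((interval label, template), ...), all must hold
    classes: tuple     # GO classes offered as alternatives by the rule

def covers(rule, gene_features):
    """True if the gene matches every (interval, template) condition."""
    return all(gene_features.get(interval) == template
               for interval, template in rule.conditions)

rule2 = Rule(
    conditions=(("0 - 4", "Increasing"),
                ("6 - 10", "Decreasing"),
                ("14 - 18", "Constant")),
    classes=("cell proliferation", "cell-cell signaling",
             "intracellular signaling cascade", "oncogenesis"),
)

# Hypothetical feature sets (interval label -> matched template).
gene_a = {"0 - 4": "Increasing", "6 - 10": "Decreasing", "14 - 18": "Constant"}
gene_b = {"0 - 4": "Increasing", "6 - 10": "Increasing"}

print(covers(rule2, gene_a))  # True
print(covers(rule2, gene_b))  # False
```

Because the conclusion lists several classes, a covered gene receives all of them as candidate functions rather than a single label.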
Classification using template-based rules
Rule set: IF … THEN …, IF … THEN …, …
Example rule: IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …
Gene: X60957
[Figure: expression plot for X60957, y-axis from -1 to 3, x-axis time from 0 to 24]
Process                               Votes
protein metabolism and modification   6
protein amino acid phosphorylation    3
proteolysis and peptidolysis          2
transcription                         1
transport                             1
vision                                1
…
+4
Votes are normalized and processes with vote fractions higher than a selection-threshold are chosen as predictions
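The voting step can be sketched as follows. The slides do not give the normalization formula, so dividing each count by the total number of votes is an assumption, as is the threshold value of 0.2. The vote counts are the six processes listed for X60957 (the slide's list continues past these, so the real total is larger).

```python
def predict(votes, threshold):
    """Normalize vote counts to fractions and keep processes above the threshold."""
    total = sum(votes.values())
    fractions = {process: count / total for process, count in votes.items()}
    ranked = sorted(fractions.items(), key=lambda item: -item[1])
    return [process for process, fraction in ranked if fraction > threshold]

# Votes listed on the slide for gene X60957 (list truncated on the slide).
votes = {
    "protein metabolism and modification": 6,
    "protein amino acid phosphorylation": 3,
    "proteolysis and peptidolysis": 2,
    "transcription": 1,
    "transport": 1,
    "vision": 1,
}

predictions = predict(votes, threshold=0.2)  # hypothetical threshold
print(predictions)
```

Raising the threshold yields fewer but more confident predictions, which is the coverage versus precision trade-off reported in the cross-validation results.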
Cross validation estimates Iyer et al.
PROCESS                                 AUC    SE
Ion homeostasis                         1.00   0.00
Protein targeting                       0.99   0.03
Blood coagulation                       0.96   0.08
DNA metabolism                          0.94   0.09
Intracellular signaling cascade         0.94   0.06
Energy pathways                         0.93   0.12
Cell cycle                              0.93   0.04
Oncogenesis                             0.92   0.11
Circulation                             0.91   0.11
Cell death                              0.90   0.10
Developmental processes                 0.90   0.07
Transcription                           0.88   0.11
Defense (immune) response               0.88   0.05
Cell adhesion                           0.87   0.09
Stress response                         0.86   0.15
Protein metabolism and modification     0.85   0.10
Cell motility                           0.84   0.11
Cell surface rec linked signal transd   0.82   0.15
Lipid metabolism                        0.81   0.14
Transport                               0.79   0.17
Cell organization and biogenesis        0.79   0.11
Cell proliferation                      0.79   0.06
Amino acid and derivative metabolism    0.69   0.06
AVERAGE                                 0.88   0.09
A: Coverage: 84%, Precision: 50%
B: Coverage: 71%, Precision: 60%
C: Coverage: 39%, Precision: 90%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
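The two formulas can be applied directly to sets of (gene, process) pairs. In the sketch below, the function name and the example pairs are hypothetical; only the definitions of coverage and precision come from the slide.

```python
def coverage_precision(predicted, annotated):
    """Coverage = TP/(TP+FN), Precision = TP/(TP+FP) over (gene, process) pairs."""
    tp = len(predicted & annotated)   # predicted and annotated
    fn = len(annotated - predicted)   # annotated but not predicted
    fp = len(predicted - annotated)   # predicted but not annotated
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return coverage, precision

# Hypothetical annotations and predictions.
annotated = {("g1", "transport"), ("g1", "cell cycle"),
             ("g2", "transport"), ("g3", "cell death")}
predicted = {("g1", "transport"), ("g2", "transport"), ("g2", "cell cycle")}

coverage, precision = coverage_precision(predicted, annotated)
print(coverage, precision)
```

With these toy sets there are 2 true positives, 2 false negatives and 1 false positive, giving a coverage of 0.5 and a precision of 2/3.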
Cross validation estimates Cho et al.
Process                                 GO           AUC    SE
apoptosis*                              GO:0006915   0.81   0.01
carbohydrate metabolism                 GO:0005975   0.72   0.02
cell adhesion*                          GO:0007155   0.77   0.02
cell cycle control*                     GO:0000074   0.83   0.01
cell motility*                          GO:0006928   0.81   0.01
cell proliferation                      GO:0008283   0.80   0.01
cell surface rec linked signal transd   GO:0007166   0.79   0.01
cell-cell signaling                     GO:0007267   0.80   0.01
DNA metabolism                          GO:0006259   0.78   0.02
energy pathways                         GO:0006091   0.76   0.02
humoral immune response                 GO:0006959   0.77   0.02
immune response                         GO:0006955   0.81   0.01
intracellular signaling cascade         GO:0007242   0.81   0.02
lipid metabolism                        GO:0006629   0.71   0.02
mesoderm development                    GO:0007498   0.77   0.02
mitotic cell cycle*                     GO:0000278   0.84   0.01
neurogenesis                            GO:0007399   0.78   0.01
oncogenesis                             GO:0007048   0.77   0.01
phototransduction                       GO:0007602   0.85   0.01
physiological processes                 GO:0007582   0.77   0.01
protein biosynthesis                    GO:0006412   0.80   0.02
protein metabolism and modification     GO:0006411   0.77   0.01
protein amino acid phosphorylation      GO:0006468   0.82   0.01
proteolysis and peptidolysis            GO:0006508   0.80   0.02
transcription                           GO:0006350   0.71   0.01
transport                               GO:0006810   0.71   0.01
vision                                  GO:0007601   0.83   0.01
AVERAGE                                              0.78   0.01
Coverage: 58%, Precision: 61%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
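The AUC values in these tables summarize, per process, how well genes in the class are ranked above genes outside it. The slides do not say how AUC and SE were computed, so the sketch below uses the standard rank-based definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, ties counting one half). The scores, which could for instance be vote fractions, and the labels are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for one process and whether each gene is annotated to it.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

value = auc(scores, labels)
print(round(value, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the table averages of 0.88 and 0.78.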
Protein Metabolism and Modification
[Figure: panels A to E]
A – annotations
B – false negatives
C – false positives
D – true positives
E – pred. unknown gene
Re-classification of the Known Genes
Co-classifications for the Unknown Genes
Conclusions
• Our methodology
  – Incorporates background biological knowledge
  – Handles the noise and incompleteness in the microarray data well
  – Can be objectively evaluated
  – Predicts multiple functions per gene
  – Can reclassify known genes and provide possible new functions of the known genes
  – Can provide hypotheses about the function of unknown genes
• Experimental work needs to be done to confirm our predictions
Genomic ROSETTA: http://www.idi.ntnu.no/~aleks/rosetta