data mining for genomicsweb.eecs.umich.edu/~hero/preprints/csu_slides03.pdf · istec seminar,...

62
ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining For Genomics Alfred O. Hero III The University of Michigan, Ann Arbor, MI ISTeC Seminar, CSU Feb. 22, 2003 1. Biotechnology Overview 2. Gene Microarray Technology 3. Mining the genomic database 4. The post-genomic era

Upload: others

Post on 08-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Data Mining For Genomics

Alfred O. Hero IIIThe University of Michigan, Ann Arbor, MI

ISTeC Seminar, CSUFeb. 22, 2003

1. Biotechnology Overview2. Gene Microarray Technology3. Mining the genomic database 4. The post-genomic era

Page 2: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

I. Biotechnology Overview! Genome: All the DNA contained in an organism. The

operating system/program for gene structure/function of an organism.

! Genomics: investigation of structure and function of very large numbers of genes undertaken in a simultaneous fashion.

! Bioinformatics: Computational extraction of information from biological data.

! Data Mining: Algorithms for extracting information from huge datasets using user-specified criteria.

Page 3: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Hierarchy of biological questions! Gene sequencing: what is the sequence of base pairs in

a DNA segment, gene, or genome? ! Gene Mapping: what are positions (loci) of genes on a

chromosome?! Gene expression profiling: what is pattern gene

activation/inactivation over time, tissue, therapy, etc? ! Genetic circuits: how do genes regulate

(stimulate/inhibit) each other’s expression levels over time?

! Genetic pathways: what sequence of gene interactions lead to a specific metabolic/structural (dys)function?

Page 4: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Retinal Gene Data Base

Link to sequence Link to NCBI database Source: Yu, Swaroop, etal (2002)

Page 5: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Source: Yu, Swaroop, etal (2002)

Page 6: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Genome Sequencing Status

! Whole genome has been sequenced for over 1000 viruses and over 100 microbes

! Plant and animal genomes sequenced" Oat,soybean,barley,rice,wheat,corn" Mouse,zebrafish,human

! Plant and animal genomes in progress" Cotton,tomato,potato…" Rabbit,dog,chicken…

Source: NCBI Entrez-Genome, http://www.ncbi.nlm.nih.gov:80/entrez/

Page 7: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Sequencing Milestones

Organism #of genes % genes with sequencing inferred function complete

E. Coli 4,288 60 1997Yeast 6,600 40 1996C. Elegans 19,000 40 1998Drosophila 12,000-14,000 25 1999Arabidopsis 25,000 40 2000Mouse 26,000-40,000 10-20 2002Human 26,383-39,114 10-20 2001

Source: http://www.biotech.ucdavis.edu/powerpoint/powerpoint.htm

Page 8: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

http://www-stat.stanford.edu/~susan/courses/s166/node2.html

Page 9: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Central Dogma of Molecular Biology

http://anx12.bio.uci.edu/~hudel/bs99a/lecture20

Page 10: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Central Dogma: From Gene to Protein

Source: NHGRI http://www.genome.gov/

Page 11: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Towards a unified theory….DNA

RNA

Proteins

Circuits

Phenotypes

Populations

GenBankEMBLDDBJ

MapDatabases

SwissPROTPIRPDB

Gene Expression?

Clinical Data ?

Regulatory Pathways? Metabolism?

Biodiversity?

Neuroanatomy?

Development ?

Molecular Epidemiology?

Comparative Genomics?

Source: http://www.biotech.ucdavis.edu/powerpoint/powerpoint.htm

Page 12: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Page 13: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

II. Gene Microarray Technologies

! High throughput method to probe DNA in a sample! Two principal microarray technologies:

1) Affymetrix GeneChip2) cDNA spotted arrays

! Main idea behind cDNA technology:1) Specific complementary DNA sequences arrayed on slide2) Dye-labeled RNA from sample is distributed over slide3) RNA binds to probes (hybridization)4) Presence of bound RNA-DNA pairs is read out by detecting spot fluorescence via laser excitation (scanning)

! Result: 10,000-50,000 genes can be probed at once

Page 14: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Specialized cDNA Array: Eye-Gene

I-gene slides

Gene Expression

wt RNA ko RNA

Source: J. Yu, UM BioMedEng Thesis Proposal (2002)

Page 15: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

I-Gene Array: Probe GenerationE15.5, Pn2,

MAR

Farjo, R & Yu, J. Vision Research 42 (2002)

Page 16: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

I-Gene Array: Printing and Processing

Farjo, R & Yu, J. Vision Research 42 (2002)

Page 17: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

I-Gene Array: Image Formation

overlay images

analysis

Page 18: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

• Treated sample labeled red (Cy5)• Control data labeled green (Cy3)

Page 19: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Single-Chip Raw Data Analysis5545 4434

Source: J. Yu, UM BioMedEng Thesis Proposal (2002)

Page 20: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Problem: Experimental Variability! Population – too wide genetic diversity! Cell lines - poor sample preparation ! Slide Manufacture – slide surface quality, dust

deposition! Hybridization – sample concentration, wash conditions! Cross hybridization – similar but different genes bind to

same probe! Image Formation – scanner saturation, lens

aberrations, gain settings! Imaging and Extraction – misaligned spot grid,

segmentationMicroarray data is intrinsically Statistical!

Page 21: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

III. Mining Statistical Genomic Data.Questions:! How to estimate true Cy5 and Cy3 from raw data?! How to compensate for experimental variability?! How to extract expression profile ratios from a set of

up to 50,000 probe responses?! How to specify gene profile selection criteria for

mining in this data?! How to discover complex genetic pathways to

disease, aging, etc?

Page 22: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Mining Statistical Genomic Data.Answers: ! Spot Extraction: Cy5/Cy3 or Cy5-Cy3?

" Image processing, image segmentation, non-linear anova models

! Comparing between microarray experiments" Statistical invariance, equalizing transformations,

normalization! Gene filtering and screening

" Simultaneous statistical inference, T-tests, FDR! Discovery of genetic pathways

" Clustering, dependency graphs, HMM’s

Page 23: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Spot Extraction Issues

! Technical noise and variability! Laser gain and calibration! Cy3/cy5 channel bleedthrough! Image formation gain! Spot-gridding algorithm! Spot segmentation algorithm

Page 24: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Technical Noise and Variability

Weak Signal Streaks

Good Signal

Irregular Spots Comet TailsSource: http://stress-genomics.org/

Page 25: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Gain Effects

Weak Normal Saturated

Optimal gain can be studied by information theory

Page 26: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Rate Distortion Lower BoundMSE

Hero Springer-03

Gain

Page 27: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Spot Segmentation Failure Modes

Grid misalignment Laser Misalignment

Source: C. Ball, Stanford Microarray Database

Page 28: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Steps in Conventional Segmentation

! Addressing – Locate “center of description”for each spot

! Spot Segmentation – Classification of pixels either as signal or background.

! Spot Quantification – Estimation of hybridization level/ratio of spot

Mathematical morphology unifies these steps

Page 29: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Segmentation via Morphological Operators

Alternate-Sequential FilteredOriginal Image

Final Segmented ImageWatershed TransformedSiddiqui, Hero and Siddiqui, Asilomar-02

Page 30: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Spot EigenAnalysis

! Gray level covariance matrix over each spot boundary is calculated

! Eigen analysis of each covariance matrix is performed

! Trends in direction of eigenvectors indicate systematic bias in spot printing

Siddiqui, Hero and Siddiqui, Asilomar-02

Page 31: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Circularity Coefficient: mean(r^2)/(mean(r))^2

Radius vs. Sphericity Coeff.

! Plot for Radius vs. Sphericity Coefficients (measure of circularity) of spots

! Spots with lower sphericity coefficients appear in lower half of plane

! Closer the sphericity coeff. is to 1, the better it is

! Deviation from circularity may give cause to discard data

Sphericity CoeffSiddiqui, Hero and Siddiqui, Asilomar-02

Page 32: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Another Dimension: Expression Profiles

Cy5/Cy3 hybridization profiles

Page 33: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Problem: Intrinsic Profile Variability

Within-gene variabilityAcross-gene variability

Page 34: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Solution: Experimental ReplicationExp MExp 1 Exp 2

…….M replicates

Issues: • Control by experimental replication is expensive• Surplus real estate allows replication in layout• Batch and spatial correlations may be a problem

Page 35: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Comparing Across Microarray Experiments

Experiment A Experiment B

Question: How to combine or compare experiments A and B?

Page 36: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Un-Normalized Data SetsWithin-experiment intensity variations mask A-B differences:

Experiment A (Wildtype) Experiment B (Knockout)Hero&Fleury, ISSP-03

Page 37: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Two Approaches

! If quantitative gene profile comparisons are required: " must find normalization function to align all data

sets within an experiment to a common reference.! If only ranking of gene profile differences is

required:" No need to normalize: can apply rank order

transformation to measured hybridization intensities

Page 38: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

A vs B Microarray Normalization Method

Inverse

Mean

Mean

Inverse

UnifTran

UnifTran

HousekeepingGene

Selector

Normalized A

Normalized B

Exp B

Exp A

Page 39: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Un-Normalized Data Set (Wildtype)

Hero&Fleury, ISSP-03

Page 40: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Normalized Data Set (Wildtype)

Hero&Fleury, ISSP-03

Page 41: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Profile Rank Order Statistics! Rank order algorithm: at each time point

replace each gene intensity with its relative rank among all genes" The relative ranking is preserved by (invariant to)

arbitrary monotonic intensity transformations.

*1

*2

*4 *3*1

*2*4 *3

x yx

1

1

10

10y

Page 42: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Mining Gene Expression Data

! Issues" Feature space" Feature selection criteria" Statistical robustification" Cross-validation" Experimental Validation

Page 43: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Y/O Human Retina Study

16 individuals in 2 groups of 8 subjects

(Yosida&etal:2002)

Selection criteria:

Page 44: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Fred Wright’s Human Fibroblast DataLemon&etal:2001

18 individuals in 3 groups of 6 subjects

Selection criteria:

Page 45: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Mouse Retinal Aging Data

24 mice in 6 groups of 4 subjects

Yosida&etal:2003

Selection criteria:

Page 46: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

NRL Knockout vs Wildtype Retina Study 12 knockout/wildtype mice in 3 groups of 4 subjects

Knockout WildtypeSwaroop&etal:2002

Selection criteria:

Page 47: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Data Mining with a Single Criterion

! Paired t-test with False Discovery Rate:

! For Y/O Human study:

Page 48: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Multicriterion scattergram: T-test

8226 Y/O retina genes plotted in multicriteria plane

Fleury&etal ICASSP-02

Page 49: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Multicriterion Selection Criteria! Seek to find Pareto-optimal genes which

strike a compromise between two criteria

A,B,D are Pareto optimal Pareto Fronts

Page 50: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Multicriterion scattergram: Pareto Fronts

firstsecondthird

Pareto fronts

Fleury&etal ICASSP-02

Page 51: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Cross-Validation Approach: Resampling

# replicates=m=4# time points=t=6# profiles=4^6=4096

Page 52: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Bayesian approach: Posterior Analysis

Page 53: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Pareto Front Likelihood table

Hero&Fleury:VLSI03

Page 54: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Robustification and Validation Issues! Cross-validation recomputes Pareto fronts

over all virtual profiles (Fleury&etal:2002). ! Bayesian Pareto front also robustifies prior

uncertainty in data (Hero&Fleury:2002).! Computational issues:

" Cross-validated fronts: completely data-driven but computation is O(m^t)

" Bayesian Pareto fronts: requires joint density of criteria and marginalization. Computation is linear in # replicates (m) and # time points (t).

Page 55: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

The Post-Genomic Era

! Whole genomes of species will be mapped! Genetic pathways to structure, metabolism,

disease, will remain as open questions! Pathway analysis: what are the important

gene interactions? " Requires performing many more experiments than

zero-interaction analysis" Computational load is exponentially increasing in

number of genes in pathway " New algorithms and models are needed

Page 56: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Draft Pathways for Photoreceptor Function

WntWnt/Ca /Ca –– calmodulincalmodulin pathwaypathway

Retinoid acid pathway

Bmp pathway

Source: J. Yu, UM BioMedEng Thesis Proposal (2002)

Page 57: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Each Link: Gene Co-regulation Study

-10

0

10

20

30

40

50

60

70

80

Rxra Rxrb Rxrg Rara

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-20

-10

0

10

20

30

40

50

60

70

80

Wnt5asfrp2

mfrpCalm2

Camk2bmyo5a

myo7aMtap6

RcvrnGcap1 Gc1 Gc2

Hrg4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-35

-25

-15

-5

5

15

Bmp2 Bmp4 Brk-1 Tob1 Smad1 Smad4

-10

0

10

20

30

40

50

60

70

80

Rxra Rxrb Rxrg Rara

Bmp Bmp genesgenes

RARAgenesgenes

CalmodulinCalmodulin familyfamily

Page 58: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Conclusions

! Signal processing, math, computer science, statistics: ever-increasing role in genomics

! New frontiers:" Protein arrays" Mass Spect" Molecular Imaging

! Bottleneck will remain: computational and statistical inadequacies!

Page 59: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Dawning of Post-Genomic Era

Page 60: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Post-Post-Genomic Era?

Page 61: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Or….

Page 62: Data Mining For Genomicsweb.eecs.umich.edu/~hero/Preprints/csu_slides03.pdf · ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS Data Mining

ISTeC Seminar, Colorado State University, 2/03 The University of Michigan Dept. of EECS

Oligonucleotide GeneChip Microarray