![Page 1: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/1.jpg)
CS491JH: Data Mining in Bioinformatics
Introduction to Microarray Technology
•Technology Background
•Data Processing Procedure
•Characteristics of Data
•Data integration and Data mining
![Page 2: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/2.jpg)
Substrates for High Throughput Arrays
•Nylon Membrane: single label (P33)
•Glass Slides: dual label (Cy3, Cy5)
•GeneChip: single label (biotin-streptavidin)
![Page 3: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/3.jpg)
GeneChip® Probe Arrays
GeneChip Probe Array: 1.28 cm
Hybridized Probe Cell: 24 µm
Millions of copies of a specific oligonucleotide probe
>200,000 different complementary probes
Single stranded, labeled RNA target
Oligonucleotide probe
Image of Hybridized Probe Array
![Page 4: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/4.jpg)
GeneChip® Expression Array Design
Gene Sequence (5´ to 3´)
Multiple oligo probes
Probes designed to be Perfect Match
Probes designed to be Mismatch
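The slide does not say how Perfect Match (PM) and Mismatch (MM) probe pairs are turned into one expression value. One simple summary that was widely used with these arrays is the average PM minus MM difference over a probe set. The sketch below shows only that arithmetic; the function name and the intensity values are made up for illustration.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one gene.

    pm and mm are equal-length lists of intensities, one entry per probe pair.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need the same non-zero number of PM and MM values")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of a single gene.
pm = [1200.0, 950.0, 1430.0, 1100.0]
mm = [300.0, 410.0, 380.0, 350.0]
print(average_difference(pm, mm))
```

A gene whose PM signal is no higher than its MM signal gives a value near or below zero, which is the usual hint that the probe set is picking up nonspecific hybridization.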
![Page 5: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/5.jpg)
Procedures for Target Preparation
1. Cells
2. Poly(A)+ / Total RNA
3. cDNA
4. IVT (Biotin-UTP, Biotin-CTP)
5. Labeled transcript
6. Fragment (heat, Mg2+)
7. Labeled fragments
8. Hybridize (16 hours)
9. Wash & Stain
10. Scan
![Page 6: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/6.jpg)
Microarray Technology
![Page 7: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/7.jpg)
NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab
Printing Arrays on 50 slides
![Page 8: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/8.jpg)
Ratio of expression of genes from two sources
1. Cells from condition A; cells from condition B
2. Total RNA or mRNA
3. cDNA
4. Label Dye 1; Label Dye 2
5. Mix
6. equal, over, under
NSF / U of Illinois Microarray Workshop, Steve Clough / Vodkin Lab
![Page 9: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/9.jpg)
GSI Lumonics
NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab
![Page 10: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/10.jpg)
Beta Actin
PKG
HPRT
Beta 2 microglobulin
Rubisco
AB binding protein
Major latex protein homologue (MSG)
Cattle and Soy Controls
Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng), and MSG (0.05 ng) was labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).
![Page 11: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/11.jpg)
Fetal Spleen-Cy3, Adult Spleen-Cy5
Labeled spots: IgM, IgM heavy chain, MYLK, COL1A2
![Page 12: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/12.jpg)
Placenta vs. Brain – 3800 Cattle Placenta Array (Cy3, Cy5)
GenePix Image Analysis Software
![Page 13: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/13.jpg)
GeneFilter Comparison Report
GeneFilter names: O2#1 8-20-99adjfinal, N2#1finaladj

| ORF NAME | GENE NAME | CHRM | F | G | R | C | RAW GF1 | RAW GF2 | NORMALIZED GF1 | NORMALIZED GF2 | DIFFERENCE | RATIO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YAL001C | TFC3 | 1 | 1 | A | 1 | 2 | 12.03 | 7.38 | 403.83 | 209.79 | 194.04 | 1.92 |
| YBL080C | PET112 | 2 | 1 | A | 1 | 3 | 53.21 | 35.62 | 1,786.11 | 1,013.13 | 772.98 | 1.76 |
| YBR154C | RPB5 | 2 | 1 | A | 1 | 4 | 79.26 | 78.51 | 2,660.73 | 2,232.86 | 427.87 | 1.19 |
| YCL044C | | 3 | 1 | A | 1 | 5 | 53.22 | 44.66 | 1,786.53 | 1,270.12 | 516.41 | 1.41 |
| YDL020C | SON1 | 4 | 1 | A | 1 | 6 | 23.80 | 20.34 | 799.06 | 578.42 | 220.64 | 1.38 |
| YDL211C | | 4 | 1 | A | 1 | 7 | 17.31 | 35.34 | 581.00 | 1,005.18 | -424.18 | -1.73 |
| YDR155C | CPH1 | 4 | 1 | A | 1 | 8 | 349.78 | 401.84 | 11,741.98 | 11,428.10 | 313.88 | 1.03 |
| YDR346C | | 4 | 1 | A | 1 | 9 | 64.97 | 65.88 | 2,180.87 | 1,873.67 | 307.21 | 1.16 |
| YAL010C | MDM10 | 1 | 1 | A | 2 | 2 | 13.73 | 9.61 | 461.03 | 273.36 | 187.67 | 1.69 |
| YBL088C | TEL1 | 2 | 1 | A | 2 | 3 | 8.50 | 7.74 | 285.38 | 220.01 | 65.37 | 1.30 |
| YBR162C | | 2 | 1 | A | 2 | 4 | 226.84 | 293.83 | 7,614.82 | 8,356.39 | -741.57 | -1.10 |
| YCL052C | PBN1 | 3 | 1 | A | 2 | 5 | 41.28 | 34.79 | 1,385.79 | 989.41 | 396.38 | 1.40 |
| YDL028C | MPS1 | 4 | 1 | A | 2 | 6 | 7.95 | 6.24 | 266.99 | 177.34 | 89.65 | 1.51 |
| YDL219W | | 4 | 1 | A | 2 | 7 | 16.08 | 11.33 | 539.93 | 322.20 | 217.74 | 1.68 |
| YDR163W | | 4 | 1 | A | 2 | 8 | 19.13 | 14.19 | 642.17 | 403.56 | 238.61 | 1.59 |
| YDR354W | TRP4 | 4 | 1 | A | 2 | 9 | 62.24 | 40.74 | 2,089.48 | 1,158.64 | 930.84 | 1.80 |
| YAL018C | | 1 | 1 | A | 3 | 2 | 10.72 | 8.81 | 359.75 | 250.60 | 109.15 | 1.44 |
| YBL096C | | 2 | 1 | A | 3 | 3 | 10.91 | 8.98 | 366.40 | 255.40 | 111.00 | 1.43 |
| YBR169C | SSE2 | 2 | 1 | A | 3 | 4 | 17.33 | 27.81 | 581.80 | 790.84 | -209.05 | -1.36 |
| YCL060C | | 3 | 1 | A | 3 | 5 | 17.99 | 24.75 | 603.96 | 703.75 | -99.79 | -1.17 |
| YDL036C | | 4 | 1 | A | 3 | 6 | 14.22 | 8.86 | 477.39 | 251.94 | 225.44 | 1.89 |
| YDL227C | HO | 4 | 1 | A | 3 | 7 | 25.61 | 31.52 | 859.71 | 896.46 | -36.75 | -1.04 |
| YDR171W | HSP42 | 4 | 1 | A | 3 | 8 | 102.08 | 98.37 | 3,426.83 | 2,797.58 | 629.25 | 1.22 |
| YDR362C | | 4 | 1 | A | 3 | 9 | 16.32 | 12.95 | 547.96 | 368.39 | 179.57 | 1.49 |
| YAL026C | DRS2 | 1 | 1 | A | 4 | 2 | 11.32 | 7.97 | 379.85 | 226.53 | 153.33 | 1.68 |
| YBL102W | SFT2 | 2 | 1 | A | 4 | 3 | 55.88 | 63.74 | 1,875.82 | 1,812.81 | 63.02 | 1.03 |
| YBR177C | | 2 | 1 | A | 4 | 4 | 63.31 | 29.03 | 2,125.20 | 825.60 | 1,299.60 | 2.57 |
| YCL068C | | 3 | 1 | A | 4 | 5 | 8.33 | 4.47 | 279.51 | 127.16 | 152.35 | 2.20 |
| YDL044C | MTF2 | 4 | 1 | A | 4 | 6 | 11.73 | 6.96 | 393.88 | 198.07 | 195.81 | 1.99 |
| YDL235C | YPD1 | 4 | 1 | A | 4 | 7 | 38.71 | 30.20 | 1,299.33 | 858.83 | 440.50 | 1.51 |
| YDR179C | | 4 | 1 | A | 4 | 8 | 12.77 | 11.05 | 428.60 | 314.12 | 114.48 | 1.36 |
| YDR370C | | 4 | 1 | A | 4 | 9 | 16.70 | 15.30 | 560.62 | 435.13 | 125.49 | 1.29 |
| YAL034C | FUN19 | 1 | 1 | A | 5 | 2 | 20.89 | 24.21 | 701.32 | 688.59 | 12.73 | 1.02 |
| YBL111C | | 2 | 1 | A | 5 | 3 | 22.38 | 13.67 | 751.39 | 388.69 | 362.70 | 1.93 |
| YBR185C | MBA1 | 2 | 1 | A | 5 | 4 | 38.42 | 19.96 | 1,289.61 | 567.78 | 721.83 | 2.27 |
| YCLX03C | | 3 | 1 | A | 5 | 5 | 8.69 | 3.66 | 291.77 | 104.11 | 187.66 | 2.80 |
| YDL052C | SLC1 | 4 | 1 | A | 5 | 6 | 52.37 | 49.87 | 1,758.05 | 1,418.33 | 339.73 | 1.24 |
| YDL243C | | 4 | 1 | A | 5 | 7 | 15.56 | 12.95 | 522.24 | 368.30 | 153.94 | 1.42 |
| YDR186C | | 4 | 1 | A | 5 | 8 | 16.48 | 15.01 | 553.30 | 426.75 | 126.55 | 1.30 |
| YDR378C | | 4 | 1 | A | 5 | 9 | 31.13 | 28.08 | 1,045.01 | 798.50 | 246.50 | 1.31 |
| YAL040C | CLN3 | 1 | 1 | A | 6 | 2 | 126.65 | 107.34 | 4,251.70 | 3,052.61 | 1,199.08 | 1.39 |
| YBR006W | | 2 | 1 | A | 6 | 3 | 22.74 | 11.10 | 763.49 | 315.55 | 447.94 | 2.42 |
| YBR193C | | 2 | 1 | A | 6 | 4 | 14.81 | 15.55 | 497.07 | 442.20 | 54.87 | 1.12 |
| YCLX11W | | 3 | 1 | A | 6 | 5 | 161.96 | 175.34 | 5,436.86 | 4,986.41 | 450.44 | 1.09 |
| YDL060W | | 4 | 1 | A | 6 | 6 | 29.84 | 37.13 | 1,001.65 | 1,055.98 | -54.34 | -1.05 |
| YDR003W | | 4 | 1 | A | 6 | 7 | 23.99 | 23.22 | 805.48 | 660.25 | 145.22 | 1.22 |
| YDR194C | MSS116 | 4 | 1 | A | 6 | 8 | 66.58 | 47.16 | 2,235.07 | 1,341.29 | 893.78 | 1.67 |
| YDR386W | | 4 | 1 | A | 6 | 9 | 11.27 | 5.75 | 378.27 | 163.46 | 214.81 | 2.31 |
| YAL047C | | 1 | 1 | A | 7 | 2 | 15.54 | 11.30 | 521.74 | 321.28 | 200.46 | 1.62 |
| YBR012W-B | | 2 | 1 | A | 7 | 3 | 54.70 | 79.97 | 1,836.29 | 2,274.15 | -437.86 | -1.24 |
| YBR201W | DER1 | 2 | 1 | A | 7 | 4 | 21.67 | 19.57 | 727.49 | 556.64 | 170.85 | 1.31 |
| YCR007C | | 3 | 1 | A | 7 | 5 | 25.02 | 15.96 | 840.01 | 453.76 | 386.25 | 1.85 |
| YDL068W | | 4 | 1 | A | 7 | 6 | 18.32 | 13.11 | 614.83 | 372.78 | 242.05 | 1.65 |
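The report does not state how DIFFERENCE and RATIO are computed, but the rows are consistent with DIFFERENCE being normalized GF1 minus normalized GF2, and RATIO being the larger normalized value divided by the smaller, with a minus sign when GF2 is the larger one. A minimal sketch of that reading, checked against two rows of the report (the function name is hypothetical):

```python
def compare(gf1, gf2):
    """Return (difference, signed_ratio) for two normalized intensities.

    Inferred convention, not documented in the report: the ratio is always
    at least 1 in magnitude and is negative when gf2 exceeds gf1.
    """
    difference = round(gf1 - gf2, 2)
    if gf1 >= gf2:
        ratio = gf1 / gf2
    else:
        ratio = -(gf2 / gf1)
    return difference, round(ratio, 2)


# Normalized GF1 / GF2 values copied from the report.
rows = {
    "YAL001C": (403.83, 209.79),
    "YDL211C": (581.00, 1005.18),
}
for orf, (gf1, gf2) in rows.items():
    print(orf, *compare(gf1, gf2))
```

With this convention a ratio never falls between -1 and 1, so a two-fold increase and a two-fold decrease show up symmetrically as 2.0 and -2.0.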
![Page 14: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/14.jpg)
Microarray Data Process
1. Experimental Design
2. Image Analysis – raw data
3. Normalization – “clean” data
4. Data Filtering – informative data
5. Model building
6. Data Mining (clustering, pattern recognition, etc.)
7. Validation
![Page 15: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/15.jpg)
Scatterplot of Normalized Data
Axes: Adult, Fetal
![Page 16: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/16.jpg)
Cutoffs: >0.3 and <-0.3
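The slide gives only the two cutoffs. Assuming they are applied to log10 expression ratios (an assumption, since the scale is not stated), 0.3 corresponds to roughly a two-fold change, because 10^0.3 is about 2. The sketch below sorts genes into the "over", "under", and "equal" groups under that assumption; the function name and the intensity pairs are illustrative only.

```python
import math


def classify(log_ratio, cutoff=0.3):
    """Label a log10 ratio as over, under, or equal relative to +/- cutoff."""
    if log_ratio > cutoff:
        return "over"
    if log_ratio < -cutoff:
        return "under"
    return "equal"


# Hypothetical normalized intensity pairs (sample 1, sample 2).
pairs = {
    "gene_a": (1000.0, 400.0),
    "gene_b": (400.0, 1000.0),
    "gene_c": (600.0, 400.0),
}
for name, (s1, s2) in pairs.items():
    print(name, classify(math.log10(s1 / s2)))
```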
![Page 17: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/17.jpg)
Characteristics of Data
Data can be viewed as an N×M matrix (N >> M):
N is the number of genes
M is the number of data points for each gene
Or as an N×(M+K) matrix:
K is the number of features describing each gene (genome location, functional description, metabolic pathway, etc.)
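As a rough illustration of the two views, the sketch below stores M expression values and K descriptive features per gene and then flattens them into N rows of M+K columns. The class and field names are hypothetical; the three genes and their values are taken from the GeneFilter report earlier in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    orf: str
    expression: list                               # M data points
    features: dict = field(default_factory=dict)   # K descriptive features


genes = [
    GeneRecord("YAL001C", [403.83, 209.79], {"gene_name": "TFC3", "chromosome": 1}),
    GeneRecord("YBL080C", [1786.11, 1013.13], {"gene_name": "PET112", "chromosome": 2}),
    GeneRecord("YBR154C", [2660.73, 2232.86], {"gene_name": "RPB5", "chromosome": 2}),
]
feature_names = ["gene_name", "chromosome"]


def to_rows(genes, feature_names):
    """Flatten the records into an N x (M + K) table, one row per gene."""
    return [g.expression + [g.features.get(n) for n in feature_names] for g in genes]


rows = to_rows(genes, feature_names)
n, m, k = len(genes), len(genes[0].expression), len(feature_names)
print(n, m, k, rows[0])
```

Keeping the numeric block and the annotation block separate until the last step makes it easy to run numeric methods on the N×M part alone and join the K features back for interpretation.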
![Page 18: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/18.jpg)
Model for Data Analysis
•Gene Expression is a Dynamic Process
•Each Microarray Experiment is a snapshot of the process
•Need basic biological knowledge to build a model
For Example:
Assumption – In most experiments, only a small set of genes (hundreds to thousands) is affected significantly.
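One place this assumption is commonly put to work is normalization: if most genes are unchanged, the typical log ratio on an array should be zero, so any overall offset can be treated as technical bias and subtracted. The slides do not name a normalization method, so the sketch below uses median centering as an assumed example, with made-up log ratios.

```python
import statistics


def median_center(log_ratios):
    """Shift log ratios so that their median is zero."""
    shift = statistics.median(log_ratios)
    return [x - shift for x in log_ratios]


# Hypothetical log ratios: most genes sit near a common offset of 0.2
# (for example a dye bias), while two genes are truly changed.
observed = [0.21, 0.19, 0.20, 0.22, 0.18, 1.20, -0.80]
centered = median_center(observed)
print([round(x, 2) for x in centered])
```

The median is used rather than the mean so that the few strongly changed genes do not drag the correction with them; if a large fraction of genes changed, the assumption and this correction would both break down.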
![Page 19: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/19.jpg)
Data Mining
Need for Data Mining
•Data volumes are too large for traditional analysis methods
Large number of records and high dimensional data
•Only a small portion of the data is analyzed
•Decision support process becomes more complex
Functions of Data Mining
Use the data to build predictors – prediction, classification, deviation detection, segmentation
Generate more sophisticated summaries and reports to aid understanding of the data – find clusters and partitions in the data
![Page 20: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/20.jpg)
Data Mining Methods
Classification, Regression (Predictive Modeling)
Clustering (Segmentation)
Association Discovery (Summarization)
Change and deviation detection
Dependency Modeling
Information Visualization
![Page 21: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/21.jpg)
Cholesterol Biosynthesis
Cell Cycle
Immediate Early Response
Signaling and Angiogenesis
Wound Healing and Tissue Remodeling
Clustered display of data from time course of serum stimulation of primary human fibroblasts.
Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998) pg 14865
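Clustered displays of this kind come from hierarchical clustering of gene expression profiles with a correlation-based similarity. The sketch below is a simplified illustration rather than a reproduction of the paper's procedure: it assumes the plain Pearson correlation and average linkage, stops at a fixed number of clusters instead of building a full tree, and uses made-up time courses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def similarity(c1, c2):
        # Average pairwise correlation between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = max(candidates,
                   key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical time courses: two early-responding and two late-responding genes.
profiles = {
    "early_a": [0.1, 1.5, 0.8, 0.2, 0.0],
    "early_b": [0.0, 1.2, 0.7, 0.1, 0.1],
    "late_a": [0.0, 0.1, 0.4, 1.1, 1.6],
    "late_b": [0.1, 0.0, 0.5, 1.0, 1.4],
}
result = average_linkage(profiles, n_clusters=2)
print(result)
```

Correlation groups genes by the shape of their response over time rather than by absolute level, which is why genes with similar timing end up together even if their magnitudes differ.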
![Page 22: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/22.jpg)
![Page 23: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/23.jpg)
![Page 24: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/24.jpg)
Self Organizing Maps
![Page 25: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/25.jpg)
Molecular Classification of Cancer
![Page 26: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/26.jpg)
![Page 27: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/27.jpg)
Gene Expression Profile of Aging and Its Retardation by Caloric Restriction
Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla
![Page 28: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/28.jpg)
Expression Landscape of cell-cycle regulated genes in yeast
![Page 29: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/29.jpg)
Multi-dimension data visualization
![Page 30: CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data](https://reader030.vdocument.in/reader030/viewer/2022032612/56649ef25503460f94c04382/html5/thumbnails/30.jpg)