for written notes on this lecture, please read chapter 14...
TRANSCRIPT
![Page 1: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/1.jpg)
For written notes on this lecture, please read chapter 14 of The Practical Bioinformatician.
CS2220: Introduction to Computational Biology
Lecture 4: Gene Expression Analysis
Limsoon Wong
![Page 2: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/2.jpg)
Copyright 2012 © Limsoon Wong
2
Plan
• Microarray background
• Gene expression profile classification
• Gene expression profile clustering
• Normalization
• Extreme sample selection
• Intersection Analysis
![Page 3: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/3.jpg)
Background on Microarrays
![Page 4: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/4.jpg)
Copyright 2012 © Limsoon Wong
4
What is a Microarray?
• Contain large number of DNA molecules spotted
on glass slides, nylon membranes, or silicon
wafers
• Detect what genes are being expressed or found
in a cell of a tissue sample
• Measure expression of thousands of genes
simultaneously
![Page 5: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/5.jpg)
Copyright 2012 © Limsoon Wong
5
Affymetrix GeneChip Array
![Page 6: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/6.jpg)
Copyright 2012 © Limsoon Wong
6
quartz is washed to ensure uniform
hydroxylation across its surface and to
attach linker molecules
exposed linkers become deprotected and
are available for nucleotide coupling
Making Affymetrix GeneChip Array
Exercise: What is the other commonly used
type of microarray? How is that one different
from Affymetrix’s?
![Page 7: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/7.jpg)
Copyright 2012 © Limsoon Wong
7
Gene Expression Measurement
by Affymetrix GeneChip Array
Click to
watch an
interesting
movie
explaining the
working of
microarray
![Page 8: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/8.jpg)
Copyright 2012 © Limsoon Wong
8
A Sample Affymetrix GeneChip
Data File (U95A)
![Page 9: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/9.jpg)
Copyright 2012 © Limsoon Wong
9
Some Advice on
Affymetrix Gene Chip Data
• Ignore AFFX genes
– These genes are control genes
• Ignore genes with “Abs Call” equal to “A” or “M”
– Measurement quality is suspect
• Upperbound 40000, lowerbound 100
– Accuracy of laser scanner
• Deal with missing values Exercise: Suggest 2 ways
to deal with missing value
![Page 10: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/10.jpg)
Copyright 2012 © Limsoon Wong
10
Type of Gene Expression Datasets
Class Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 .....
Sample1 Cancer 0.12 -1.3 1.7 1.0 -3.2 0.78 -0.12
Sample2 Cancer 1.3
.
~Cancer
SampleN ~Cancer
1000 - 100,000 columns
100-500 rows
Gene-Conditions or Gene-Sample (numeric or discretized)
Gene-Sample-Time Gene-Time
time
expre
ssio
n le
ve
l
![Page 11: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/11.jpg)
Copyright 2012 © Limsoon Wong
11
Type of Gene Expression Datasets
1000 - 100,000 columns
100-500 rows
Gene-Conditions or Gene-Sample (numeric or discretized)
Class Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 .....
Sample1 Cancer 1 0 1 1 1 0 0
Sample2 Cancer 1
.
~Cancer
SampleN ~Cancer
Gene-Sample-Time Gene-Time
time
expre
ssio
n le
ve
l
![Page 12: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/12.jpg)
Copyright 2012 © Limsoon Wong
12
Application: Disease Subtype Diagnosis
???
malign
malign
malign
malign
benign
benign
benign
benign
??? ???
genes
sam
ple
s
![Page 13: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/13.jpg)
Copyright 2012 © Limsoon Wong
13
Application: Treatment Prognosis
???
NR
NR
NR
NR
R
R
R
R
??? ???
genes
sam
ple
s
![Page 14: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/14.jpg)
Copyright 2012 © Limsoon Wong
14
Type of Gene Expression Datasets
1000 - 100,000 columns
100-500 rows
Gene-Conditions or Gene-Sample (numeric or discretized)
Gene1 Gene2 Gene3 Gene 4 Gene5 Gene6 Gene7
Cond1 0.12 -1.3 1.7 1.0 -3.2 0.78 -0.12
Cond2 1.3
.
CondN
Gene-Sample-Time Gene-Time
time
expre
ssio
n le
ve
l
![Page 15: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/15.jpg)
Copyright 2012 © Limsoon Wong
15
Application: Drug Action Detection
Normal
Normal
Normal
Normal
Drug
Drug
Drug
Drug
genes
condit
ions
Which group of genes are the drug affecting on?
![Page 16: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/16.jpg)
Gene Expression Profile Classification
Diagnosis of Childhood Acute
Lymphoblastic Leukemia and Optimization
of Risk-Benefit Ratio of Therapy
![Page 17: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/17.jpg)
Copyright 2012 © Limsoon Wong
17
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most
ASEAN countries
Childhood ALL
• Major subtypes: T-ALL,
E2A-PBX, TEL-AML, BCR-
ABL, MLL genome
rearrangements,
Hyperdiploid>50
• Diff subtypes respond
differently to same Tx
• Over-intensive Tx
– Development of
secondary cancers
– Reduction of IQ
• Under-intensiveTx
– Relapse
![Page 18: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/18.jpg)
Copyright 2012 © Limsoon Wong
18
Mission
• Conventional risk assignment procedure requires
difficult expensive tests and collective judgement
of multiple specialists
• Generally available only in major advanced
hospitals
Can we have a single-test easy-to-use platform
instead?
![Page 19: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/19.jpg)
Copyright 2012 © Limsoon Wong
19
Single-Test Platform of
Microarray & Machine Learning
![Page 20: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/20.jpg)
Copyright 2012 © Limsoon Wong
20
Overall Strategy
• For each subtype, select
genes to develop
classification model for
diagnosing that subtype
• For each subtype, select
genes to develop
prediction model for
prognosis of that subtype
Diagnosis
of subtype
Subtype-
dependent
prognosis
Risk-
stratified
treatment
intensity
![Page 21: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/21.jpg)
Copyright 2012 © Limsoon Wong
21
Subtype Diagnosis by PCL
• Gene expression data collection
• Gene selection by 2
• Classifier training by emerging pattern
• Classifier tuning (optional for some machine
learning methods)
• Apply classifier for diagnosis of future cases by
PCL
![Page 22: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/22.jpg)
Copyright 2012 © Limsoon Wong
22
Childhood ALL Subtype
Diagnosis Workflow
A tree-structured
diagnostic
workflow was
recommended by
our doctor
collaborator
![Page 23: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/23.jpg)
Copyright 2012 © Limsoon Wong
23
Training and Testing Sets
![Page 24: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/24.jpg)
Copyright 2012 © Limsoon Wong
24
Signal Selection Basic Idea
• Choose a signal w/ low intra-class distance
• Choose a signal w/ high inter-class distance
![Page 25: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/25.jpg)
Copyright 2012 © Limsoon Wong
25
Signal Selection by 2
![Page 26: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/26.jpg)
Copyright 2012 © Limsoon Wong
26
Emerging Patterns
• An emerging pattern is a set of conditions
– usually involving several features
– that most members of a class satisfy
– but none or few of the other class satisfy
• A jumping emerging pattern is an emerging
pattern that
– some members of a class satisfy
– but no members of the other class satisfy
• We use only jumping emerging patterns
![Page 27: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/27.jpg)
Copyright 2012 © Limsoon Wong
27
Examples
Reference number 9: the expression of gene 37720_at > 215
Reference number 36: the expression of gene 38028_at 12
Patterns Frequency (P) Frequency(N)
{9, 36} 38 instances 0
{9, 23} 38 0
{4, 9} 38 0
{9, 14} 38 0
{6, 9} 38 0
{7, 21} 0 36
{7, 11} 0 35
{7, 43} 0 35
{7, 39} 0 34
{24, 29} 0 34
Easy interpretation
![Page 28: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/28.jpg)
Copyright 2012 © Limsoon Wong
28
PCL: Prediction by Collective Likelihood
![Page 29: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/29.jpg)
Copyright 2012 © Limsoon Wong
29
PCL Learning
Top-Ranked EPs in
Positive class
Top-Ranked EPs in
Negative class
EP1P (90%)
EP2P (86%)
.
.
EPnP (68%)
EP1N (100%)
EP2N (95%)
.
.
EPnN (80%)
The idea of summarizing multiple top-ranked EPs is intended
to avoid some rare tie cases
![Page 30: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/30.jpg)
Copyright 2012 © Limsoon Wong
30
PCL Testing
ScoreP = EP1P’ / EP1
P + … + EPkP’ / EPk
P
Most freq EP of pos class
in the test sample
Most freq EP of pos class
Similarly,
ScoreN = EP1N’ / EP1
N + … + EPkN’ / EPk
N
If ScoreP > ScoreN, then positive class,
Otherwise negative class
![Page 31: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/31.jpg)
Copyright 2012 © Limsoon Wong
31
Accuracy of PCL (vs. other classifiers)
The classifiers are all applied to the 20 genes selected
by 2 at each level of the tree
![Page 32: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/32.jpg)
Copyright 2012 © Limsoon Wong
32
Understandability of PCL
• E.g., for T-ALL vs. OTHERS, one ideally
discriminatory gene 38319_at was found,
inducing these 2 EPs
• These give us the diagnostic rule
![Page 33: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/33.jpg)
Copyright 2012 © Limsoon Wong
33
Multidimensional Scaling Plot
for Subtype Diagnosis
Obtained by performing PCA on the 20 genes chosen for each level
![Page 34: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/34.jpg)
Copyright 2012 © Limsoon Wong
34
Childhood ALL Cure Rates
• Conventional risk
assignment procedure
requires difficult
expensive tests and
collective judgement of
multiple specialists
Not available in less
advanced ASEAN
countries
![Page 35: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/35.jpg)
Copyright 2012 © Limsoon Wong
35
Childhood ALL Treatment Cost
• Treatment for childhood ALL over 2 yrs
– Intermediate intensity: US$60k
– Low intensity: US$36k
– High intensity: US$72k
• Treatment for relapse: US$150k
• Cost for side-effects: Unquantified
![Page 36: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/36.jpg)
Copyright 2012 © Limsoon Wong
36
Current Situation
(2000 new cases/yr in ASEAN)
• Intermediate intensity
conventionally applied in
less advanced ASEAN
countries
• Over intensive for 50% of
patients, thus more side
effects
• Under intensive for 10% of
patients, thus more relapse
• US$120m (US$60k * 2000)
for intermediate intensity tx
• US$30m (US$150k * 2000 *
10%) for relapse tx
• Total US$150m/yr plus un-
quantified costs for dealing
with side effects
![Page 37: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/37.jpg)
Copyright 2012 © Limsoon Wong
37
Using Our Platform
• Low intensity applied to
50% of patients
• Intermediate intensity to
40% of patients
• High intensity to 10% of
patients
Reduced side effects
Reduced relapse
75-80% cure rates
• US$36m (US$36k * 2000 *
50%) for low intensity
• US$48m (US$60k * 2000 *
40%) for intermediate
intensity
• US$14.4m (US$72k * 2000 *
10%) for high intensity
• Total US$98.4m/yr
Save US$51.6m/yr
![Page 38: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/38.jpg)
Copyright 2012 © Limsoon Wong
38
A Nice Ending…
• Asian Innovation
Gold Award 2003
![Page 39: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/39.jpg)
Gene Expression Profile Clustering
Novel Disease Subtype Discovery
![Page 40: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/40.jpg)
Copyright 2012 © Limsoon Wong
40
Is there a new subtype?
• Hierarchical
clustering of
gene expression
profiles reveals a
novel subtype of
childhood ALL
Exercise: Name and describe
one bi-clustering method
![Page 41: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/41.jpg)
Copyright 2012 © Limsoon Wong
41
Hierarchical Clustering
• Assign each item to its own cluster
– If there are N items initially, we get N clusters,
each containing just one item
• Find the “most similar” pair of clusters, merge
them into a single cluster, so we now have one
less cluster
– ―Similarity‖ is often defined using
• Single linkage
• Complete linkage
• Average linkage
• Repeat previous step until all items are clustered
into a single cluster of size N
![Page 42: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/42.jpg)
Copyright 2012 © Limsoon Wong
42
Single, Complete, & Average Linkage
Single linkage defines distance
betw two clusters as min distance
betw them
Complete linkage defines distance
betw two clusters as max distance betw
them
Exercise: Give definition of “average linkage”
Image source: UCL Microcore Website
![Page 43: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/43.jpg)
Copyright 2012 © Limsoon Wong
43
Some Patient Samples
• Does Mr. A have cancer?
malign
malign
malign
malign
benign
benign
benign
benign
genes
sam
ple
s
??? Mr. A:
![Page 44: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/44.jpg)
Copyright 2012 © Limsoon Wong
44
Let’s rearrange the rows…
• Does Mr. A have cancer?
genes
sam
ple
s
malign
malign
malign
malign
benign
benign
benign
benign
??? Mr. A:
![Page 45: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/45.jpg)
Copyright 2012 © Limsoon Wong
45
and the columns too…
malign
malign
malign
malign
benign
benign
benign
benign
genes
sam
ple
s
??? Mr. A:
![Page 46: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/46.jpg)
Normalization
![Page 47: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/47.jpg)
Copyright 2012 © Limsoon Wong
47
Sometimes, a gene expression study
may involve batches of data collected
over a long period of time…
0
10
20
30
40
50
60
70
Jan-0
4
Apr-
04
Jul-04
Oct-
04
Jan-0
5
Apr-
05
Jul-05
Oct-
05
Jan-0
6
Apr-
06
Jul-06
Oct-
06
Jan-0
7
Apr-
07
Jul-07
Oct-
07
Jan-0
8
Apr-
08
Jul-08
Oct-
08
Jan-0
9
Apr-
09
Jul-09
Oct-
09
Jan-1
0
Time Span of Gene Expression Profiles
Image credit: Dong Difeng
![Page 48: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/48.jpg)
Copyright 2012 © Limsoon Wong
48
In such a case, batch effect may be
severe… to the extent that you can
predict the batch that each sample
comes!
Need normalization to correct for batch effect
Image credit: Dong Difeng
![Page 49: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/49.jpg)
Copyright 2012 © Limsoon Wong
49
Approaches to Normalization
• Aim of
normalization:
Reduce variance
w/o increasing bias
• Scaling method
– Intensities are scaled
so that each array
has same ave value
– E.g., Affymetrix’s
• Xform data so that
distribution of
probe intensities is
same on all arrays
– E.g., (x ) /
• Quantile
normalization
![Page 50: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/50.jpg)
Copyright 2012 © Limsoon Wong
50
Quantite Normalization
• Given n arrays of length p,
form X of size p × n where
each array is a column
• Sort each column of X to
give Xsort
• Take means across rows
of Xsort and assign this
mean to each elem in the
row to get X’sort
• Get Xnormalized by arranging
each column of X’sort to
have same ordering as X
• Implemented in some
microarray s/w, e.g.,
EXPANDER
![Page 51: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/51.jpg)
Copyright 2012 © Limsoon Wong
51
After quantile
normalization
![Page 52: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/52.jpg)
Selection of Patient Samples and Genes
for Disease Prognosis
![Page 53: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/53.jpg)
Copyright 2012 © Limsoon Wong
53
Gene Expression Profile
+ Clinical Data
Outcome Prediction
• Univariate & multivariate Cox survival analysis (Beer et al 2002, Rosenwald et al 2002)
• Fuzzy neural network (Ando et al 2002)
• Partial least squares regression (Park et al 2002)
• Weighted voting algorithm (Shipp et al 2002)
• Gene index and “reference gene” (LeBlanc et al 2003)
• ……
![Page 54: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/54.jpg)
Copyright 2012 © Limsoon Wong
54
Our Approach
“extreme”
sample
selection
ERCOF
![Page 55: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/55.jpg)
Copyright 2012 © Limsoon Wong
55
Short-term Survivors v.s. Long-term Survivors
T: sample
F(T): follow-up time
E(T): status (1:unfavorable; 0: favorable)
c1 and c2: thresholds of survival time
Short-term survivors
who died within a short
period
F(T) < c1 and E(T) = 1
Long-term survivors
who were alive after a
long follow-up time
F(T) > c2
Extreme Sample Selection
![Page 56: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/56.jpg)
Copyright 2012 © Limsoon Wong
56
ERCOF Entropy-
Based Rank
Sum Test &
Correlation
Filtering
Remove genes with
expression values w/o
cut point found (can’t
be discretized)
Calculate Wilcoxon rank
sum w(x) for gene x.
Remove gene x if w(x)
[clower, cupper]
Group features by
Pearson Correlation
For each group, retain
the top 50% wrt class
entropy
![Page 57: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/57.jpg)
Copyright 2012 © Limsoon Wong
57
Linear Kernel SVM regression function
bixTKyaTG i
i
i ))(,()(
T: test sample, x(i): support vector,
yi: class label (1: short-term survivors; -1: long-term survivors)
Transformation function (posterior probability)
)(1
1)(
TGe
TS
))1,0()(( TS
S(T): risk score of sample T
Risk Score Construction
![Page 58: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/58.jpg)
Copyright 2012 © Limsoon Wong
58
Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the
most common type of
lymphoma in adults
• Can be cured by
anthracycline-based
chemotherapy in 35 to 40
percent of patients
DLBC lymphoma
comprises several
diseases that differ in
responsiveness to
chemotherapy
• Intl Prognostic Index (IPI)
– age, ―Eastern Cooperative
Oncology Group‖ Performance
status, tumor stage, lactate
dehydrogenase level, sites of
extranodal disease, ...
• Not very good for stratifying
DLBC lymphoma patients for
therapeutic trials
Use gene-expression
profiles to predict outcome
of chemotherapy?
![Page 59: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/59.jpg)
Copyright 2012 © Limsoon Wong
59
Rosenwald et al., NEJM 2002
• 240 data samples
– 160 in preliminary group
– 80 in validation group
– each sample described by 7399 microarray
features
• Rosenwald et al.’s approach
– identify gene: Cox proportional-hazards model
– cluster identified genes into four gene signatures
– calculate for each sample an outcome-predictor
score
– divide patients into quartiles according to score
![Page 60: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/60.jpg)
Copyright 2012 © Limsoon Wong
60
Knowledge Discovery from Gene
Expression of ―Extreme‖ Samples
“extreme”
sample
selection:
< 1 yr vs > 8 yrs
knowledge
discovery
from gene
expression
240
samples
80
samples 26 long-
term survivors
47 short-
term survivors
7399
genes
84
genes
T is long-term if S(T) < 0.3
T is short-term if S(T) > 0.7
![Page 61: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/61.jpg)
Copyright 2012 © Limsoon Wong
61
73 25 47+1(*) Informative
160 72 88 Original DLBCL
Alive Dead
Total Status Data set Application
Number of samples in original data and selected informative training set.
(*): Number of samples whose corresponding patient was dead at the end
of follow-up time, but selected as a long-term survivor.
Discussions: Sample Selection
![Page 62: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/62.jpg)
Copyright 2012 © Limsoon Wong
62
84(1.7%) Phase II
132(2.7%) Phase I
4937(*) Original
DLBCL Gene selection
Number of genes left after feature filtering for each phase.
(*): number of genes after removing those genes who were
absent in more than 10% of the experiments.
Discussions: Gene Identification
![Page 63: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/63.jpg)
Copyright 2012 © Limsoon Wong
63
p-value of log-rank test: < 0.0001
Risk score thresholds: 0.7, 0.3
Kaplan-Meier Plot for 80 Test Cases
![Page 64: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/64.jpg)
Copyright 2012 © Limsoon Wong
64
(A) IPI low,
p-value = 0.0063
(B) IPI intermediate,
p-value = 0.0003
Improvement Over IPI
![Page 65: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/65.jpg)
Copyright 2012 © Limsoon Wong
65
(A) W/o sample selection (p =0.38) (B) With sample selection (p=0.009)
No clear difference on the overall survival of the 80 samples in the validation
group of DLBCL study, if no training sample selection conducted
Merit of ―Extreme‖ Samples
![Page 66: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/66.jpg)
Copyright 2012 © Limsoon Wong
66
About the Inventor: Huiqing Liu
• Huiqing Liu
– PhD, NUS, 2004
– Currently Senior
Scientist at Centocor
– Asian Innovation
Gold Award 2003
– New Jersey Cancer
Research Award for
Scientific Excellence
2008
– Gallo Prize 2008
![Page 67: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/67.jpg)
Beyond Disease Diagnosis & Prognosis
![Page 68: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/68.jpg)
Copyright 2012 © Limsoon Wong
68
Beyond Classification of
Gene Expression Profiles
• After identifying the candidate genes by feature
selection, do we know which ones are causal
genes, which ones are surrogates, and which are
noise? Diagnostic ALL BM samples (n=327)
3 -3 -2 -1 0 1 2 = std deviation from mean
Ge
nes
fo
r c
las
s
dis
tin
cti
on
(n
=2
71
)
TEL-AML1 BCR-
ABL
Hyperdiploid >50 E2A-
PBX1
MLL T-ALL Novel
![Page 69: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/69.jpg)
Copyright 2012 © Limsoon Wong
69
Gene Regulatory Circuits
• Genes are “connected”
in “circuit” or network
• Expr of a gene in a
network depends on
expr of some other
genes in the network
• Can we “reconstruct”
the gene network from
gene expression and
other data? Source: Miltenyi Biotec
![Page 70: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/70.jpg)
Copyright 2012 © Limsoon Wong
70
• Each disease subtype has underlying cause
There is a unifying biological theme for genes
that are truly associated with a disease subtype.
• Uncertainty in reliability of selected genes can be
reduced by considering molecular functions and
biological processes associated with the genes
• The unifying biological theme is basis for
inferring the underlying cause of disease subtype
Hints to extend reach of prediction
![Page 71: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/71.jpg)
Copyright 2012 © Limsoon Wong
71
Intersection Analysis
• Intersect the list of
differentially expressed
genes with a list of genes
on a pathway
• If intersection is
significant, the pathway is
postulated as basis of
disease subtype or
treatment response
Caution:
• Initial list of differentially
expressed genes is
defined using test
statistics with arbitrary
thresholds
• Diff test statistics and diff
thresholds result in a diff
list of differentially
expressed genes
Outcome may be unstable Exercise: What is a good test
statistics to determine if the
intersection is significant?
![Page 72: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/72.jpg)
Gene Interaction Prediction
![Page 73: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/73.jpg)
Copyright 2012 © Limsoon Wong
73
Beyond Classification of
Gene Expression Profiles
• After identifying the candidate genes by feature
selection, do we know which ones are causal
genes and which ones are surrogates?
Diagnostic ALL BM samples (n=327)
3 -3 -2 -1 0 1 2 = std deviation from mean
Ge
nes
fo
r c
las
s
dis
tin
cti
on
(n
=2
71
)
TEL-AML1 BCR-
ABL
Hyperdiploid >50 E2A-
PBX1
MLL T-ALL Novel
![Page 74: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/74.jpg)
Copyright 2012 © Limsoon Wong
74
Gene Regulatory Circuits
• Genes are “connected” in
“circuit” or network
• Expression of a gene in a
network depends on
expression of some other
genes in the network
• Can we reconstruct the
gene network from gene
expression data?
![Page 75: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/75.jpg)
Copyright 2012 © Limsoon Wong
75
Key Questions
• For each gene in the network:
• Which genes affect it?
• How they affect it?
– Positively?
– Negatively?
– More complicated ways?
![Page 76: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/76.jpg)
Copyright 2012 © Limsoon Wong
76
Some Techniques
• Bayesian Networks
– Friedman et al., JCB 7:601--620, 2000
• Boolean Networks
– Akutsu et al., PSB 2000, pages 293--304
• Differential equations
– Chen et al., PSB 1999, pages 29--40
• Classification-based method
– Soinov et al., ―Towards reconstruction of gene
network from expression data by supervised
learning‖, Genome Biology 4:R6.1--9, 2003
![Page 77: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/77.jpg)
Copyright 2012 © Limsoon Wong
77
A Classification-Based Technique Soinov et al., Genome Biology 4:R6.1-9, Jan 2003
• Given a gene expression matrix X
– each row is a gene
– each column is a sample
– each element xij is expression of gene i in sample j
• Find the average value ai of each gene i
• Denote sij as state of gene i in sample j,
– sij = up if xij > ai
– sij = down if xij ai
![Page 78: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/78.jpg)
Copyright 2012 © Limsoon Wong
78
A Classification-Based Technique Soinov et al., Genome Biology 4:R6.1-9, Jan 2003
• To see whether the state of
gene g is determined by
the state of other genes
– see whether sij | i g
can predict sgj
– if can predict with high
accuracy, then ―yes‖
– Any classifier can be
used, such as C4.5, PCL,
SVM, etc.
• To see how the state of
gene g is determined by
the state of other genes
– apply C4.5 (or PCL or
other ―rule-based‖
classifiers) to predict sgj
from sij | i g
– and extract the decision
tree or rules used
![Page 79: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/79.jpg)
Copyright 2012 © Limsoon Wong
79
Advantages of this method
• Can identify genes affecting a target gene
• Don’t need discretization thresholds
• Each data sample is treated as an example
• Explicit rules can be extracted from the classifier
(assuming C4.5 or PCL)
• Generalizable to time series
![Page 80: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/80.jpg)
Concluding Remarks
![Page 81: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/81.jpg)
Copyright 2012 © Limsoon Wong
81
Bcr-Abl
• Targeted drug dev
– Know what
molecular effect you
want to achieve
• E.g., inhibit a
mutated form of a
protein
– Engineer a
compound that
directly binds and
causes the desired
effect
• Gleevec (imatinib)
– 1st success for real drug
– Targets Bcr-Abl fusion
protein (ie, Philadelphia
chromosome, Ph)
– NCI summary of clinical
trial of imatinib for ALL
at
http://www.cancer.gov/clini
caltrials/results/ALLimati
nib1109/print
![Page 82: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/82.jpg)
Copyright 2012 © Limsoon Wong
82
What have we learned?
• Technologies
– Microarray
– PCL, ERCOF
• Microarray applications
– Disease diagnosis by supervised learning
– Subtype discovery by unsupervised learning
• Important tactics
– Extreme sample selection
– Intersection analysis, Gene network reconstruction
![Page 83: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/83.jpg)
Copyright 2012 © Limsoon Wong
83
Useful Gene Expression
Analysis Packages
• EXPANDER (EXPression Analyser & DisplayER)
– Ron Shamir @ Tel Aviv University
– http://acgt.cs.tau.ac.il/expander
• BRB-Array Tools
– Biometric Research Branch @ National Cancer
Center, USA
– http://linus.nci.nih.gov/BRB-ArrayTools.html
![Page 84: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/84.jpg)
Any Question?
![Page 85: For written notes on this lecture, please read chapter 14 ...wongls/courses/cs2220/2012/Lect4_2220_gene...For written notes on this lecture, please read chapter 14 of The Practical](https://reader031.vdocument.in/reader031/viewer/2022030620/5ae6a0bd7f8b9a29048dc213/html5/thumbnails/85.jpg)
Copyright 2012 © Limsoon Wong
85
References
• E.-J. Yeoh et al., ―Classification, subtype discovery, and
prediction of outcome in pediatric acute lymphoblastic leukemia
by gene expression profiling‖, Cancer Cell, 1:133--143, 2002
• H. Liu, J. Li, L. Wong. Use of Extreme Patient Samples for
Outcome Prediction from Gene Expression Data. Bioinformatics,
21(16):3377--3384, 2005.
• L.D. Miller et al., ―Optimal gene expression analysis by
microarrays‖, Cancer Cell 2:353--361, 2002
• J. Li, L. Wong, ―Techniques for Analysis of Gene Expression‖,
The Practical Bioinformatician, Chapter 14, pages 319—346,
WSPC, 2004
• B. Bolstad et al. ―A comparison of normalization methods for
high density oligonucleotide array data based on variance and
bias‖. Bioinformatics, 19:185–193. 2003