

Page 1: Using Functional Genomic Units to Corroborate User ...people.ee.duke.edu/~xjliao/talk/CAMDA2001_ICA_Simon.pdf• Functional Genomic Unit #6: Six of these genes are coding for isoforms

Using Functional Genomic Units to Corroborate User Experiments with the Rosetta Compendium

Duke Bioinformatics Shared Resource, Duke University Medical Center

Simon M. Lin*, Patrick McConnell*

Department of Electronic Engineering, Duke University

Xuejun Liao*, Lawrence Carin

Department of Cardiology, Duke University Medical Center

Korkut Vita*, Pascal Goldschmidt

(* Authors contributed equally to the work)

Contributions

• Can we use biological knowledge in exploratory data analysis?
  - Context-sensitive clustering
  - Designed a Java application

• Can we computationally find the coordinated gene groups? Can we use them to simplify our analysis?
  - Functional Genomic Units (will be available to academic groups)
  - Utilized an ICA implementation in MATLAB

• Can we use Rosetta data to explain our own experiments?
  - Conducted an Affymetrix measurement of the RacC/A yeast strain
  - Explained results from different labs and instrumentation setups

Knowledge Should Not Be Ignored in the Microarray Analysis Process

[Diagram linking Scientist, Experiment, Data, Knowledge, and Publication]

Context-driven Clustering

• Clustering is unsupervised learning. No previous knowledge is necessary.

• Even with its exploratory nature, it still depends on your point of view.

• Your previous knowledge will help you with feature selection.

Why clustering should be done in a given context (a toy example)

Objects are the rows; features are the columns.

Object        Calorie  Salt    Fiber   Blood     # of cars in   Auto insurance  # of claims:
              intake   intake  intake  pressure  the household  premium         auto accident
Person 1      2000     2       4       90        2              400             3
Person 2      2900     3       4       80        1              300             0
...           ...      ...     ...     ...       ...            ...             ...
Person 10000  2500     1       7       80        3              500             0
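The point of the toy example can be illustrated with a short Python sketch. The four people and their values below are hypothetical (they are not the rows of the slide's table), and standardized Euclidean distance is only one possible choice of similarity. The sketch shows that the nearest neighbor of the same person changes with the feature context.

```python
import numpy as np

# Hypothetical values for four people; columns are
# calorie intake, blood pressure, cars in household, insurance premium.
X = np.array([
    [2000.0, 80.0, 1.0, 300.0],
    [2100.0, 82.0, 3.0, 900.0],
    [3000.0, 95.0, 1.0, 320.0],
    [3100.0, 97.0, 3.0, 880.0],
])
HEALTH = [0, 1]      # health-related feature columns
INSURANCE = [2, 3]   # auto-insurance feature columns

def nearest_neighbor(data, columns, index):
    """Return the index of the object closest to `index` using only `columns`."""
    sub = data[:, columns]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0)  # put features on one scale
    dist = np.linalg.norm(z - z[index], axis=1)
    dist[index] = np.inf                            # ignore the object itself
    return int(np.argmin(dist))

print(nearest_neighbor(X, HEALTH, 0))     # neighbor in the health context
print(nearest_neighbor(X, INSURANCE, 0))  # neighbor in the insurance context
```

In the health context person 0 pairs with person 1, while in the insurance context person 0 pairs with person 2, so a cluster assignment is only meaningful relative to the chosen features.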

Same is true for genomics

Objects are the rows; features are the columns.

Object          Gene 1  Gene 2  Gene 3  ...  Gene 10000
Experiment 1    2000    2       4       ...  2
Experiment 2    2900    3       4       ...  1
...             ...     ...     ...     ...  ...
Experiment 300  2500    1       7       ...  3

“Kitchen-sink” Clustering

Genomic Knowledge Organized in a Tree Structure

Integrated with the Expression Browser for Clustering

Clustering in the Lipid-metabolism Context

Independent Component Analysis (ICA) of the Gene Expression Profiles

• Statement of the ICA problem

• ICA Signal Model

$y = A x$, or $x = Q y$

$y$ - the observed random vector of $N$ components

$x$ - a random vector with $M$ independent components (IC)

$A$ - mixing matrix

$Q$ - separating matrix

• Objective

Find $Q$ and $A$ such that the components in $x$ are as independent as possible
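To make the roles of the mixing and separating matrices concrete, here is a minimal NumPy sketch. The sources, the matrix values, and the sizes are assumptions chosen for illustration. It takes the mixing matrix as known, which is not the case in ICA proper, where both Q and A must be estimated from y alone.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T = 2, 3, 1000                     # sources, observed components, samples
x = rng.uniform(-1.0, 1.0, size=(M, T))  # synthetic independent sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.8, -0.4]])              # hypothetical N x M mixing matrix
y = A @ x                                # observed vector: y = A x
Q = np.linalg.pinv(A)                    # a separating matrix when A is known
x_recovered = Q @ y                      # x = Q y
print(np.allclose(x_recovered, x))
```

Because this A has full column rank, its pseudo-inverse undoes the mixing exactly.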

• ICA Solution of the Blind Source Separation Problem: An Illustration

• ICA Model of the Microphone Array Signals

[Figure: audio signals of two independent speakers (Speaker 1, Speaker 2) plotted against time indices, and the mixed audio signals received at the microphone array.]

• Extraction of the two speakers’ audio signals via ICA and PCA

[Figure: panels labeled Original signals, ICA signals, and PCA signals, each plotted against time indices.]
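The comparison between ICA and PCA on two mixed signals can be reproduced in spirit with NumPy alone. This is a sketch under stated assumptions, not the MATLAB implementation used in the talk. The sources are synthetic random signals rather than audio, the mixing matrix is hypothetical, and the ICA step is a simple grid search over rotations of the whitened data that maximizes the sum of squared kurtoses.

```python
import numpy as np

def kurtosis(v):
    """Fourth-order cumulant of a zero-mean, unit-variance signal."""
    return np.mean(v ** 4) - 3.0

def whiten(y):
    """Center the signals (one per row) and give them identity covariance."""
    y = y - y.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(y))
    return (eigvec / np.sqrt(eigval)).T @ y  # rows: standardized principal components

def ica_two_signals(y, n_angles=500):
    """Rotate the whitened signals to maximize the sum of squared kurtoses."""
    z = whiten(y)
    best_score, best = -np.inf, z
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        candidate = np.array([[c, s], [-s, c]]) @ z
        score = sum(kurtosis(row) ** 2 for row in candidate)
        if score > best_score:
            best_score, best = score, candidate
    return best

def recovery(estimates, sources):
    """Worst case over sources of the best absolute correlation with any estimate."""
    k = len(sources)
    corr = np.abs(np.corrcoef(np.vstack([sources, estimates]))[:k, k:])
    return corr.max(axis=1).min()

rng = np.random.default_rng(0)
n = 5000
sources = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # sub-Gaussian, unit variance
    rng.laplace(0.0, 1 / np.sqrt(2), n),      # super-Gaussian, unit variance
])
mixing = np.array([[1.0, 0.6], [0.5, 1.0]])   # hypothetical two-microphone mixing
mixed = mixing @ sources

ica_quality = recovery(ica_two_signals(mixed), sources)
pca_quality = recovery(whiten(mixed), sources)
print(ica_quality, pca_quality)
```

The whitened signals are the principal components up to scale, so `pca_quality` measures how well PCA alone matches the sources. The extra rotation is what distinguishes ICA here: decorrelation leaves the rotation undetermined, and the higher-order statistics fix it.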

• ICA Model of the DNA Microarray (Gene Expression) Profiles

[Figure: expression versus experiment (condition) measurements of two mutually independent Functional Events (Functional Event 1, e.g., cell proliferation; Functional Event 2, e.g., detoxification), plotted over experiments (conditions) 1 to 12.]

[Figure: gene expression profiles versus experiment (condition) received at the DNA microarray, shown as panels labeled Genes and plotted over experiments (conditions) 1 to 12.]

• Extraction of the two mutually independent Functional Events via ICA and PCA

[Figure: panels labeled Original events, Functional events recovered from ICA, and Functional events recovered from PCA, each plotted over experiments (conditions) 1 to 12.]

• ICA Model of the DNA Microarray (Gene Expression) Profiles

ICA model: $y = A \cdot x$, or in words:

Expression Profiles of Genes = Memberships of Genes Belonging to Functional Units × Expression Profiles of Functional Units

- $y$: expression profiles of genes, indexed by experiment. It represents the impacts of experiments in terms of genes.

- $A$: memberships of genes. Element $(i, j)$ denotes the fuzzy membership of gene $i$ belonging to functional unit $j$.

- $x$: independent components, indexed by experiment. These are the expression profiles of the Genomic Functional Units and represent the impacts of experiments in terms of Genomic Functional Units.
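Written with explicit dimensions, the factorization on this slide reads as follows. The symbols $G$ (number of genes) and $E$ (number of experiments) are introduced here for illustration only; $M$ is the number of independent components, as in the ICA signal model.

```latex
\underbrace{Y}_{G \times E}
  = \underbrace{A}_{G \times M}\;\underbrace{X}_{M \times E},
\qquad
y_{ie} = \sum_{j=1}^{M} a_{ij}\, x_{je}
```

Each measured expression value $y_{ie}$ is thus a membership-weighted sum of the functional-unit activities in experiment $e$.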

• Definition of a Genomic Functional Unit

A Genomic Functional Unit is a fuzzy set defined on the genes in consideration. It generally contains genes that work together to accomplish a certain biological function.

[Figure: Membership function of Genomic Functional Unit #69; horizontal axis from 0 to 350, vertical axis from 0 to 0.12. Labeled genes: YFL026W, YFL053W, YML007W, YLR307W, YAL067C, DR218C, YIL037C, YLR296W, YPL121C, YPR116W.]

Fuzzy membership function of Unit #69, which is responsible for oxidative stress response.
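One way to read a fuzzy membership function off an estimated mixing matrix is sketched below. The slides do not state how the memberships were normalized, so taking absolute loadings and scaling them to sum to one is an assumption, and the matrix and gene names are hypothetical.

```python
import numpy as np

def membership(A, unit):
    """Fuzzy membership of every gene in one unit (assumed normalization)."""
    weights = np.abs(A[:, unit])
    return weights / weights.sum()

def top_genes(A, gene_names, unit, k):
    """Names of the k genes with the largest membership in the unit."""
    order = np.argsort(membership(A, unit))[::-1][:k]
    return [gene_names[i] for i in order]

# Hypothetical 3-gene, 2-unit mixing matrix.
A = np.array([[0.1, 2.0],
              [-3.0, 0.5],
              [0.9, -0.5]])
gene_names = ["gene_a", "gene_b", "gene_c"]

print(membership(A, 0))
print(top_genes(A, gene_names, 0, 2))
```

Ranking genes by membership is what allows a unit to be summarized by a short list of labeled genes, as in the figure for Unit #69.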

• Principles of the Independent Component Analysis Algorithm

1. Measure of statistical independence

• Original definition

A random vector $x$ has independent components $x_i$ if

$$p_x(u) = \prod_{i=1}^{N} p_{x_i}(u_i)$$

where $p_x$ is the joint pdf and the $p_{x_i}$ are the marginal pdfs.

• Mutual information

Kullback-Leibler distance between the joint pdf and the product of the marginal pdfs:

$$I(p_x) = \mathrm{kld}\Big(p_x, \prod_{i=1}^{N} p_{x_i}\Big) = \int p_x(u) \ln \frac{p_x(u)}{\prod_{i=1}^{N} p_{x_i}(u_i)} \, du$$
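The same quantity is easy to compute for a discrete joint distribution, which gives a feel for how it behaves. The sketch below is a discrete analogue of the integral definition, with example tables chosen for illustration: it is zero for an independent joint table and positive otherwise.

```python
import numpy as np

def mutual_information(joint):
    """KL distance (in nats) between a 2-D joint pmf and the product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    product = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # terms with zero probability contribute nothing
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))

independent = np.outer([0.3, 0.7], [0.4, 0.6])  # joint equals product of marginals
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])              # the two variables always agree

print(mutual_information(independent))
print(mutual_information(dependent))
```

For the fully dependent table the result is ln 2, the entropy of one fair binary variable.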

Differential entropy of $x$

$$S(p_x) = -\int p_x(u) \ln p_x(u) \, du$$

Negentropy of $x$

$$J(p_x) = S(\phi_x) - S(p_x)$$

$\phi_x(u)$ - Gaussian distribution with the same covariance matrix as $p_x(u)$

$J(p_x) \ge 0$, with equality iff $p_x(u) = \phi_x(u)$. This is so because the Gaussian distribution has the largest entropy among the pdfs having a given covariance matrix.

IMPORTANT: $J(p_x)$ is invariant under general invertible linear transforms, because

$$S(p_{Ax}) = S(p_x) + \ln |\det A|$$

and the $\ln |\det A|$ terms cancel out in $J(p_x)$.
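The cancellation can be spelled out in three lines. Here $A$ is any invertible $N \times N$ matrix, $V$ is the covariance matrix of $x$, and $\phi_x$ denotes the Gaussian with the same covariance as $p_x$; the Gaussian entropy formula $S(\phi_x) = \tfrac{1}{2}\ln[(2\pi e)^N \det V]$ is the standard one.

```latex
\begin{aligned}
S(p_{Ax}) &= S(p_x) + \ln\lvert\det A\rvert \\
S(\phi_{Ax}) &= \tfrac{1}{2}\ln\!\big[(2\pi e)^N \det(A V A^{T})\big]
             = S(\phi_x) + \ln\lvert\det A\rvert \\
J(p_{Ax}) &= S(\phi_{Ax}) - S(p_{Ax}) = S(\phi_x) - S(p_x) = J(p_x)
\end{aligned}
```

The second line uses $\det(A V A^{T}) = (\det A)^2 \det V$.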

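• Representation of mutual information using the negentropy

$$I(p_x) = J(p_x) - \sum_{i=1}^{N} J(p_{x_i}) + \frac{1}{2} \ln \frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

where $V$ is the covariance matrix of $x$.

Proof.

$$
\begin{aligned}
J(p_x) - \sum_{i=1}^{N} J(p_{x_i})
&= S(\phi_x) - S(p_x) - \sum_{i} \big[ S(\phi_{x_i}) - S(p_{x_i}) \big] \\
&= \int p_x(u) \ln p_x(u) \, du - \sum_{i} \int p_{x_i}(u_i) \ln p_{x_i}(u_i) \, du_i + \Big[ S(\phi_x) - \sum_{i} S(\phi_{x_i}) \Big] \\
&= \int p_x(u) \ln \frac{p_x(u)}{\prod_{i} p_{x_i}(u_i)} \, du + \Big[ S(\phi_x) - \sum_{i} S(\phi_{x_i}) \Big] \\
&= I(p_x) + \frac{1}{2} \ln \frac{\det V}{\prod_{i} V_{ii}}
\end{aligned}
$$

The last step uses $S(\phi_x) = \frac{1}{2} \ln \big[ (2 \pi e)^N \det V \big]$ and $S(\phi_{x_i}) = \frac{1}{2} \ln ( 2 \pi e \, V_{ii} )$. Moving the logarithm term to the other side gives the stated identity.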
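A quick worked case shows what the covariance term contributes. For a bivariate Gaussian with unit variances and correlation $\rho$ (an example chosen here, not taken from the slides), the joint and all marginals are Gaussian, so every negentropy term vanishes and only the logarithm remains.

```latex
V = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
I(p_x) = \frac{1}{2}\ln\frac{V_{11} V_{22}}{\det V}
       = -\frac{1}{2}\ln\!\left(1 - \rho^{2}\right),
\qquad
\rho = 0.8 \;\Rightarrow\; I(p_x) \approx 0.51 \text{ nats}
```

This is the part of the dependence that second-order decorrelation (standardization) removes; what is left for ICA to reduce sits in the negentropy terms.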

2. Basic Principles of the ICA algorithms

$$I(p_x) = J(p_x) - \sum_{i=1}^{N} J(p_{x_i}) + \frac{1}{2} \ln \frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

• $J(p_x)$ is invariant under general invertible linear transforms.

• $\sum_{i=1}^{N} J(p_{x_i})$ is the term to be maximized.

• The last term cancels out via standardization, which transforms $x$ to $\tilde{x}$ with an identity covariance matrix.

3. Examples of practical criteria of statistical independence

3.1 Criterion based on approximation of negentropy

$$I(p_x) \approx J(p_x) - \sum_{i=1}^{N} \Big( \frac{1}{12} \kappa_{iii}^{2} + \frac{1}{48} \kappa_{iiii}^{2} + \frac{7}{48} \kappa_{iii}^{4} - \frac{1}{8} \kappa_{iii}^{2} \kappa_{iiii} \Big) + \frac{1}{2} \ln \frac{\prod_{i=1}^{N} V_{ii}}{\det V}$$

where $\kappa_{iii} = \mathrm{cum}(x_i, x_i, x_i)$ and $\kappa_{iiii} = \mathrm{cum}(x_i, x_i, x_i, x_i)$. The sum over $i$ is the term to be maximized.
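The cumulant-based approximation of a single component's negentropy, (1/12)κ₃² + (1/48)κ₄² + (7/48)κ₃⁴ − (1/8)κ₃²κ₄ with κ₃ and κ₄ the third- and fourth-order cumulants of the standardized component, can be tried on samples. The sketch below uses plain sample moments as cumulant estimates, which is an assumption about estimation rather than something stated on the slide, and it uses synthetic data.

```python
import numpy as np

def negentropy_approx(samples):
    """Cumulant-based negentropy approximation of one standardized component."""
    z = (samples - samples.mean()) / samples.std()
    k3 = np.mean(z ** 3)        # third-order cumulant (skewness)
    k4 = np.mean(z ** 4) - 3.0  # fourth-order cumulant (excess kurtosis)
    return k3**2 / 12 + k4**2 / 48 + 7 * k3**4 / 48 - k3**2 * k4 / 8

rng = np.random.default_rng(0)
n = 100_000
j_gauss = negentropy_approx(rng.normal(size=n))
j_uniform = negentropy_approx(rng.uniform(-1.0, 1.0, size=n))
j_laplace = negentropy_approx(rng.laplace(size=n))
print(j_gauss, j_uniform, j_laplace)
```

The Gaussian sample scores close to zero while the uniform and Laplace samples score clearly higher, which is why maximizing the summed term drives the components away from Gaussianity.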


3.2 Simple criteria based on cumulants

$$\psi(Q) = \sum_{i=1}^{N} K_{iiii}^{2}, \quad \text{with } K \text{ the cumulants of } z = Q \tilde{y}$$

• ICA Results of the Rosetta Compendium Data Set

Rosetta Data Set --- Expression profiles of genes in 300 experiments

ICA results I: Expression profiles of Independent Components (functional units) in 300 experiments

[Figure: image plot; horizontal axis: experiment indices (ticks from 50 to 300); vertical axis: independent component indices (ticks from 50 to 250).]

PCA results for comparison: Expression profiles of the Principal Components in 300 experiments

[Figure: image plot; horizontal axis: experiment indices (ticks from 50 to 300); vertical axis: principal component indices (ticks from 50 to 250); scale bar from -40 to 10.]

ICA results II: Discovery or corroboration of genes’ functional unit

• Functional Genomic Unit #6:

  - Six of these genes are coding for isoforms of α-glucosidase (MAL62, MAL32, MAL12, FSP2, YIL172c, and YJL216c).

  - Four of the genes are directly associated with cell-wall synthesis and sporulation: sporulation-specific homolog of csd4 (YER096w); sporulation-specific cell wall maturation protein (YHR139c); first enzyme in the dityrosine synthesis pathway of the outer layer of the spore wall, converting L-tyrosine to N-formyl-L-tyrosine (YDR403w); and cell wall mannoprotein (YJR150c).

  - Five genes are involved in glucose metabolism: glucose repression regulatory protein that exhibits similarity to beta subunits of G proteins (TUP1); high-affinity hexose transporter (YDL245c); high-affinity hexose transporter (YEL069c); hexose transporter (YNR072w); and hexose transporter (YJR158w).


Common 5’-UTR

CCAGAAGTTGA  1  319  1
CAAAAAGGTGT  1  647  0
CCTGAAGTTGT  3   47  1
CAAAAAGGTCA  3  362  1
CCGGAAGGGGT  3  440  0
CAGGAAGGTGA  4   81  1
CAGGAAGTTGA  4  121  1
CACAAAGGTGA  6   69  0
CCTGAAGGTCA  7  169  0
CCTGAAGGTTT  7  188  1
** ********
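The ten aligned sites listed on this slide can be summarized column by column. The short sketch below copies the sequences from the slide and reports which positions carry the same base in every site; the function names are illustrative, and nothing is assumed about the meaning of the numeric columns or of the row of asterisks.

```python
from collections import Counter

# The ten 11-base sites as listed on the slide.
sites = [
    "CCAGAAGTTGA", "CAAAAAGGTGT", "CCTGAAGTTGT", "CAAAAAGGTCA", "CCGGAAGGGGT",
    "CAGGAAGGTGA", "CAGGAAGTTGA", "CACAAAGGTGA", "CCTGAAGGTCA", "CCTGAAGGTTT",
]

def column_counts(seqs):
    """Base counts for each alignment column."""
    return [Counter(column) for column in zip(*seqs)]

def conserved_columns(seqs):
    """Zero-based positions where every sequence has the same base."""
    return [i for i, counts in enumerate(column_counts(seqs)) if len(counts) == 1]

for position, counts in enumerate(column_counts(sites)):
    print(position, dict(counts))
print(conserved_columns(sites))
```

Four positions are identical across all ten sites: the leading C and the central AAG.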

Concept of “Functional Genomic Unit”

• The set of genes found here is different from the “Pathways” in the traditional sense.

• Mathematical point of view: latent variables constructed by Independent Component Analysis

• Biological point of view: coordinated genes to achieve a certain goal

Using FGU to explain user experiments

[Figure: plot with axes labeled FGUs and Experiments.]
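A user experiment can be expressed in terms of FGUs by projecting its gene-level profile onto the columns of the membership matrix. The slides do not say how this projection was computed, so the least-squares fit below is an assumption, and the matrix, profile, and noise level are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_units = 50, 3
A = rng.normal(size=(n_genes, n_units))  # hypothetical gene-by-unit membership matrix
x_true = np.array([2.0, 0.0, -1.0])      # hypothetical unit activities in the user experiment
y_user = A @ x_true + 0.01 * rng.normal(size=n_genes)  # simulated measured profile

# Least-squares estimate of the unit activities: minimize ||A x - y_user||.
x_user, *_ = np.linalg.lstsq(A, y_user, rcond=None)
print(np.round(x_user, 2))
```

Working with a handful of unit activities instead of thousands of gene values is what makes it practical to compare a new experiment against the compendium, even when the measurements come from a different lab or platform.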

Putative signal transduction pathways

[Diagram. Labeled elements: UV, Cytokine Receptors, Growth Factor receptors; Ras; Rac / CDC42; Raf; MEK; ERK; MKK4; JNK; MKK3/6; P38; NADPH Oxidase; ROS; c-fos; c-jun; ATF2; C-fos promoter; AP-1 RE; Jun1/jun2.]

Summary of Findings

• Incorporated biological knowledge in exploratory data analysis

• Utilized ICA to model the yeast functional genomics behavior

• Proposed Functional Genomic Units

• Demonstrated the potential of the “Compendium” approach