Identification of genetic interactions using computational homology Javier Arsuaga Mathematics Molecular and Cellular Biology University of California, Davis

Post on 11-Jul-2022

Page 1: Identification of genetic interactions using

Identification of genetic interactions using computational homology

Javier Arsuaga
Mathematics; Molecular and Cellular Biology
University of California, Davis

Page 2: Identification of genetic interactions using

V. Nanda

Page 3: Identification of genetic interactions using

Topological Molecular Biology Lab

Page 4: Identification of genetic interactions using

In cancer the structure of the genome can be heavily disrupted

Page 5: Identification of genetic interactions using

Amplifications and Deletions across the entire genome can be detected using array CGH

Page 6: Identification of genetic interactions using

Topological Analysis of array CGH (TAaCGH)

DeWoskin et al. 2009; DeWoskin et al. 2010; Arsuaga et al. 2012; Arsuaga et al. 2015

Page 7: Identification of genetic interactions using

We analyzed four breast cancer subtypes

Page 8: Identification of genetic interactions using

Clin Cancer Res; 16(2) January 15, 2010 663

Page 9: Identification of genetic interactions using

Our method identified the region of ERBB2 (Her2+ shown in blue)

[Figure: array CGH profiles of Patient X208 and Patient X308 on 17q; horizontal axis 3e+07 to 8e+07, vertical axis −0.5 to 1.5]

Page 10: Identification of genetic interactions using

Examples of ERBB2/Her2 patient profiles

[Figure: array CGH profiles of Patient X208 and Patient X308 on 17q, with ERBB2 marked; horizontal axis 3e+07 to 8e+07, vertical axis −0.5 to 1.5]

Page 11: Identification of genetic interactions using

[Figure: array CGH profiles of Patient X167 and Patient X220 on 11q; horizontal axis 6.0e+07 to 1.2e+08, vertical axis −0.5 to 1.5]

Luminal A: An amplification at the site of the Progesterone Receptor gene

Ion channel

Progesterone receptor

Page 12: Identification of genetic interactions using

Regions detected in Basal Patients

[Figure: panels for Luminal B, Luminal A, Her2 and Basal. Legend: TAaCGH Horlings DB; Centers of Mass Horlings DB; TAaCGH Bergamaschi DB; Centers of Mass Bergamaschi DB; Reported by Horlings et al. (Horlings paper)]

Page 13: Identification of genetic interactions using

Gain in 2p

[Figure: array CGH profiles of Patient X324 and Patient X330 on 2p; horizontal axis 0e+00 to 8e+07, vertical axis −0.5 to 1.5]

Page 14: Identification of genetic interactions using

Results from Arsuaga et al. 2015

Page 15: Identification of genetic interactions using
Page 16: Identification of genetic interactions using

TAaCGH is a method within statistical genetics: topological genetics

[Diagram: phenotype values (1, 0, ...) listed against markers z_j (A, B, ...)]

Page 17: Identification of genetic interactions using

Can we include genetic interactions?

[Diagram: phenotype values (1, 0, ...) listed against markers z_j (A, B, ...)]

Page 18: Identification of genetic interactions using

Hypothesis: Genetic interactions in the form of co-amplifications/deletions can be detected by β1

Page 19: Identification of genetic interactions using

Computer simulations suggest that β1 curves can detect co-occurring copy number changes
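As a rough illustration of what a β1 curve is, the sketch below turns a profile into a 2D point cloud with a sliding window, builds a Vietoris-Rips complex at each scale ε, and counts independent 1-cycles. The sliding-window embedding, the convention that two points are joined when their distance is at most ε, the GF(2) coefficients, the function names, and the toy profile are all assumptions made for illustration; they are not taken from the TAaCGH code.

```python
from itertools import combinations
from math import dist


def sliding_window_cloud(profile, dim=2):
    """Map a 1D profile to points (x_i, ..., x_{i+dim-1}); an assumed embedding."""
    return [tuple(profile[i:i + dim]) for i in range(len(profile) - dim + 1)]


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> stored row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


def betti1(points, eps):
    """First Betti number (over GF(2)) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= eps]
    edge_id = {e: k for k, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints, as a bitmask over vertices.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges, as a bitmask over edges.
    d2 = [(1 << edge_id[(i, j)]) | (1 << edge_id[(i, k)]) | (1 << edge_id[(j, k)])
          for i, j, k in combinations(range(n), 3)
          if (i, j) in edge_id and (i, k) in edge_id and (j, k) in edge_id]
    # beta_1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


def betti1_curve(points, eps_values):
    """beta_1 evaluated on a grid of filtration values."""
    return [betti1(points, eps) for eps in eps_values]


if __name__ == "__main__":
    # Toy log-ratio profile (hypothetical values): flat baseline with two gains.
    profile = [0.0, 0.1, 0.0, 1.2, 1.3, 0.1, 0.0, 1.1, 1.2, 0.0, 0.1]
    cloud = sliding_window_cloud(profile, dim=2)
    eps_grid = [0.1 * k for k in range(16)]
    for eps, b1 in zip(eps_grid, betti1_curve(cloud, eps_grid)):
        print(f"eps={eps:.1f}  beta1={b1}")
```

The triangle enumeration is cubic in the number of points, which is acceptable for a single chromosome arm with a few dozen probes but would need a dedicated persistent homology library for larger inputs.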

Page 20: Identification of genetic interactions using

• 60% of probes are in 8p/11q

• Test set: 9 patients contain the co-amplification (by inspection)

• Control set: a no-aberration set, generated artificially by mixing data

• Significant: p = 0.045

[Figure: B1 curves for both8p11q (9, blue) vs Non−both8p11q (5, red) on 8p_11q.s1 in 2D for kwek8p11q data; horizontal axis: Epsilon, 0.0 to 1.5; vertical axis: B1, 0 to 4]
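One way to compare the β1 curves of two groups, such as the 9 test and 5 control patients on this slide, is a permutation test on the distance between the group-average curves. The sketch below is an assumed procedure with hypothetical function names and a squared-difference statistic; the slides do not say how p = 0.045 was obtained.

```python
import random


def mean_curve(curves):
    """Pointwise average of curves sampled on the same epsilon grid."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def curve_distance(a, b):
    """Sum of squared pointwise differences between two curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def permutation_p_value(test, control, n_perm=2000, seed=0):
    """Estimate how often a random relabeling separates the groups as much as observed."""
    rng = random.Random(seed)
    observed = curve_distance(mean_curve(test), mean_curve(control))
    pooled = list(test) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of patients to groups
        stat = curve_distance(mean_curve(pooled[:len(test)]),
                              mean_curve(pooled[len(test):]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Each curve is a list of β1 values on a shared ε grid, so `test` and `control` are lists of such lists. With groups as small as 9 and 5, the number of distinct relabelings is limited (2002), which bounds how small the estimated p-value can get.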

Page 21: Identification of genetic interactions using

In order to detect co-occurring copy number changes, we need to be able to compute the cycles

Page 22: Identification of genetic interactions using

β1 also detects single amplifications

Page 23: Identification of genetic interactions using

Peaks detected do not necessarily persist

Page 24: Identification of genetic interactions using

Patterns may change during filtration

Page 25: Identification of genetic interactions using

Proposed statistical method for inverse problem

[Figure: Basal, 1p36.22−p35.1; horizontal axis: Avg Cum Width (Mbp), 12.4 to 32.5; vertical axis 0.000 to 0.045; series: Control, Diff: Test−Control]

[Figure, from "Computational Homology of Breast Cancer", p. 11: panels A and B, each showing a patient profile (horizontal axis 0 to 40, vertical axis 0.0 to 1.5) and its point cloud]

Fig. 5. Correspondence between CGH probes and generators. Different values of the filtration parameter detect different generators, which correspond to different probes in the genome. Panel A shows the profile of one patient and its associated point cloud. The probes highlighted in blue correspond to the vertices of the single generator, also in blue. The filtration coefficient was ε = 0.78. Panel B shows the same patient and point cloud for a different value of the filtration coefficient, ε = 0.83.
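Mapping a generator back to the genome can be sketched as follows, assuming the point cloud was built with a sliding window so that vertex i comes from probes i, ..., i + dim − 1. That embedding and the helper names are assumptions of this sketch, not details given in the caption.

```python
def probes_for_vertices(vertex_indices, dim=2):
    """Return the sorted probe indices covered by the given point-cloud vertices.

    Assumes vertex i was built from probes i, i+1, ..., i+dim-1
    (a sliding-window embedding; an assumption of this sketch).
    """
    probes = set()
    for v in vertex_indices:
        probes.update(range(v, v + dim))
    return sorted(probes)


def probe_positions(vertex_indices, positions, dim=2):
    """Look up genomic positions for the probes behind one generator."""
    return [positions[p] for p in probes_for_vertices(vertex_indices, dim)]
```

For example, a generator on vertices 3, 4 and 7 of a 2D cloud would highlight probes 3, 4, 5, 7 and 8 in the profile.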

the bottom ones the histograms for the Climent data set. The histograms on the left are the control and the ones on the right correspond to the ERBB2+. The most remarkable feature is the difference between the control and the ERBB2 data sets. While the control shows no significant concentration of the probes that belong to cycles, the ERBB2+ clearly shows three regions of interest. 17q12 has a significant concentration of cycle elements and corresponds to the position of the gene ERBB2. Two regions extend beyond the position of ERBB2. The first one is in the boundary between 17q21.2 and 17q21.31. The Horlings data set suggests that the region of interest is more localized in 17q21.31, while the Climent data set suggests a region contained in 17q21.2. The last region is located at 17q21.33 and is common to both studies.

Since our simulations show that the first homology group can also identify single amplifications, one may argue that the found amplifications correspond to single independent events. To address this problem we analyzed the distribution


Page 26: Identification of genetic interactions using

• Her2+: 14 vs Others: 52

Clin Cancer Res; 16(2) January 15, 2010 663

Page 27: Identification of genetic interactions using

Co-amplifications detected in 17q across three different data sets

PHB

JUN, CDK4, SLUG, WNT, TOP2

ERBB2

Ardanza-Trevijano et al. 2016

Page 28: Identification of genetic interactions using

Generators are the product of co-amplifications, not of single CNAs

Cycles dispersed over the entire profile showing multiple co-occurrences

[Figure, from S. Ardanza-Trevijano et al., p. 18: four panels (Patient 20, Patient 26, Patient 53, Patient 66); horizontal axis 0 to 40; vertical axis: Life of Cycle; legend "genindex": 1 to 6]

Fig. 7. Distribution of cycles in CGH profiles. Each plate corresponds to the CGH profile of a patient and shows how the vertices of the cycles are mapped back to the profile. Different colors indicate different cycles and do not represent the same cycle in each plate. The height of the bars represents the life of the cycle.

Page 29: Identification of genetic interactions using

TAaCGH suggests an interaction in 4q for basals

Page 30: Identification of genetic interactions using

Conclusions and future research

• We are expanding TAaCGH to identify genetic interactions

• Genetic interactions are in the form of co-occurring copy number changes and/or the finer structure of copy number changes.

• In the ERBB2+ subtype we find co-expression of different regions of 17q. In Basal we find co-expression in 4q

• Next:

  • Identify whether gene expression is regulated by these profiles

  • Generalize to other situations in statistical genetics: topological genetics?

Page 31: Identification of genetic interactions using

Thank you