fehrman nat gen 2014 - journal club

Fehrman et al, Nat Gen 2014. Gene Expression analysis identifies global gene dosage sensitivity in cancer Giovanni JC 30 March 2015

Upload: giovanni-dallolio

Post on 12-Feb-2017


Page 1: Fehrman Nat Gen 2014 - Journal Club

Fehrman et al, Nat Gen 2014. Gene Expression analysis identifies global gene dosage sensitivity in cancer

Giovanni JC 30 March 2015

Page 2: Fehrman Nat Gen 2014 - Journal Club

What is a PCA?

• PCA (Principal Component Analysis) is a technique to reduce a dataset with 3+ variables to two or a few dimensions

• Examples:
– a dataset of individuals' age, height, weight, etc.
– a dataset of gene expression

[Figure: 3D scatter plot with axes height, age and weight]

Page 3: Fehrman Nat Gen 2014 - Journal Club

What is a PCA projection?

[Figure: 3D scatter plot with axes height, age and weight]

• In a PCA we rotate a dataset with 3+ dimensions, trying to find the best “projection” for observing separation between data points

• Implementation:
– Find a line (PC1) along which the data points are most spread out, i.e. the direction explaining most of the variance
– Find a second line (PC2) orthogonal to the first, explaining most of the remaining variance

[Figure labels: PC1, PC2]
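The two steps above can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the age/height/weight values, the sample size and all variable names are assumptions made for the example, and the PCs are obtained with a singular value decomposition, which is one common way to compute them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated toy data (assumption): 200 individuals, columns = age, height, weight.
n = 200
age = rng.normal(40, 12, n)
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 5, n)  # weight tracks height
X = np.column_stack([age, height, weight])

# Center and scale each variable so that measurement units do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardized data: the rows of Vt are the principal directions,
# ordered from the largest to the smallest explained variance.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1, pc2 = Vt[0], Vt[1]

# Projection of every individual onto the PC1/PC2 plane.
scores = Z @ Vt[:2].T

# Fraction of the total variance explained by each PC.
explained = S**2 / np.sum(S**2)

print("PC1 coefficients (age, height, weight):", np.round(pc1, 2))
print("variance explained:", np.round(explained, 2))
```

In this simulation PC1 should load mainly on height and weight, because those two variables were generated to be correlated, while age was generated independently.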

Page 4: Fehrman Nat Gen 2014 - Journal Club

Variance explained by each PC

Page 5: Fehrman Nat Gen 2014 - Journal Club

PC coefficients

• The PCA will produce a new set of data “axes”, called Principal Components (PCs)

• Each PC is a combination of the original variables, each multiplied by a coefficient

[Figure: the expression values of genes 1 to 5 are each multiplied by an eigenvector coefficient (5.4, 3.2, -0.4, 0.0, -0.2) and combined into a PC; the figure columns are labelled PC1 to PC4]
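As a small worked example, the score of one sample on a PC is the sum of each gene's expression multiplied by its coefficient. The sketch below assumes that the five coefficients on the slide (5.4, 3.2, -0.4, 0.0, -0.2) belong to genes 1 to 5 for a single PC; the expression values are made up for illustration.

```python
import numpy as np

# Coefficients shown on the slide, assumed to be one per gene (genes 1-5).
coefficients = np.array([5.4, 3.2, -0.4, 0.0, -0.2])

# Hypothetical expression values of the five genes in one sample.
expression = np.array([1.0, 2.0, 0.5, 3.0, 1.5])

# Score of the sample on this PC: sum over genes of coefficient * expression.
pc_score = float(coefficients @ expression)
print(round(pc_score, 2))
```

With these made-up values the score is about 11.3, driven almost entirely by genes 1 and 2; gene 4, with coefficient 0.0, does not contribute at all.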

Page 6: Fehrman Nat Gen 2014 - Journal Club

Interpreting each PC

● Depending on which variables contribute to a PC, we can give it a biological interpretation
– If weight and height contribute to PC1 while age does not, then PC1 describes the “size” of the individual

● In gene expression data, a PC can represent a set of genes that are expressed together, following the same transcriptional profile
– Thus we rename PCs as Transcriptional Components (TCs)

Page 7: Fehrman Nat Gen 2014 - Journal Club

Gene Expression Dataset

• Expression data from Gene Expression Omnibus (Affymetrix, 4 datasets)

• Quality Control:
– A PCA is applied to each dataset, obtaining a first PC that explains 80-90% of the data variance
– This PC can be interpreted as probe- or platform-specific variance, independent of the sample
– All the samples that have a correlation < 0.75 with this PC are removed, as they are considered low quality

• Final dataset:
– Human small: 17,309 samples
– Human large: 32,427
– Mouse: 17,081
– Rat: 6,023
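The quality-control filter can be sketched roughly as follows. This is not the authors' pipeline: the data are simulated, the matrix sizes and noise levels are assumptions, and only the 0.75 correlation threshold comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (assumption): 500 probes x 40 samples. Every sample shares a
# probe-specific baseline; the last 5 samples are replaced by unrelated noise
# to mimic low-quality arrays.
n_probes, n_samples = 500, 40
baseline = rng.normal(8, 2, n_probes)
data = baseline[:, None] + rng.normal(0, 0.5, (n_probes, n_samples))
data[:, -5:] = rng.normal(8, 2, (n_probes, 5))

# PCA on the sample-by-sample correlation matrix.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
first = eigvecs[:, -1]
if first.sum() < 0:  # the sign of an eigenvector is arbitrary
    first = -first
explained = eigvals[-1] / eigvals.sum()

# Profile of the first PC across probes: standardized samples weighted by
# their eigenvector coefficients.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
pc1_profile = z @ first

# Correlation of each sample with the first PC; low values flag bad samples.
sample_corr = np.array(
    [np.corrcoef(data[:, j], pc1_profile)[0, 1] for j in range(n_samples)]
)
keep = sample_corr >= 0.75
filtered = data[:, keep]

print("variance explained by the first PC:", round(float(explained), 2))
print("samples kept:", int(keep.sum()), "of", n_samples)
```

In this simulation the shared baseline plays the role of the probe-specific variance, so the 35 regular samples correlate strongly with the first PC and the 5 noisy ones do not.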

Page 8: Fehrman Nat Gen 2014 - Journal Club

Copy number data and sample annotation

• 470 tumor samples with array CGH data (Agilent), analyzed with DNACopy

– 51 ERBB2-amplified breast cancer, 173 inflammatory breast cancer, 246 multiple myeloma

• Sample annotation: text-mining to determine cancer/cell line/normal samples

Page 9: Fehrman Nat Gen 2014 - Journal Club

Number of probes and genes

Page 10: Fehrman Nat Gen 2014 - Journal Club

Datasets for Gene Set Enrichment Analysis

Page 11: Fehrman Nat Gen 2014 - Journal Club

PCA implementation

• Each of the 4 datasets was analyzed separately

• PCA was done on the n-by-n correlation matrix, instead of the covariance matrix

– Reduces noise produced by samples with high variance

• Goal of the PCA is to identify Transcriptional Components, i.e. sets of genes expressed in the same conditions
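To illustrate why the choice of matrix matters, the sketch below compares the first PC obtained from a covariance matrix with the one obtained from a correlation matrix when a single sample has a much larger variance than the others. The data are simulated, and treating the columns as samples is an assumption based on the sub-bullet above; the slide does not say what n indexes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated expression matrix (assumption): 300 genes x 6 samples sharing a
# common signal; the last sample is rescaled to have a much larger variance.
shared = rng.normal(0, 1, 300)
data = shared[:, None] + rng.normal(0, 1, (300, 6))
data[:, 5] *= 20


def leading_eigvec(matrix):
    """Absolute coefficients of the eigenvector with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return np.abs(vecs[:, -1])


cov_pc1 = leading_eigvec(np.cov(data, rowvar=False))
corr_pc1 = leading_eigvec(np.corrcoef(data, rowvar=False))

print("PC1 from covariance: ", np.round(cov_pc1, 2))
print("PC1 from correlation:", np.round(corr_pc1, 2))
```

With the covariance matrix the first PC is dominated by the high-variance sample, while with the correlation matrix all six samples contribute with similar weights.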

Page 12: Fehrman Nat Gen 2014 - Journal Club

Parameters of the PCA

• TC size: order of the component
– How much of the gene expression variance is represented by the TC

• TC setting: score of the component in a given sample
– How active the expression profile represented by a TC is in the sample

• TC wiring: PC coefficient
– For every gene and for every expression profile (TC), how much the expression is supposed to change

Page 13: Fehrman Nat Gen 2014 - Journal Club

How many Transcriptional Components are there?

● About 300 in Human small, 600 in Human large, …

● 2,206 TCs across all datasets

● The robustly estimated TCs (Cronbach's alpha > 0.7) captured 79-90% of the variance
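For reference, Cronbach's alpha is a reliability measure that is high when a set of items vary consistently with each other. The sketch below implements the textbook item-based formula on simulated data; how the authors applied it to the components is not described on the slide, so this should not be read as their procedure. Only the 0.7 threshold comes from the slide.

```python
import numpy as np


def cronbach_alpha(items):
    """Textbook Cronbach's alpha for an (observations x k items) matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)


rng = np.random.default_rng(6)

# Simulated items (assumption): 10 items measured on 500 observations.
signal = rng.normal(0, 1, 500)
consistent = signal[:, None] + rng.normal(0, 0.5, (500, 10))  # share a signal
unrelated = rng.normal(0, 1, (500, 10))                       # pure noise

print("consistent items:", round(cronbach_alpha(consistent), 2))
print("unrelated items: ", round(cronbach_alpha(unrelated), 2))
```

Items that share a common signal give an alpha well above 0.7, while unrelated items give a value close to zero.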

Page 14: Fehrman Nat Gen 2014 - Journal Club

Do the TCs have biological meaning?

● All the TCs had at least one gene set enriched (GSEA), meaning that they represent biological phenomena

Page 15: Fehrman Nat Gen 2014 - Journal Club

Are the TCs different across the four datasets?

● Human small is very similar to Human large

● Mouse is similar to Rat

● Overall the most robust components are similar in all four datasets

Page 16: Fehrman Nat Gen 2014 - Journal Club

TC3 represents genes expressed in the brain

Page 17: Fehrman Nat Gen 2014 - Journal Club

A TC-based gene network

● Constructed a gene regulation network with 19,997 genes
– Two genes are connected if they are in the same TC (co-expressed)

● This network can be used to predict gene function using “guilt-by-association”
– A gene involved in a TC where 100 other genes are associated with apoptosis is probably also associated with it

Page 18: Fehrman Nat Gen 2014 - Journal Club

Guilt by association

● Used the 2,206 TCs from the 4 datasets

● Calculated a GSEA Z-score in each TC for each gene set

● A gene with unknown function is associated with a gene set if, across the TCs, the gene's eigenvector coefficients are correlated with the gene set's GSEA Z-scores
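A rough sketch of this prediction step, under the assumption that the association is measured with a plain Pearson correlation across TCs (the slide does not specify the statistic). All numbers are simulated, and the gene names and the `association` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_tcs = 200  # hypothetical number of TCs

# Simulated GSEA Z-scores of one gene set, one value per TC.
geneset_z = rng.normal(0, 2, n_tcs)

# Simulated eigenvector coefficients of two genes across the same TCs.
gene_a = 0.05 * geneset_z + rng.normal(0, 0.05, n_tcs)  # follows the gene set
gene_b = rng.normal(0, 0.1, n_tcs)                      # unrelated gene


def association(gene_coefficients, z_scores):
    """Pearson correlation between a gene's coefficients and the Z-scores."""
    return float(np.corrcoef(gene_coefficients, z_scores)[0, 1])


print("gene A:", round(association(gene_a, geneset_z), 2))
print("gene B:", round(association(gene_b, geneset_z), 2))
```

Gene A has large coefficients in the same TCs where the gene set is enriched, so it would be predicted to share that function; gene B would not.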

Page 19: Fehrman Nat Gen 2014 - Journal Club

Genes with similar function to BRCA1 and BRCA2

FEN1 is co-expressed with BRCA1 and BRCA2

The role of FEN1 in homologous recombination was not confirmed in mammals

Page 20: Fehrman Nat Gen 2014 - Journal Club

Involvement of FEN1 in Homologous Recombination

1b: siRNA silencing of FEN1

2C, top: if homologous recombination occurs, GFP is expressed

Page 21: Fehrman Nat Gen 2014 - Journal Club

FEN1 inhibition reduces homologous recombination

2d: chemical inhibition of FEN1 with MTT

2e: decrease of HR after inhibition of FEN1

Page 22: Fehrman Nat Gen 2014 - Journal Club

Inhibition of FEN1 and PARP1 increases DNA breaks

2f: PARP1 inhibition

2g: higher number of DNA breaks when both PARP1 and FEN1 are inhibited

2h: higher sensitivity to PARP1 inhibition

Page 23: Fehrman Nat Gen 2014 - Journal Club

Identification of unstable samples

● A subset of human samples showed enrichment for genes mapping to the same chromosome band

● This is the effect of large SCNAs in cancer tissue or cancer cell lines

Page 24: Fehrman Nat Gen 2014 - Journal Club

Autocorrelation between TC and chromosome position

Autocorrelation: the eigenvector coefficient of a gene is correlated with those of its neighbors on the chromosome

i.e. the expression of a gene is correlated with that of its neighbors
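One simple way to quantify this is a lag-1 autocorrelation: order the genes by chromosomal position and correlate each value with that of the next gene. The sketch below uses simulated coefficients, and the lag-1 statistic is an assumption for illustration; the exact measure used in the paper is not given on the slide.

```python
import numpy as np

rng = np.random.default_rng(4)


def lag1_autocorrelation(values):
    """Correlation between each value and the value of the next gene."""
    values = np.asarray(values, dtype=float)
    return float(np.corrcoef(values[:-1], values[1:])[0, 1])


n_genes = 400  # hypothetical genes, already sorted by chromosomal position

# Coefficients with no positional structure.
scattered = rng.normal(0, 1, n_genes)

# Coefficients where a contiguous block of 100 genes is shifted upward,
# as expected if that region were gained.
regional = rng.normal(0, 1, n_genes)
regional[100:200] += 3.0

print("scattered:", round(lag1_autocorrelation(scattered), 2))
print("regional: ", round(lag1_autocorrelation(regional), 2))
```

The profile with a shifted contiguous region shows a clearly positive autocorrelation, while the scattered one stays close to zero.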

Page 25: Fehrman Nat Gen 2014 - Journal Club

Identification of SCNAs from expression profiles

● Used 18,713 samples with no SCNAs to determine 718 non-genetic TCs, which are then applied to the other 18,714 samples

● SCNA levels were correlated with residual expression (not explained by TCs), explaining 28% of variation

● This 28% variation is called the Functional Genomic mRNA profile (FGM) and represents variation in gene expression that diverges from the physiological status
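The idea of a residual profile can be sketched as a projection: compute how active each non-genetic TC is in a sample, subtract that part, and keep what is left. The example below is a simulation with 5 random orthonormal components standing in for the 718 TCs; the component count, gene count and the step-shaped "dosage" signal are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_genes, n_tcs = 300, 5  # hypothetical sizes, far smaller than in the paper

# Stand-in for the non-genetic TCs: orthonormal columns obtained with QR.
tcs, _ = np.linalg.qr(rng.normal(size=(n_genes, n_tcs)))

# Simulated sample: a physiological part that lies in the span of the TCs,
# plus a step-shaped dosage effect on genes 50-99 (a gained region).
physiological = tcs @ rng.normal(0, 5, n_tcs)
dosage = np.zeros(n_genes)
dosage[50:100] = 1.0
sample = physiological + dosage

# Activity of each TC in the sample, then removal of the explained part.
settings = tcs.T @ sample
residual = sample - tcs @ settings

print("correlation of residual with dosage signal:",
      round(float(np.corrcoef(residual, dosage)[0, 1]), 2))
```

After the physiological part is projected out, the residual closely follows the simulated dosage signal, which is the behaviour the FGM profile is meant to capture.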

Page 26: Fehrman Nat Gen 2014 - Journal Club

Identification of potential SCNA events from expression profiles

Page 27: Fehrman Nat Gen 2014 - Journal Club

Functional Genomic mRNA profile

● FGM: Functional Genomic mRNA profile

– The portion of expression that cannot be explained by the 718 physiological non-genetic TCs

● 20 trisomy samples clearly showed higher FGM expression

● In 470 cancer samples, FGM levels correlated with SCNA levels (aCGH), explaining 86% of variance

Page 28: Fehrman Nat Gen 2014 - Journal Club

Most genes are dosage-sensitive to chromosome arm duplications

● They did another PCA on the FGM profile data, for every chromosome arm

– Describing if there are changes in the expression of all the genes in a chromosome arm, not due to physiological constraints (718 TCs)

● The PC1, representing the most prominent FGM pattern, described a complete duplication or deletion of the arm

● 91% of the probes were dosage-sensitive to the complete duplication/deletion of a chromosome arm

Page 29: Fehrman Nat Gen 2014 - Journal Club

More on dosage sensitivity

● Fig 4b: highly expressed genes are more dosage-sensitive

● Fig 4c: similar patterns are observed with an eQTL meta-analysis

Page 30: Fehrman Nat Gen 2014 - Journal Club

FGM profiling of 16,172 tumor samples

● Data preparation:
– Excluded cell lines (text mining + similar TC profile)
– Excluded genetically identical samples and related individuals (based on similarity of eQTL expression) (234 mix-ups)
– Kept only samples with high genomic instability (high autocorrelation), which are potentially cancer samples

Page 31: Fehrman Nat Gen 2014 - Journal Club

Hierarchical clustering of FGM

Most cancer types show samples with similarly altered expression

Some cancers have similar alteration patterns

Page 32: Fehrman Nat Gen 2014 - Journal Club

Amplifications and deletions in the regions involved in the FGM profiles

● Used DNACopy to determine whether the regions in FGM profiles in cancer are amplified or deleted, based on change of expression patterns (no aCGH data)

Page 33: Fehrman Nat Gen 2014 - Journal Club

Distribution of genomic instability

● Genomic instability: autocorrelation between the expression of a gene and that of its neighbors
– i.e. the tendency of a sample to have a high number of regions with altered expression, likely to be amplified/deleted

Page 34: Fehrman Nat Gen 2014 - Journal Club

Higher genomic instability corresponds to lower survival and higher grade

Page 35: Fehrman Nat Gen 2014 - Journal Club

Distribution of genomic instability across genome and genes

Page 36: Fehrman Nat Gen 2014 - Journal Club

Samples where CDKN2A and ERBB2 have altered expression

Page 37: Fehrman Nat Gen 2014 - Journal Club

Summary

● Used PCA to obtain 2,206 expression components

● Of these, 718 represent physiological non-genetic expression profiles

● The expression not explained by these 718 TCs (FGM profile) can be explained by SCNA alterations

● Most genes are dosage-sensitive, at least for arm-level alterations
