regulatory variation and its functional consequences chris cotsapas [email protected]

44
Regulatory variation and its functional consequences Chris Cotsapas [email protected] rg

Upload: briana-patterson

Post on 17-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Regulatory variation and its functional consequences

Chris [email protected]

Motivating questions

• How do phenotypes vary across individuals?– Regulatory changes drive cellular and organismal

traits– Likely also drive evolutionary differences

• How are genes (co)regulated?– Pathways, processes, contexts

Regulatory variation

• What do “interesting” variants do?• Genetic changes to:

– Coding sequence **– Gene expression levels– Splice isomer levels– Methylation patterns– Chromatin accessibility– Transcription factor binding kinetics– Cell signaling– Protein-protein interactions

~88% of GWAS hits are regulatory

Genetic variation alters regulation

• Protein levels – Maize (Damerval 94)

• Expression levels– Yeast, maize, mouse, humans (Brem 02, Schadt 03,

Stranger 05, Stranger 07)• RNA splicing

– Humans (Pickrell 12, Lappalainen 13)• Methylation and Dnase I peak strength

– Humans (Degner 12; Gibbs 12)

• cis-eQTL– The position of the eQTL maps near

the physical position of the gene.– Promoter polymorphism?– Insertion/Deletion?– Methylation, chromatin

conformation?

• trans-eQTL– The position of the eQTL does not

map near the physical position of the gene.

– Regulator?– Direct or indirect?

Modified from Cheung and Spielman 2009 Nat Gen

Genetics of gene expression (eQTL)

Cis- eQTL analysis: Test SNPs within a pre-defined distance of gene

1Mb 1Mb

SNPsgene

probe

1Mb window

QT association• Analysis of the relationship between a dependent or outcome

variable (phenotype) with one or more independent or predictor variables (SNP genotype)

Yi = b0 + b1Xi + ei

Number of A1 Alleles0 1 2

Conti

nuou

s Tr

ait V

alue

b0

Slope: b1

Linear Regression Equation

Logistic Regression Equation

= b0 + b1Xi + eiln( )pi

(1-pi)

gene 3

eQTL analysis: a GWAS for every gene

gene 2

gene N

gene 5

gene 4

gene 1

cis-eQTLs are rather common

Nica et al PLoS Genet 2011

Cis-eQTLs cluster around TSS

Stranger et alPLoS Genet 2012

trans hotspots (yeast)

Brem et al Science 2002

Yvert et al Nat Genet 2003

DOES REGULATORY VARIATION ALTER PHENOTYPE? APPLICATION TO GWAS

Candidate genes, perturbations underlying organismal phenotypes

Rationale

• How do disease/trait variants actually alter biology?

• If they change regulation, then:– Change in gene expression/isoform use– Phenotypic consequence*

Compare patterns of association

GWAS peak

eQTL for gene 1

eQTL for gene 2

Pearson’s covariance for windows of 51 SNPs between –log(p) in 2 traits

CD GWAS p

eQTL p

Detect a peak when effect is the sameNo peak when there are independent hits near each other

Crohn’s/eQTL analysis

• CD meta analysis (GWAS only)• CEU Hapmap LCL eQTL data• Overlapping SNPs only (eQTL data has 610K

SNPs, most in CD meta-analysis)• Test 133 associations (total 1054 tests)

GWAS peak

eQTL for gene 1

eQTL for gene 2

Crohn’s/eQTL analysisSNP CHR Gene

rs11742570 5 PTGER4

rs12994997 2 ATG16L1

rs11401 16 SPNS1

rs10781499 9 INPP5E

rs2266959 2 C22orf29

A peak implies that the same effect drives GWAS and eQTL

MS/eQTL analysisSNP CHR Gene

rs6880778 5 PTGER4

rs7132277 12 CDK2AP

rs7665090 4 CISD2

rs2255214 3 GOLGB1 & EAF2

rs201202118 12 METTL1 & TSFM

rs12946510 17 ORMDL3, STARD3 & ZPBP2

rs2283792 22 PPM1F

rs7552544 1 SLC30A7

rs34536443 19 SLC44A2

A peak implies that the same effect drives GWAS and eQTL

DOES REGVAR REVEAL CO-REGULATION? A.K.A. WHERE ARE THE TRANS eQTLS?

Open question

gene 3

Whole-genome eQTL analysis is an independent GWAS for expression of each

gene

gene 2

gene N

gene 5

gene 4

gene 1

Issues with trans mapping

• Power– Genome-wide significance is 5e-8

– Multiple testing on ~20K genes– Sample sizes clearly inadequate

• Data structure– Bias corrections deflate variance– Non-normal distributions

• Sample sizes– Far too small

But…

• Assume that trans eQTLs affect many genes…

• …and you can use cross-trait methods!

Association data

Z1,1 Z1,2 … … Z1,p

Z2,1

::

Zs,1 Zs,p

Cross-phenotype meta-analysis

SCPMA ~L(data | λ≠1)

L(data | λ=1)

Cotsapas et al, PLoS Genetics

CPMA for correlated traits

• Empirical assessment to account for correlation

• Simulate Z scores under covariance, recalculate CPMA

• Construct distribution of CPMA for dataset, call significance

with Ben Voight, U Penn

Experimental design

610,180 SNPs MAF >0.15 CEU and YRI

LD pruned (r2 < 0.2)

8368 transcriptsDetectable on Illumina arrays

108 CEU individuals*109 YRI individuals*

* Stranger et al Nat Genet 2007(LCL data; publicly available)

CEU p-values Transcript ~ SNP, sex

YRI p-values Transcript ~ SNP, sex

plink CPMA

CEU CPMA scores

YRI CPMA scores

>95%ile sim CPMA

Target sets of genes

• trans-acting variant: SNP with CPMA evidence• Target genes: genes affected by trans-acting

variant (i.e. regulon)

Prediction 1

• Allelic effects should be conserved between two populations– Binomial test on paired observations for all genes

P < 0.05 in at least one population

True for 1124/1311 SNPs (binomial p < 0.05)

Genes pCEU < 0.05

Genes pYRI < 0.05

CEU + + - - +

YRI + + - - +

YRI - - + + -

Prediction 2

• Target genes should overlap– Identify by mixture of gaussians classification– Empirical p from distribution of overlaps between

NCEU and NYRI genes across SNPs.

True for 600/1311 SNPs (empirical p < 0.05)

Genes pCEU < 0.05

Genes pYRI < 0.05

What about the target genes?

• Regulons:– Encode proteins more

connected than expected by chance

www.broadinstitute.org/mpg/dapple.phpRossin et al 2011 PLoS Genetics

What about the target genes?

• Regulons:– Encode proteins enriched for

TF targets (ENCODE LCL data)– 24/67 filtered TFs significant– Binomial overlap test

TF p-value

CEBPB 3.7 x 10-142

HDAC8 7.8 x 10-122

FOS 2.5 x 10-96

JUND 3.7 x 10-88

NFYB 3.3 x 10-71

ETS1 3.8 x 10-63

FAM48A 2.1 x 10-61

FOXA1 1.4 x 10-33

GATA1 4.6 x 10-33

HEY1 7.8 x 10-32

transtarget genes

CHiPseqLCL targetgenes

Summary

• Regulatory variation is common• It affects gene expression levels• Likely many other types:

– DNA accessibility, chromatin states– Transcript splicing, processing, turnover

• Has phenotypic consequences– GWAS– Some cellular assays (not discussed here)

Open questions

• Discover regulatory elements (cis)– Promoters, enhancers etc

• Gene regulatory circuits (trans)• Dynamics of regulation

– Splicing variation, processing, degradation• Phenotypic consequences

– Cellular assays required• Tie in to organismal phenotype

NEXT-GEN SEQUENCING DATARNAseq, GTEx

GTEx – Genotype-Tissue EXpressionAn NIH common fund project

Current: 35 tissues from 50 donors

Scale up: 20K tissues from 900 donors.

Novel methods groups: 5 current + RFA

How can we make RNAseq useful?

• Standard eQTLs – Montgomery et al, Pickrell et al Nature 2010

• Isoform eQTLs– Depth of sequence!

• Long genes are preferentially sequenced• Abundant genes/isoforms ditto• Power!?• Mapping biases due to SNPs

RNAseq combined with other techs

• Regulons: TF gene sets via CHiP/seq– Look for trans effects

• Open chromatin states (Dnase I; methylation)– Find active genes– Changes in epigenetic marks correlated to RNA– Genetic effects

• RNA/DNA comparisons – Simultaneous SNP detection/genotyping– RNA editing ???