Mapping Transcription Mechanisms from Multimodal Genomic Data

Post on 10-Jan-2016


Mapping Transcription Mechanisms from Multimodal Genomic Data. Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni. Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. March 10, 2010.


Harvard Medical School

Mapping Transcription Mechanisms from Multimodal Genomic Data

Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni

Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology

Harvard Medical School, March 10, 2010


Information Flow in Multimodal Genomic Data

• Genetic Variants
– 100k – 1000k SNPs
– 250k copy number variations (CNVs)
– 250k methylation measurements

• Transcripts
– 50k mRNA expression levels
– 50k microRNA expression levels
– 1.5M exon expression / splicing

[Diagram: arrows labeled "Information" between genetic variants and transcripts]


Expression Quantitative Trait Loci (eQTLs)

• The connection from variant to expression is an information channel
– A DNA locus that modulates the expression level of a gene is an eQTL
• Cis (trans) eQTLs are genetic variants located close to (far away from) genes.
• Identifying cis-eQTLs is easier
– Focusing on cis-eQTLs reduces the search space
– trans-eQTLs?
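The cis/trans distinction above can be illustrated with a small sketch. The slides only say "close to" and "far away"; the 1 Mb window, the function name, and the coordinates below are assumptions made for illustration, not values from the talk.

```python
# Assumed cutoff: the slides do not give a distance, 1 Mb is a placeholder.
CIS_WINDOW_BP = 1_000_000


def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_pos, window=CIS_WINDOW_BP):
    """Return 'cis' if the SNP is on the gene's chromosome and within
    `window` base pairs of it, otherwise 'trans'."""
    same_chrom = snp_chrom == gene_chrom
    if same_chrom and abs(snp_pos - gene_pos) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
print(classify_eqtl("21", 15_000_000, "21", 15_400_000))  # nearby, same chromosome
print(classify_eqtl("21", 15_000_000, "21", 42_000_000))  # same chromosome, far away
print(classify_eqtl("5", 15_000_000, "21", 15_400_000))   # different chromosome
```

Restricting the search to pairs labeled "cis" is what shrinks the search space; the trans pairs are everything that restriction leaves out.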


Clinical Study on Pediatric Leukemia

• Cancer: based on genetic modification (variants) and cellular malfunction (gene expression)
• Identification of eQTLs helps understand molecular mechanisms in cancer and provides biological insight.
• Clinical study of acute lymphoblastic leukemia (ALL)
– The most common malignancy in children, nearly one third of all pediatric cancers.
– A few cases are associated with inherited genetic syndromes (e.g., Down syndrome, Bloom syndrome, Fanconi anemia), but the cause remains unknown.
• Data
– 29 patients.
– Genotyped 100,000 SNPs (Affymetrix Human Mapping 100K).
– Profiled 50,000 gene expressions (Affymetrix HG-U133 Plus 2.0).


Challenges in Finding eQTLs

• Compare the distribution of each variant to the levels of each expression measurement
– Computational
  • Testing all pairs of variants vs. expressions is costly
  • Usually discretize expression levels (Pensa et al., BioKDD, 2004)
  • Multiple testing considerations
– Understanding
  • Too many associations to test via laboratory science
  • Computational methods of biological discovery
  • Want to summarize main informational (biological) pathways
• Answer: Use transcriptional information
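To make the cost and multiple-testing points concrete, here is a back-of-the-envelope sketch using the dataset sizes from the clinical study slide (100,000 SNPs, 50,000 expression measurements). The significance level of 0.05 and the Bonferroni correction are assumptions chosen for illustration; the slides do not say which correction was used.

```python
n_snps = 100_000          # from the data slide
n_transcripts = 50_000    # from the data slide

# Every SNP is compared against every expression measurement.
n_tests = n_snps * n_transcripts

alpha = 0.05                              # assumed family-wise error rate
bonferroni_threshold = alpha / n_tests    # assumed correction, not from the talk

print(f"{n_tests:,} pairwise tests")
print(f"per-test p-value threshold: {bonferroni_threshold:.1e}")
```

With only 29 patients, a per-test threshold this strict is very hard to reach, which is one way to read the motivation for a different scoring approach.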


Transcriptional Information Channel

[Diagram: SNP X and expression level Y connected by the "Transcription Channel"]

• SNPs are modeled as binomial variables.
• Expressions are modeled as log-normal variables.
• Information theory measures entropy, H(X).
• Mutual information (MI) quantifies information flow: [equation not preserved in the transcript]
• Higher MI is achieved by larger σ² and smaller σ_k², i.e., when expression level Y is more likely modulated by SNP X.
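The MI formula itself did not survive in the transcript, so the following is a sketch under stated assumptions rather than the authors' expression. It assumes σ² is the overall variance of log expression and σ_k² the variance within genotype group k, and uses the Gaussian bound I(X;Y) ≤ ½[ln σ² − Σ_k p_k ln σ_k²], which grows with larger σ² and smaller σ_k² as the slide describes. The function name and the simulated data are hypothetical.

```python
import math
import random
from statistics import pvariance


def gaussian_mi(genotypes, log_expr):
    """Gaussian approximation (upper bound) of I(X;Y) in nats:
    0.5 * [ln(var_total) - sum_k p_k * ln(var_k)]."""
    n = len(log_expr)
    groups = {}
    for g, y in zip(genotypes, log_expr):
        groups.setdefault(g, []).append(y)
    total_var = pvariance(log_expr)
    conditional = 0.0
    for ys in groups.values():
        var_k = pvariance(ys)
        if var_k <= 0.0:
            raise ValueError("each genotype group needs a positive variance")
        conditional += (len(ys) / n) * math.log(var_k)
    return 0.5 * (math.log(total_var) - conditional)


# Simulated data, for illustration only.
random.seed(0)
n_samples = 300
# Genotype = number of minor alleles, Binomial(2, 0.3).
genotypes = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n_samples)]
# Log expression shifted by genotype (modulated) vs. unrelated to it (independent).
modulated = [2.0 * g + random.gauss(0.0, 0.5) for g in genotypes]
independent = [random.gauss(0.0, 1.0) for _ in genotypes]

mi_modulated = gaussian_mi(genotypes, modulated)
mi_independent = gaussian_mi(genotypes, independent)
print(f"modulated: {mi_modulated:.3f} nats, independent: {mi_independent:.3f} nats")
```

Working on log expression turns the log-normal model into a normal one, which is why the variances of the logged values appear in the bound.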


• Transcript Y is modulated by SNP X:

• Transcript Y is independent of SNP X:


Transcriptional Information Map

[Figure: map linking SNP nodes (X) to transcript nodes (Y)]


ALL Transcriptional Information Map of Chr21


Cluster Genes and SNPs into Networks

[Figure: SNP nodes X1–X9 and transcript nodes Y1–Y9 grouped into clusters]


Cluster Genes and SNPs into Networks

[Figure: clustered nodes X1, X3, X4, X8, Y1, Y2, and Y9]

• We can further infer the optimal modulation patterns using Bayesian networks.
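One simple way to group linked SNPs and transcripts is to take connected components of the SNP-transcript map. That choice is an assumption here: the slides do not state the clustering rule. The edge list is invented for illustration, since the edges of the original figures are not recoverable from the transcript.

```python
from collections import defaultdict


def connected_clusters(edges):
    """Group nodes into clusters, one per connected component of the
    undirected graph defined by (snp, transcript) edges."""
    adjacency = defaultdict(set)
    for snp, transcript in edges:
        adjacency[snp].add(transcript)
        adjacency[transcript].add(snp)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical links between SNPs (X) and transcripts (Y).
edges = [
    ("X1", "Y1"), ("X1", "Y2"), ("X3", "Y2"), ("X4", "Y2"),
    ("X4", "Y9"), ("X8", "Y9"), ("X5", "Y5"), ("X6", "Y5"),
]
clusters = connected_clusters(edges)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster is small enough to be handed to a separate network-inference step, which is the role the Bayesian networks play in the talk.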


Bayesian Networks

• Bayesian networks are directed acyclic graphs:
– Nodes correspond to random variables.
– Directed arcs encode conditional probabilities of the target nodes on the source nodes.
• In the diagram:
– p(X) depends on (A,B)
– p(Z|X,Y) is independent of (A,B)

[Diagram: network with nodes A, B, X, Y, and Z]
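The two statements about the example network can be checked numerically. The sketch below assumes the arcs A→X, B→X, X→Z, and Y→Z, which is the structure the two bullet statements imply; the diagram itself is not in the transcript. All variables are binary and every probability value is made up for illustration.

```python
from itertools import product

# Made-up marginals for the root nodes A, B, and Y.
p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
p_y = {0: 0.5, 1: 0.5}


def p_x_given_ab(x, a, b):
    p1 = 0.1 + 0.4 * a + 0.3 * b   # P(X=1 | a, b), made up
    return p1 if x == 1 else 1.0 - p1


def p_z_given_xy(z, x, y):
    p1 = 0.2 + 0.5 * x + 0.2 * y   # P(Z=1 | x, y), made up
    return p1 if z == 1 else 1.0 - p1


def joint(a, b, x, y, z):
    """Joint probability as the product of each node given its parents."""
    return p_a[a] * p_b[b] * p_x_given_ab(x, a, b) * p_y[y] * p_z_given_xy(z, x, y)


def conditional_z(z, given):
    """P(Z=z | given) by brute-force summation; `given` maps names to values."""
    numerator = denominator = 0.0
    for a, b, x, y, zz in product((0, 1), repeat=5):
        assignment = {"A": a, "B": b, "X": x, "Y": y, "Z": zz}
        if any(assignment[name] != value for name, value in given.items()):
            continue
        p = joint(a, b, x, y, zz)
        denominator += p
        if zz == z:
            numerator += p
    return numerator / denominator


# Once X and Y are known, A and B add no information about Z.
without_ab = conditional_z(1, {"X": 1, "Y": 0})
with_ab = conditional_z(1, {"X": 1, "Y": 0, "A": 1, "B": 1})
print(without_ab, with_ab)
```

The factorization in `joint` is what the directed arcs encode: each node contributes one conditional probability given its parents.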


Infer Bayesian Networks in Individual Clusters

[Diagram: cluster network including transcripts Y1, Y2, and Y9]

• Step 1: Use the transcriptional information map (TIM) as the initial network.
• Step 2: The Bayesian network infers SNP-SNP connections.


A Bayesian Network Inferred from Chr21 TIM


Information Theoretic Network Analysis

• Find hubs, motifs, guilds, etc.
– Abstract edges
– Global patterns -> local patterns
– Reveal emergent properties
– Information theoretic approach using data compression

• Alterovitz G, and Ramoni MF, “Discovering biological guilds through topological abstraction,” AMIA Annu Symp Proc, pp. 1-5, 2006.


Identified Fundamental Components

Reference: Alterovitz and Ramoni, AMIA Annu Symp Proc, pp. 1-5, 2006.


Identification of Cis- and Trans-eQTLs

• RIPK4, 21q22.3
– Related to Down syndrome
– RIPK4 has 5 (trans) SNPs in q11.2 (shown in blue in the figure) affecting its expression.

[Figure label: RIPK4]


Identification of Cis- and Trans-eQTLs

• CYYR1, 21q21.1
– Recently discovered.
– Encodes a cysteine- and tyrosine-rich protein.
– A recent study found a correlation with neuroendocrine tumors.
– TIM shows CYYR1 modulated by SNPs across the q arm of chromosome 21.
– DSCAM is related to Down syndrome
– DSCAM-CYYR1 interaction leads to ALL?

[Figure label: DSCAM]


Complete TIM Algorithm

[Flowchart: Genetic Variant and Transcript data -> Compute Transcriptional Information -> Group Linked SNPs and Transcripts (Cluster 1 ... Cluster N) -> Infer Network in Individual Clusters (Cluster 1 ... Cluster N) -> Network Topology Analysis and Summary]


Transcriptional Information Maps

• Make large multimodal genetic datasets amenable to transcriptional analysis
• Identify
– Modulation patterns between genetic variants and transcripts.
– Cis- and trans-eQTLs.
• Analysis of pediatric ALL helps identify biological hypotheses regarding a connection to Down syndrome


Questions?

Thanks to Prof. Marco F. Ramoni, Dr. Hsun-Hsien Chang, Dr. Gil Alterovitz, Children’s Hospital Informatics Program, Brigham and Women’s Hospital
