
Page 1: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

Computational mechanisms and information coding by the non-coding transcriptome

Georges St. Laurent

Brown University and St. Laurent Institute

Providence, Rhode Island

Georges_st_Laurent@brown.edu

Three goals of today's talk

The non-coding transcriptome:

1) Plays pervasive roles throughout the functional systems of the nervous system

2) Catalyzes a new paradigm of computational complexity and information processing in the nervous system

3) May solve the dilemma of Genomics and Neuroscience

Problems with the Central Dogma

Francis Crick's Central Dogma of biological information flow

1) Where's the complexity? How many genes?

2) What's in the DNA junk? Why keep it around?

3) Evolution of regulatory networks: regulator requirements in protein networks increase quadratically
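
The quadratic-scaling point can be illustrated with a toy calculation. The sketch below is not taken from the talk: it assumes the number of regulators grows as k·n² for n genes, where the constant `k` and all function names are hypothetical and chosen only for illustration. Under that assumption the fraction of the gene set devoted to regulation grows linearly with n, so the model hits a ceiling at n = 1/k.

```python
def regulators_needed(n_genes, k=1e-5):
    """Toy model: regulators required grow quadratically with gene number.

    k is a hypothetical proportionality constant, not a measured value.
    """
    return k * n_genes ** 2


def regulatory_fraction(n_genes, k=1e-5):
    """Fraction of the gene set that would have to be regulators (= k * n)."""
    return regulators_needed(n_genes, k) / n_genes


def saturation_size(k=1e-5):
    """Gene number at which regulators would equal all genes: k*n^2 = n."""
    return 1.0 / k


if __name__ == "__main__":
    for n in (1_000, 5_000, 10_000, 50_000):
        print(n, regulators_needed(n), round(regulatory_fraction(n), 3))
    print("model saturates near", saturation_size(), "genes")
```

Doubling the gene count quadruples the regulator requirement in this model, which is the sense in which a purely protein-based regulatory network becomes expensive as it grows.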

Taft and Mattick, Exp. Biol. (2007) 210:1526-1547

Ratio of non-coding DNA to total genomic DNA content

Mattick J. S., J. Exp. Biol. 2007; 210:1526-1547

"A simplified view of the evolution of organismal complexity"

Relations among the transcribed bases in the nonrepeat portions of the human genome

P. Kapranov et al., Science 316, 1484-1488 (2007)

Regionally enriched expression of ncRNAs in the hippocampus, cerebral cortex, and cerebellum

Mercer T. R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 1

Expression of ncRNAs associated with protein-coding genes

Mercer T. R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 2

Subcellular localization of ncRNAs

Mercer T. R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 4

Fig. S5

Long ncRNAs: lower expression levels but higher spatial variation

Conclusion: ~20K long ncRNAs expressed in human brain

ncRNAs comprise a Computational Matrix

ncRNAs are unique as information processors

Capacity to couple the digital information dimension of sequence homology to the analog information dimension of macromolecular shape

Unique sensory features:

• Reversibility

• Sensitivity

• Plasticity

[Figure labels: HSF1, HSF]

Sensory signaling changes ncRNA secondary structure in vivo

ncRNA Information Theory and Thermodynamics: more information for less thermal energy

\[
\frac{\text{Information Content}}{\text{Thermal Dissipation}}
= \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}
= \frac{\text{Codable Degrees of Freedom}}{\text{Thermal Degrees of Freedom}}
\]

ncRNA features an enhanced Shannon entropy to thermal entropy ratio
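
The ratio on this slide can be evaluated numerically for a toy distribution. The sketch below is illustrative only and assumes the reading in which the numerator is the Shannon sum Σ pᵢ ln pᵢ over coded states and the denominator is the same sum over `w` equally likely thermal states (each with probability 1/w). The function names and the example distributions are hypothetical, not from the talk.

```python
import math


def shannon_sum(probs):
    """Return sum_i p_i ln p_i (non-positive); zero-probability terms contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(p) for p in probs if p > 0)


def uniform_sum(w):
    """Return the sum over w equally likely states of (1/w) ln (1/w), i.e. -ln w."""
    if w < 2:
        raise ValueError("need at least two states, otherwise the sum is zero")
    return w * (1.0 / w) * math.log(1.0 / w)


def entropy_ratio(probs, w):
    """Ratio of the coded (Shannon) sum to the uniform (thermal) sum."""
    return shannon_sum(probs) / uniform_sum(w)


if __name__ == "__main__":
    coded = [0.25, 0.25, 0.25, 0.25]      # four equally used coded states
    print(entropy_ratio(coded, w=4))      # as many thermal states as coded states
    print(entropy_ratio(coded, w=16))     # more thermal states, lower ratio
```

Both sums are negative, so the ratio is positive; it rises when more of the available states carry coded information.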

ncRNAs can regulate the early pathogenesis of complex disease

β-secretase = BACE-1 (β-site APP cleaving enzyme)

Gurney et al., Nature 402:533, 1999

• Rate limiting for Aβ 1-42 generation

• Finely tuned

• Stress responsive

• Up-regulated in Alzheimer's disease

• Inhibitor: disease-modifying therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

[Figure: RNA transcript (% of control average; y-axis 0-250) for BACE-1 and BACE-1-AS, Control vs. AD. Human brain samples (group 1: 10 AD, 10 control/region)]

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

Hippocampus (n=40 each)

Faghihi et al., Nature Medicine 14:723, 2008

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

[Figure: histogram with series Coding and NC; y-axis 0.00-60.00; x-axis bins 0-0.005, 0.005-0.01, 0.01-0.02, 0.02-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.5, 0.5-1, 1]

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: percentage of probes with p < 0.05 (y-axis 0-6) for Singlets, Duplets, Triplets, and Total; series: Coding, Non Coding]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits

Proteome Wide Prediction of RNA Binding Affinity

[Figure: RNA-binding proteins (y-axis 0-120) per bin of 1,000; x-axis bins 1-1000, 1001-2000, ..., 20001-20826]

RNA–Protein Complex High-Throughput Pipeline

1) Proteome-wide prediction of RNA-binding regions

2) Endogenous FLAG tag of predicted RNA-binding proteins

3) Cryogenic FLAG IP of RNA–protein complexes

4) RNA deep sequencing and peptide mass spec

5) Bioinformatics identification of RNAs and proteins

6) Systems biology analysis of datasets; network construction

Cryogenic Immunoprecipitation Technique

Cells:

• PC12 cells

• primary cells

• neuronal progenitors

Treatments: Depolarization, Stress, Inflammation, Cytokines, Drugs

[Diagram labels: Cells; Freezing; Create cell "grindate"; Immunoprecipitation; 30 min; Proteomics, RNA-Seq or ChIP-Seq Analysis; ENTROPY]

Difficult Timescales for RNP Immunoprecipitation

[Figure: specific binding vs. nonspecific binding relative to the in vivo state; axis 10-100]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions

• Rearrangement is not a significant problem

• Yields of > 90% for bait protein and associated RNA

• Does not depend on a particular protein tag

• No cross-linking necessary

• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA–Protein Complexes

[Gel image: lanes labeled "Transfected with"; molecular weight markers 250, 150, 100, 75, 50, 37.5, and 25 kDa]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Sample Preparation

HeliScope™ Single Molecule Sequencer

Bioinformatic Analysis Engine

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Output

Capacity = 10 billion nucleotides/run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes

• Illumina Deep Sequencing = 6 million total sequence tags

• Coding and non-coding represented in top 3000

• Natural Antisense RNAs such as HIF1α-AS represented in top 100

• 60% overlap between top 3000 sequence tags and Affy

• Helicos comparison pending (permits very small sample sizes)

RNA Motif 1 for HuR association

[Motif diagram labels: UGUG, U; loop 3-8 bp]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's lab)

Found the Gorospe motif in 4536 of 11150 sequences … … (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69

RNA Motif 2 for HuR association

[Motif diagram label: A U UUUU A]

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences … … (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
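
The motif-versus-shuffled-control comparison can be sketched as follows. This is a minimal illustration, not the authors' procedure: the regular expression `UUUU` is a hypothetical stand-in for a real motif model, the demo sequences are made up, and the Z-score uses a simple binomial approximation with the control hit rate as the null probability. With the slide's counts for motif 1 (4536 observed, 3521 in the control, 11150 sequences) this approximation gives roughly 20.7, close to the Z-score reported on the slide, but how that value was actually computed is not stated.

```python
import math
import random
import re


def count_with_motif(sequences, pattern):
    """Count sequences containing at least one match of the regex pattern."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))


def shuffle_mononucleotide(seq, rng):
    """Shuffle bases, preserving mononucleotide composition only."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def binomial_z(observed_hits, control_hits, n):
    """Z-score of observed hits against a binomial null with p = control_hits / n."""
    p = control_hits / n
    if not 0 < p < 1:
        raise ValueError("control hit rate must be strictly between 0 and 1")
    return (observed_hits - control_hits) / math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    rng = random.Random(0)                         # fixed seed for repeatability
    demo = ["AUUUUAGC", "GGCAUUUUA", "GCGCGCGC"]   # made-up sequences
    shuffled = [shuffle_mononucleotide(s, rng) for s in demo]
    print(count_with_motif(demo, "UUUU"), count_with_motif(shuffled, "UUUU"))
    print(round(binomial_z(4536, 3521, 11150), 2))
```

A dinucleotide-preserving shuffle, as used for the second motif's control, needs a different algorithm and is not shown here.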

Deciphering information content defining HuR interactions with the Transcriptome

[Diagram: scan window moving along the RNA]

Determine clusters of similar structures

[Diagram: structure clusters in the positive and negative sets]

Calculate cluster size distribution (for scanning window length = 50, 45, and 40)

[Figure panels: window length = 50 (structures 45-50 bp length); window length = 45 (structures 40-45 bp length); window length = 40 (structures 35-40 bp length)]
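
The scan-window step can be sketched as a simple generator. This is an assumption-laden illustration: the function name, the step size, and the demo sequence are hypothetical, and the structure-prediction step that would be applied to each window is left out because the talk does not say which folding tool was used.

```python
def scan_windows(sequence, window, step=1):
    """Yield (start, subsequence) for each full-length window along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


if __name__ == "__main__":
    rna = "GGGAAAUCCCAUUUUAGGGCUUCGGCCCUAAAAUGGGAUUUCCCGGGAAAUCCCAUUUUAGG"  # made-up sequence
    for length in (50, 45, 40):          # window lengths named on the slide
        windows = list(scan_windows(rna, length))
        print(length, len(windows))
```

Each window would then be folded and its predicted structure passed on to the clustering step.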

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))
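
One simple way to group bracket-notation structures like those listed above is by a coarse shape signature. The sketch below is a hypothetical illustration, not the clustering method used in the talk: it reduces each dot-bracket string to the number of base pairs in each top-level element and counts how many structures share a signature. The function names and example strings are assumptions.

```python
from collections import Counter


def top_level_stems(structure):
    """Return a tuple with the base-pair count of each top-level element."""
    depth = 0
    pairs = 0
    stems = []
    for ch in structure:
        if ch == "(":
            depth += 1
            pairs += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
            if depth == 0:          # a top-level element just closed
                stems.append(pairs)
                pairs = 0
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure: missing ')'")
    return tuple(stems)


def cluster_by_shape(structures):
    """Count structures per shape signature."""
    return Counter(top_level_stems(s) for s in structures)


if __name__ == "__main__":
    examples = ["(((....)))((....))", "(((...)))((.....))", "((((....))))"]
    for shape, size in cluster_by_shape(examples).most_common():
        print(shape, size)
```

Signatures such as `(3, 2)` describe two adjacent hairpins, while a single-entry tuple describes one hairpin; a finer signature could also record loop lengths.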

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Diagram: HuR bound to RNA in the positive and negative sets; density values 2673 and 810; 1482 sequences and 2818 sequences]

Signal to noise decays with decreasing structure size

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 / let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al. (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al. (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed, and localized in cell types, tissues, and biological processes

3. ncRNAs play functional roles in processes such as development, stress response, and disease

4. ncRNAs have unique information coding and processing capabilities: density, range, and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia, Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A

Page 4: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

Problems with the Central Dogma

1) Where's the complexity?

How many genes?

2) What's in the DNA junk?

Why keep it around?

3) Evolution of regulatory networks

Regulator requirements in protein networks increase quadratically

Taft and Mattick, J Exp Biol (2007) 210:1526-1547

Ratio of non-coding DNA to total genomic DNA content

Mattick J.S., J Exp Biol 2007; 210:1526-1547

"A simplified view of the evolution of organismal complexity"

Relations among the transcribed bases in the nonrepeat portions of the human genome

P. Kapranov et al., Science 316, 1484-1488 (2007)

Regionally enriched expression of ncRNAs in the hippocampus, cerebral cortex and cerebellum

Mercer T.R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 1

Expression of ncRNAs associated with protein-coding genes

Mercer T.R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 2

Subcellular localization of ncRNAs

Mercer T.R. et al., PNAS 2008; 105:716-721. ©2008 by National Academy of Sciences

Fig. 4

Fig. S5

Long ncRNAs: lower expression levels but higher spatial variation

Conclusion: ~20K long ncRNAs expressed in human brain

ncRNAs comprise a Computational Matrix

ncRNAs are unique as information processors

Capacity to couple the digital Information dimension of sequence homology to the analog information dimension of macromolecular shape

Unique sensory features

Reversibility

Sensitivity

Plasticity

HSF1

HSF

Sensory signaling changes ncRNA secondary structure in vivo

ncRNA Information Theory and Thermodynamics: more information for less thermal energy

$$
\frac{\text{Information Content}}{\text{Thermal Dissipation}}
= \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}
= \frac{\text{Codable Degrees of Freedom}}{\text{Thermal Degrees of Freedom}}
$$

ncRNA features an enhanced Shannon entropy to thermal entropy ratio

$$
\frac{\text{Information Content}}{\text{Thermal Dissipation}}
= \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}
$$
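The ratio on this slide can be evaluated numerically. The following is a minimal sketch, assuming the denominator sums over w equally likely thermal microstates (so it reduces to ln w in magnitude). The function names and the toy numbers (4 symbols, 16 microstates) are illustrative assumptions, not values from the talk.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats: -sum(p_i * ln p_i), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def thermal_entropy(w):
    """Entropy of w equally likely microstates: -sum over w states of (1/w) * ln(1/w)."""
    return -sum((1.0 / w) * math.log(1.0 / w) for _ in range(w))

def information_to_thermal_ratio(probs, w):
    """Ratio of codable (Shannon) entropy to thermal entropy.

    Both sums on the slide are negative, so their ratio equals the
    ratio of the two (positive) entropies computed here.
    """
    return shannon_entropy(probs) / thermal_entropy(w)

# Toy case (assumed numbers): 4 equally likely symbols, 16 thermal microstates.
ratio = information_to_thermal_ratio([0.25, 0.25, 0.25, 0.25], 16)
print(ratio)
```

With these toy inputs the ratio is ln 4 / ln 16, so a molecule that encodes the same symbols with fewer accessible thermal microstates would score higher.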

ncRNAs can regulate the early pathogenesis of complex disease

β-secretase = BACE-1 (β-site APP cleaving enzyme)

Gurney et al., Nature 402:533, 1999

Rate limiting for Aβ 1-42 generation

Finely tuned

Stress responsive

Up-regulated in Alzheimer's disease

Inhibitor: disease-modifying therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

[Figure: RNA transcript (% of control average), axis 0 to 250, for BACE-1 and BACE-1-AS, Control vs AD. Human brain samples (group 1: 10 AD, 10 control/region); Hippocampus (n=40 each)]

Faghihi et al., Nature Medicine 14:723, 2008

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

[Figure: binned histogram comparing the Coding and NC series]

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: percentage of probes with p < 0.05 (axis 0 to 6) for Singlets, Duplets, Triplets and Total; series: Coding, Non Coding]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits

Proteome Wide Prediction of RNA Binding Affinity

[Figure: RNA-binding proteins (axis 0 to 120) per bin of 1000 proteins, bins 1-1000 through 20001-20826]

RNA Protein Complex High-Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA–Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets, Network construction

Cryogenic Immunoprecipitation Technique

ENTROPY

Cells

• PC12 cells
• primary cells
• neuronal progenitors

Treatments

• Depolarization
• Stress
• Inflammation
• Cytokines
• Drugs

Freezing

Create cell "grindate"

30 min

Immunoprecipitation

Proteomics, RNA-Seq or ChIP-Seq Analysis

Difficult Timescales for RNP Immunoprecipitation

[Figure: Specific binding and Nonspecific binding curves (axis 10 to 100), relative to the in vivo state]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions
• Rearrangement is not a significant problem
• Yields of > 90% for bait protein and associated RNA
• Does not depend on a particular protein tag
• No cross-linking necessary
• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA–Protein Complexes

[Figure: lanes labeled "Transfected with"; molecular weight markers at 250, 150, 100, 75, 50, 37.5 and 25 kDa]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Sample Preparation

HeliScope™ Single Molecule Sequencer

Bioinformatic Analysis Engine

Output

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Capacity = 10 billion nucleotides/run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes
• Illumina Deep Sequencing = 6 million total sequence tags
• Coding and non-coding represented in top 3000
• Natural Antisense RNAs such as HIF1α-AS represented in top 100
• 60% overlap between top 3000 sequence tags and Affy
• Helicos comparison pending (permits very small sample sizes)

RNA Motif 1 for HuR association

UGUG

U

Loop 3-8 bp

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Found the Gorospe motif in 4536 of 11150 sequences (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69

RNA Motif 2 for HuR association

A U UUUU A

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
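One simple way to turn motif counts like these into a Z-score is a one-sample binomial approximation that uses the shuffled-control hit rate as the null probability. This is a sketch under that assumption. The talk does not state how its Z-scores were computed, and an estimate based on many independent shuffles could give somewhat different values. The function name `enrichment_z` is hypothetical.

```python
import math

def enrichment_z(observed_hits, control_hits, n_sequences):
    """Z-score of observed motif hits against a binomial null.

    The null hit probability is estimated from the shuffled control
    (control_hits / n_sequences); this is an assumed, simplified model.
    """
    p0 = control_hits / n_sequences
    expected = n_sequences * p0
    sd = math.sqrt(n_sequences * p0 * (1.0 - p0))
    return (observed_hits - expected) / sd

# Counts as quoted on the two motif slides.
z_motif1 = enrichment_z(4536, 3521, 11150)  # mononucleotide shuffled control
z_motif2 = enrichment_z(2230, 1267, 11150)  # dinucleotide shuffled control
print(round(z_motif1, 2), round(z_motif2, 2))
```

Under this approximation the first motif comes out at roughly 20.7 and the second at roughly 28.7. Both are far outside what chance alone would produce, which is the point the slides make.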

Deciphering information content defining HuR interactions with the Transcriptome

[Figure: RNA with scan window]

Determine clusters of similar structures

[Figure: structure clusters for the positive and negative sets]

Calculate cluster size distribution (for scanning window length = 50, 45 and 40)

[Figure: three panels. Window length = 50: structures 45-50 bp in length. Window length = 45: structures 40-45 bp in length. Window length = 40: structures 35-40 bp in length]
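The scan-and-cluster steps above can be sketched as follows. This is an illustrative outline only. It assumes each window's secondary structure has already been predicted by an external folding tool, and it clusters by exact string identity, whereas the talk does not specify the similarity measure used. The names `scan_windows` and `cluster_size_distribution` are hypothetical.

```python
from collections import Counter

def scan_windows(seq, length, step=1):
    """Yield (start, subsequence) for every window of `length` along `seq`."""
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]

def cluster_size_distribution(structures):
    """Group identical structure strings; return {cluster_size: number_of_clusters}."""
    clusters = Counter(structures)             # structure -> number of windows sharing it
    return dict(Counter(clusters.values()))    # cluster size -> number of such clusters

# Example: a 60-nt toy sequence scanned with the three window lengths from the slide.
toy_rna = "AUGC" * 15
for window_length in (50, 45, 40):
    n_windows = sum(1 for _ in scan_windows(toy_rna, window_length))
    print(window_length, n_windows)

# Example: toy structure labels standing in for predicted structures.
print(cluster_size_distribution(["s1", "s1", "s2", "s3", "s3", "s3"]))
```

Comparing the resulting size distributions between the positive and negative sets is what would reveal whether some structures are over-represented among binding transcripts.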

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))
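Bracket strings like those listed above can be checked and summarized programmatically. The sketch below assumes standard dot-bracket notation (parentheses for paired bases, dots for unpaired ones) and counts hairpins as innermost bracket pairs. It also accepts strings that contain no dots. The helper names are hypothetical.

```python
import re

def is_balanced(structure):
    """Return True if every '(' has a matching ')' in a dot-bracket string."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def count_hairpins(structure):
    """Count innermost '(' ... ')' pairs separated only by unpaired dots."""
    return len(re.findall(r"\(\.*\)", structure))

# Two stems side by side versus a single long stem, built to avoid miscounting brackets.
double_stem = "(" * 7 + ")" * 7 + "(" * 7 + ")" * 7
single_stem = "(" * 12 + ")" * 12
print(count_hairpins(double_stem), count_hairpins(single_stem))
```

The listed structures fall into these two shapes, double-hairpin and single-hairpin, which is one way to label the clusters before comparing the positive and negative sets.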

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to RNA; positive and negative sets; density values 2673 and 810]

Signal to noise decays with decreasing structure size

1482 sequences

2818 sequences

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 / let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al. (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al. (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1) Non-coding regions directly correlate with organismal complexity across evolution

2) ncRNAs are differentially expressed, processed and localized in cell types, tissues and biological processes

3) ncRNAs play functional roles in processes such as development, stress response and disease

4) ncRNAs have unique information coding and processing capabilities: density, range and flexibility

5) Therefore, in mammalian cells the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia, Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Acknowledgements

Q & A

Page 7: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

7

Relations among the transcribed bases in the nonrepeat portions of the human genome

P Kapranov et al Science 316 1484 -1488 (2007)

8

9

10

Regionally enriched expression of ncRNAs in the hippocampus cerebral cortex and cerebellum

Mercer T R etal PNAS 2008105716-721copy2008 by National Academy of Sciences

Fig 1

11

Expression of ncRNAs associated with protein-coding genes

Mercer T R etal PNAS 2008105716-721copy2008 by National Academy of Sciences

Fig 2

12

Subcellular localization of ncRNAs

Mercer T R etal PNAS 2008105716-721copy2008 by National Academy of Sciences

Fig 4

13

Fig S5

Long ncRNAs lower expression levels but higher spatial variation

Conclusion ~20K long ncRNAs expressed in human brain

ncRNAs comprise a Computational Matrix

ncRNAs are unique as information processors

Capacity to couple the digital Information dimension of sequence homology to the analog information dimension of macromolecular shape

Unique sensory features

Reversibility

Sensitivity

Plasticity

HSF1

HSF

Sensory signaling changes ncRNA secondary structure in-vivo

ncRNA Information Theory and Thermodynamics more information for less thermal energy

Information Content

Thermal Dissipation=

Σ pi ln pi_________________________

Σ ln1

w

1

w

Codable Degrees of Freedom

Thermal Degrees of Freedom

=

ncRNA features an enhanced Shannon

Entropy to Thermal entropy ratio

Σ pi ln pi_________________________

Σ ln 1

w1

w

Information ContentThermal Dissipation =

ncRNAs can regulate the early pathogenisis of complex disease

szlig-secretase = BACE-1 (szlig-site APP cleaving enzyme)

Gurney et al Nature 402533 1999

Rate limiting for Aszlig 1-42

generation

Finely tuned

Stress responsive

Up-regulated in Alzheimerrsquos

disease

Inhibitor disease modifying

therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1

and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

[Figure: bar chart of BACE-1 and BACE-1-AS RNA transcript levels (% of control average, axis 0-250), Control vs AD. Human brain samples (group 1: 10 AD, 10 control/region); hippocampus (n=40 each)]

Faghihi et al., Nature Medicine 14:723, 2008

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

[Figure: histogram comparing Coding and NC (non-coding) series across value bins from 0 to 1]

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: bar chart of the percentage of probes with p < 0.05 (axis 0-6) for Singlets, Duplets, Triplets, and Total; Coding vs Non Coding]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits

Proteome Wide Prediction of RNA Binding Affinity

[Figure: histogram of RNA-binding proteins (vertical axis 0-120) across bins of 1,000, from 1-1000 through 20001-20826]

RNA Protein Complex Hi Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA–Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets, Network construction

Cryogenic Immunoprecipitation Technique

Cells

• PC12 cells

• primary cells

• neuronal progenitors

Treatments

• Depolarization

• Stress

• Inflammation

• Cytokines

• Drugs

[Diagram labels: Cells; Freezing; Create cell "grindate"; Immunoprecipitation; 30 min; Proteomics, RNA-Seq or ChIP-Seq Analysis; ENTROPY]

Difficult Timescales for RNP Immunoprecipitation

[Figure: specific binding vs nonspecific binding relative to the in vivo state (axis 10-100)]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions

• Rearrangement is not a significant problem

• Yields of > 90% for bait protein and associated RNA

• Does not depend on a particular protein tag

• No cross-linking necessary

• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA–Protein Complexes

[Figure: gel of transfected cells with molecular-weight markers from 25 kDa to 250 kDa]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Sample Preparation

HeliScope™ Single Molecule Sequencer

Bioinformatic Analysis Engine

Output

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Capacity = 10 billion nucleotides/run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes

• Illumina Deep Sequencing = 6 million total sequence tags

• Coding and non-coding represented in top 3000

• Natural Antisense RNAs such as HIF1α-AS represented in top 100

• 60% overlap between top 3000 sequence tags and Affy

• Helicos comparison pending (permits very small sample sizes)
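The overlap between a top-ranked sequencing list and the array calls can be computed as a simple set intersection. The sketch below is illustrative only: the function name and the transcript IDs are hypothetical, and the talk does not state how identifiers were matched across platforms.

```python
def overlap_fraction(top_tags, array_calls):
    """Fraction of top-ranked sequencing IDs that are also called on the array."""
    top = set(top_tags)
    if not top:
        return 0.0
    return len(top & set(array_calls)) / len(top)

# Hypothetical transcript IDs, for illustration only.
frac = overlap_fraction(["t1", "t2", "t3", "t4", "t5"], ["t2", "t3", "t5", "t9"])
print(frac)  # 3 of the 5 top IDs are shared
```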

RNA Motif 1 for HuR association

[Figure: motif diagram with labels "UGUG", "U", and "Loop 3-8 bp"]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Found the Gorospe motif in 4536 of 11150 sequences (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69
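A mononucleotide shuffled control of the kind mentioned on this slide can be sketched as follows. This is a minimal illustration, not the talk's pipeline: the toy sequences and the plain regex "UGUGU" are hypothetical stand-ins, and the published motif also has a structural (stem-loop) component that a regex does not capture.

```python
import random
import re

def mononucleotide_shuffle(seq, rng):
    """Return a random permutation of seq, preserving its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def count_hits(seqs, pattern):
    """Count sequences containing at least one match of the regex pattern."""
    regex = re.compile(pattern)
    return sum(1 for s in seqs if regex.search(s))

# Hypothetical toy data and motif, for illustration only.
rng = random.Random(0)
seqs = ["AUGUGUUACG", "CCGGAAUUCG", "UUGUGUAAAC"]
shuffled = [mononucleotide_shuffle(s, rng) for s in seqs]
observed = count_hits(seqs, "UGUGU")
control = count_hits(shuffled, "UGUGU")
print(observed, control)
```

Shuffling each sequence separately keeps its length and base composition fixed, so any excess of hits in the real set reflects base order rather than composition.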

RNA Motif 2 for HuR association

[Figure: motif diagram with labels "A", "U", "UUUU", "A"]

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
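The slides do not say how the Z-scores were computed. One common approximation, shown here as an assumption, treats the shuffled-control hit rate as the null probability of a binomial distribution. With the counts from the two motif slides it gives roughly 20.7 for motif 1 and 28.7 for motif 2, close to but not identical with the reported values, so the talk's procedure may differ (for example, it may use many shuffles).

```python
import math

def binomial_z(observed, control, n):
    """Z-score of observed hits against a binomial null with rate control / n."""
    p0 = control / n
    expected = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    return (observed - expected) / sd

z1 = binomial_z(4536, 3521, 11150)  # motif 1 vs mononucleotide shuffled control
z2 = binomial_z(2230, 1267, 11150)  # motif 2 vs dinucleotide shuffled control
print(round(z1, 2), round(z2, 2))
```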

Deciphering information content defining HuR interactions with the Transcriptome

[Diagram: a scan window slides along the RNA]

Determine clusters of similar structures

[Diagram: structure clusters in the positive and negative sets]

Calculate cluster size distribution (for scanning window length = 50, 45, and 40)

[Figure panels: window length = 50 (structures 45-50 bp length); window length = 45 (structures 40-45 bp length); window length = 40 (structures 35-40 bp length)]
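The scan-and-cluster steps on these slides can be sketched as below. This is a simplified illustration under stated assumptions: the folding step that turns each window into a dot-bracket structure is not shown, and exact string matching stands in for whatever structural similarity measure the talk used. All names and inputs are hypothetical.

```python
from collections import Counter

def scan_windows(seq, length, step=1):
    """Yield (start, subsequence) for every window of the given length."""
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]

def cluster_size_distribution(structures):
    """Group identical dot-bracket strings and count clusters of each size."""
    cluster_sizes = Counter(structures).values()
    return Counter(cluster_sizes)

# Hypothetical toy input: structures would come from folding each window.
windows = list(scan_windows("GGGAAACCCUUU", 10))
structures = ["((...))", "((...))", "(((...)))", "((...))", "(....)"]
distribution = cluster_size_distribution(structures)
print(len(windows), distribution)
```

The returned counter maps a cluster size to the number of clusters of that size, which is the quantity plotted per window length.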

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to RNA; structure density in the positive and negative sets (density labels 2673 and 810); 1482 sequences and 2818 sequences]

Signal to noise decays with decreasing structure size

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 / let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed, and localized in cell types, tissues, and biological processes

3. ncRNAs play functional roles in processes such as development, stress response, and disease

4. ncRNAs have unique information coding and processing capabilities: density, range, and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin (based in Novosibirsk, Russia), Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Acknowledgements

Q & A


Multiplexed Computation of Gene Expression

Another example lin 28 - let 7 interactions

47

Cytoplasmic P Bodies ndash Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many ldquoColors and Flavorsrdquoof LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in

Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA ndash protein machineries mediate two way information flow

Conclusions

1 Non-coding Regions directly correlate with organismal

complexity across evolution

2 ncRNAs are differentially expressed processed and localized in

cell types tissues and biological processes

3 ncRNAs play functional roles in processes such as development

stress response and disease

4 ncRNAs have unique information coding and processing

capabilities density range and flexibility

5 Therefore in mammalian cells the combinatorial space of RNA ndash

protein interactions likely functions as a molecular supercomputer

impacting the great majority of pathways and cellular functions

Project Team Members Mo Heydarian Dennis Vorobiev Dmitry Schtokalo Sergey Nechkin based in Novosibirsk Russia Andrey Polyanov

Collaborators Mohammad Faghihi Scripps Florida Claes Wahlestedt Scripps Florida Rob Reenan Brown University Tim McCaffrey GWU

Acknowledgements

59

Q amp A


11

Expression of ncRNAs associated with protein-coding genes

Mercer T. R. et al., PNAS 2008;105:716-721. ©2008 by National Academy of Sciences

Fig 2

12

Subcellular localization of ncRNAs

Mercer T. R. et al., PNAS 2008;105:716-721. ©2008 by National Academy of Sciences

Fig 4

13

Fig S5

Long ncRNAs: lower expression levels but higher spatial variation

Conclusion: ~20K long ncRNAs expressed in human brain

ncRNAs comprise a Computational Matrix

ncRNAs are unique as information processors

Capacity to couple the digital information dimension of sequence homology to the analog information dimension of macromolecular shape

Unique sensory features

Reversibility

Sensitivity

Plasticity

Sensory signaling changes ncRNA secondary structure in vivo

[Figure labels: HSF1, HSF]

ncRNA Information Theory and Thermodynamics: more information for less thermal energy

\[
\frac{\text{Information Content}}{\text{Thermal Dissipation}}
= \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}
= \frac{\text{Codable Degrees of Freedom}}{\text{Thermal Degrees of Freedom}}
\]

ncRNA features an enhanced Shannon entropy to thermal entropy ratio
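As a rough numerical illustration of the two sums in the ratio on this slide, the Python sketch below evaluates Σ p_i ln p_i for an arbitrary distribution and Σ (1/w) ln (1/w) for w equally likely states. The function names and the example distributions are illustrative assumptions, not values from the talk.

```python
import math

def sum_p_ln_p(probs):
    """Return sum(p_i * ln p_i) over outcomes with nonzero probability."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(p) for p in probs if p > 0)

def uniform_sum(w):
    """Return the sum over w equally likely states of (1/w) * ln(1/w), i.e. -ln(w)."""
    return sum((1.0 / w) * math.log(1.0 / w) for _ in range(w))

# Hypothetical example: four coded states with unequal probabilities,
# compared against 16 equally likely thermal states.
coded = [0.5, 0.25, 0.125, 0.125]
ratio = sum_p_ln_p(coded) / uniform_sum(16)
print(round(ratio, 4))
```

Both sums are negative, so their signs cancel in the ratio; the conventional entropies are the negatives of these sums.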

ncRNAs can regulate the early pathogenesis of complex disease

β-secretase = BACE-1 (β-site APP cleaving enzyme)

Gurney et al., Nature 402:533 (1999)

Rate limiting for Aβ 1-42 generation

Finely tuned

Stress responsive

Up-regulated in Alzheimer's disease

Inhibitor: disease-modifying therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

[Figure: BACE-1 and BACE-1-AS RNA transcript levels (% of control average), Control vs. AD; human brain samples (group 1: 10 AD, 10 control/region); y-axis 0-250]

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

Hippocampus (n=40 each)

Faghihi et al., Nature Medicine 14:723 (2008)

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

[Figure: distribution over bins 0-0.005, 0.005-0.01, 0.01-0.02, 0.02-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.5, 0.5-1, 1; y-axis 0.00-60.00; series: Coding, NC]

siRNA Screen for NAT modulation of cell viability

700 NATs - 2000 siRNAs

[Figure: percentage of probes with p < 0.05 (y-axis 0-6) for Singlets, Duplets, Triplets, and Total; series: Coding, Non Coding]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits

Proteome Wide Prediction of RNA Binding Affinity

[Figure: histogram with x-axis bins of 1000 from 1-1000 through 20001-20826 and y-axis 0-120; label: RNA-binding proteins]

31

RNA-Protein Complex High-Throughput Pipeline

Proteome-wide Prediction of RNA Binding Regions

Endogenous Flag Tag of Predicted RNA Binding Proteins

Cryogenic Flag IP of RNA-Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and Proteins

Systems Biology Analysis of Datasets; Network Construction

32

Cryogenic Immunoprecipitation Technique

Cells
• PC12 cells
• primary cells
• neuronal progenitors

Treatments: Depolarization, Stress, Inflammation, Cytokines, Drugs

[Diagram labels: Freezing; Create cell "grindate"; Immunoprecipitation; 30 min; ENTROPY; Proteomics, RNA-Seq or ChIP-Seq Analysis]

33

Difficult Timescales for RNP Immunoprecipitation

[Figure: specific binding vs. nonspecific binding relative to the in vivo state; axis scale 10-100]

34

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions
• Rearrangement is not a significant problem
• Yields of > 90% for bait protein and associated RNA
• Does not depend on a particular protein tag
• No cross-linking necessary
• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA-Protein Complexes

[Gel image: lanes labeled "Transfected with"; molecular weight markers 250, 150, 100, 75, 50, 37.5, 25 kDa]

NF90 = 90 kDa

HuR = 37 kDa

36

Helicos single molecule sequencing

Sample Preparation

HeliScope™ Single Molecule Sequencer

Bioinformatic Analysis Engine

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Output

Capacity = 10 billion nucleotides per run

High-throughput tools for ncRNA Systems Biology

37

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11,155 called probes
• Illumina Deep Sequencing = 6 million total sequence tags
• Coding and non-coding represented in top 3000
• Natural Antisense RNAs such as HIF1α-AS represented in top 100
• 60% overlap between top 3000 sequence tags and Affy
• Helicos comparison pending (permits very small sample sizes)

38

RNA Motif 1 for HuR association

[Motif diagram: UGUG, U; loop 3-8 bp]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's lab)

Found the Gorospe motif in 4,536 of 11,150 sequences ... (versus 3,521 in a mononucleotide shuffled control)

Z-score ≈ 20.69
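A mononucleotide shuffled control, as mentioned on this slide, permutes the bases of each sequence so that base composition is kept while base order is destroyed. The sketch below illustrates the idea in Python. The toy sequences, the helper names, and the plain `UGUG` pattern are assumptions for illustration only; the motif in the talk also has a structural (loop) component that is not modeled here, and the talk does not describe its own implementation.

```python
import random
import re
from collections import Counter

def mononucleotide_shuffle(seq, rng):
    """Shuffle single bases: keeps base composition, destroys base order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def count_with_motif(seqs, pattern):
    """Count sequences containing at least one match of the regex pattern."""
    regex = re.compile(pattern)
    return sum(1 for s in seqs if regex.search(s))

# Hypothetical toy data and a simplified, sequence-only pattern.
rng = random.Random(0)
seqs = ["AUGUGUUUACG", "CCGGAUUUGUG", "GGGCCCAAAGG"]
pattern = "UGUG"

observed = count_with_motif(seqs, pattern)
shuffled = [mononucleotide_shuffle(s, rng) for s in seqs]
control = count_with_motif(shuffled, pattern)
print(observed, control)
```

A dinucleotide shuffle, used for the second motif, additionally preserves the counts of adjacent base pairs and needs a more involved algorithm than a plain permutation.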

39

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

[Motif diagram: A U UUUU A]

RNA Motif 2 for HuR association

Found the Ma motif in 2,230 of 11,150 sequences ... (versus 1,267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
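The talk does not state how the Z-scores were computed. One common choice, assumed here, treats the shuffled-control count as the expected count under a binomial null and divides the excess by the binomial standard deviation. The Python sketch below applies that assumption to the counts given on the two motif slides. It yields roughly 20.7 for the first motif and roughly 28.7 for the second, which is close to the first reported value and somewhat below the second, so the original procedure probably differed in detail.

```python
import math

def binomial_z(observed, expected, n):
    """Z-score of an observed count against a binomial null with p = expected / n."""
    p = expected / n
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - expected) / sd

# Counts taken from the slides: motif hits, shuffled-control hits, total sequences.
z_motif1 = binomial_z(4536, 3521, 11150)
z_motif2 = binomial_z(2230, 1267, 11150)
print(round(z_motif1, 2), round(z_motif2, 2))
```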

40

Deciphering information content defining HuR interactions with the Transcriptome

[Diagram labels: RNA, scan window]

41

Determine clusters of similar structures

[Figure: clusters of structures in the positive and negative sets]

42

Calculate cluster size distribution (for scanning window lengths 50, 45, and 40)

window length = 50: structures 45-50 bp in length
window length = 45: structures 40-45 bp in length
window length = 40: structures 35-40 bp in length
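The scan-and-cluster steps on these slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the structure prediction step is omitted (the dot-bracket strings are supplied as hypothetical input), and "similar" structures are simplified to identical strings. The function names and example data are not from the talk.

```python
from collections import Counter

def scan_windows(seq, length, step=1):
    """Yield (start, subsequence) for each window of the given length."""
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]

def cluster_size_distribution(structures):
    """Group identical dot-bracket strings; map cluster size -> number of clusters."""
    cluster_sizes = Counter(structures)      # structure -> number of members
    return Counter(cluster_sizes.values())   # cluster size -> how many clusters

# Hypothetical structures predicted for the windows of one sequence set.
structures = [
    "((((....))))",
    "((((....))))",
    "(((......)))",
    "((((....))))",
    "............",
]
dist = cluster_size_distribution(structures)

# Hypothetical sequence scanned with a window of length 10 and step 2.
windows = list(scan_windows("AUGGCUAGCUAGGAUC", length=10, step=2))
print(dict(dist), len(windows))
```

Running the same counting on a positive and a negative set gives two distributions that can then be compared.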

43

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))

44

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to RNA structures, positive vs. negative set; density labels 2673 and 810]

Signal to noise decays with decreasing structure size

1482 sequences

2818 sequences

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11,155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28/let-7 interactions

47

Cytoplasmic P Bodies - Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA-protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution.

2. ncRNAs are differentially expressed, processed, and localized in cell types, tissues, and biological processes.

3. ncRNAs play functional roles in processes such as development, stress response, and disease.

4. ncRNAs have unique information coding and processing capabilities: density, range, and flexibility.

5. Therefore, in mammalian cells, the combinatorial space of RNA-protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions.

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia; Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Acknowledgements

59

Q & A



Page 14: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

ncRNAs comprise a Computational Matrix

ncRNAs are unique as information processors

Capacity to couple the digital Information dimension of sequence homology to the analog information dimension of macromolecular shape

Unique sensory features

Reversibility

Sensitivity

Plasticity

HSF1

HSF

Sensory signaling changes ncRNA secondary structure in-vivo

ncRNA Information Theory and Thermodynamics: more information for less thermal energy

\[
\frac{\text{Information Content}}{\text{Thermal Dissipation}}
= \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}
= \frac{\text{Codable Degrees of Freedom}}{\text{Thermal Degrees of Freedom}}
\]

ncRNA features an enhanced Shannon entropy to thermal entropy ratio
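As a rough numerical illustration of this ratio, the sketch below computes a Shannon entropy over a set of codable states and divides it by the entropy of w equally likely thermal microstates. The function names and the example numbers (four equiprobable symbols, 16 microstates) are assumptions chosen for illustration and do not come from the talk. Both entropies are written with their conventional minus sign, which cancels in the ratio.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats: -sum(p * ln p), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uniform_entropy(w):
    """Entropy of w equally likely microstates: -sum((1/w) * ln(1/w)) = ln(w)."""
    return math.log(w)


def entropy_ratio(probs, w):
    """Ratio of codable (Shannon) entropy to thermal (uniform microstate) entropy."""
    return shannon_entropy(probs) / uniform_entropy(w)


if __name__ == "__main__":
    # Hypothetical example: four equiprobable symbols against 16 thermal microstates.
    print(entropy_ratio([0.25] * 4, 16))
```

A larger ratio means more codable information per unit of thermal entropy, which is the sense in which the slide speaks of "more information for less thermal energy".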

ncRNAs can regulate the early pathogenesis of complex disease

β-secretase = BACE-1 (β-site APP cleaving enzyme)

Gurney et al., Nature 402:533 (1999)

Rate limiting for Aβ 1-42 generation

Finely tuned

Stress responsive

Up-regulated in Alzheimer’s disease

Inhibitor: disease-modifying therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

[Figure: bar chart of BACE-1 and BACE-1-AS RNA transcript levels (% of control average, axis 0 to 250) in Control versus AD human brain samples (group 1: 10 AD, 10 control/region)]

BACE1-AS is elevated in Alzheimer’s disease brain (as is BACE1 itself)

Hippocampus (n=40 each)

Faghihi et al., Nature Medicine 14:723 (2008)

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

[Figure: histogram of Coding versus NC transcripts over bins 0-0.005, 0.005-0.01, 0.01-0.02, 0.02-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.5, 0.5-1 and 1; vertical axis 0.00 to 60.00]

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: bar chart of the percentage of probes with p < 0.05 (axis 0 to 6) for Singlets, Duplets, Triplets and Total; Coding versus Non Coding]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits
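The quantity plotted on this slide, the percentage of probes with p < 0.05, can be sketched as follows. This is a minimal illustration only: the function name and the p-values in the example are hypothetical, and the statistical test used in the screen is not stated on the slide.

```python
def percent_significant(p_values, alpha=0.05):
    """Percentage of probes whose p-value falls below alpha."""
    if not p_values:
        return 0.0
    hits = sum(1 for p in p_values if p < alpha)
    return 100.0 * hits / len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values for two probe categories (not data from the screen).
    coding = [0.01, 0.20, 0.04, 0.50]
    non_coding = [0.03, 0.30, 0.60, 0.70]
    print(percent_significant(coding), percent_significant(non_coding))
```

Computing the same percentage separately for coding and non-coding NAT probes gives the side-by-side comparison the chart describes.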

Proteome Wide Prediction of RNA Binding Affinity

[Figure: histogram of RNA-binding proteins over bins of 1,000 from 1-1000 to 20001-20826; vertical axis 0 to 120]

RNA Protein Complex High-Throughput Pipeline

Proteome-wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA – Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets, Network construction

Cryogenic Immunoprecipitation Technique

ENTROPY

Cells

• PC12 cells
• primary cells
• neuronal progenitors

Treatments

Depolarization

Stress

Inflammation

Cytokines

Drugs

Freezing

Create cell “grindate”

Immunoprecipitation

30 min

Proteomics, RNA-Seq or ChIP-Seq Analysis

Difficult Timescales for RNP Immunoprecipitation

[Figure: specific binding versus nonspecific binding relative to the in vivo state; axis 100 down to 10]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions
• Rearrangement is not a significant problem
• Yields of > 90% for bait protein and associated RNA
• Does not depend on a particular protein tag
• No cross-linking necessary
• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA – Protein Complexes

[Figure: gel of transfected samples with molecular weight markers at 250, 150, 100, 75, 50, 37.5 and 25 kDa]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Sample Preparation

HeliScope™ Single Molecule Sequencer

Bioinformatic Analysis Engine

Output

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Capacity = 10 billion nucleotides/run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes
• Illumina Deep Sequencing = 6 million total sequence tags
• Coding and non-coding represented in top 3000
• Natural Antisense RNAs such as HIF1α-AS represented in top 100
• 60% overlap between top 3000 sequence tags and Affy
• Helicos comparison pending (permits very small sample sizes)
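The overlap figure quoted on this slide can be read as the fraction of the top-ranked sequence tags that also appear among the array's called probes. The sketch below shows that calculation under this assumed reading; the function name and the toy identifiers are hypothetical, and how tags were mapped to array probes is not described in the talk.

```python
def overlap_fraction(top_items, reference):
    """Fraction of top_items that are also present in the reference collection."""
    if not top_items:
        return 0.0
    reference = set(reference)
    shared = sum(1 for item in top_items if item in reference)
    return shared / len(top_items)


if __name__ == "__main__":
    # Toy identifiers standing in for transcripts (not data from the experiment).
    top_tags = ["a", "b", "c", "d", "e"]
    called_probes = {"a", "c", "e", "z"}
    print(overlap_fraction(top_tags, called_probes))
```

In the setting of the slide, `top_items` would hold the top 3000 sequencing hits and `reference` the transcripts called on the array.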

RNA Motif 1 for HuR association

[Figure: motif diagram labeled UGUG, U; loop 3-8 bp]

Lopez de Silanes et al. (2004) PNAS, “Identification of a target RNA motif for RNA-binding protein HuR” (Myriam Gorospe’s Lab)

Found the Gorospe motif in 4536 of 11150 sequences (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69

RNA Motif 2 for HuR association

[Figure: motif diagram labeled A U UUUU A]

Ma et al. (1996) JBC, “Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein”

Found the Ma motif in 2230 of 11150 sequences (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the result suggests HuR responds to a wider range of information signals
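The motif counts and shuffled controls on the two preceding slides suggest an enrichment test of the following kind. This is a sketch under stated assumptions, not the authors' procedure: the motif is treated as a plain substring (the slides show structural motifs), only a mononucleotide shuffle is implemented, and the Z-score uses a binomial approximation, since the talk does not state the formula behind its Z-scores. All function names are hypothetical.

```python
import math
import random


def count_with_motif(seqs, motif):
    """Number of sequences containing the motif at least once."""
    return sum(1 for s in seqs if motif in s)


def shuffle_mononucleotide(seq, rng):
    """Shuffle a sequence while preserving its nucleotide composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def shuffled_control_count(seqs, motif, rng):
    """Motif count in a composition-matched shuffled copy of the sequence set."""
    return count_with_motif([shuffle_mononucleotide(s, rng) for s in seqs], motif)


def binomial_z(observed, expected, n):
    """Z-score of an observed count against a binomial null with p = expected / n."""
    p = expected / n
    return (observed - expected) / math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    # Counts quoted on the slides; the binomial null is an assumption.
    print(round(binomial_z(4536, 3521, 11150), 1))
    print(round(binomial_z(2230, 1267, 11150), 1))
```

With the slide counts, this approximation gives values of roughly 20.7 and 28.7, close to but not identical with the reported Z-scores, which is consistent with the original analysis using a somewhat different null model.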

Deciphering information content defining HuR interactions with the Transcriptome

[Figure: a scan window moving along an RNA]

Determine clusters of similar structures

[Figure: structures from the positive and negative sets grouped into clusters]

Calculate cluster size distribution (for scanning window length = 50, 45 and 40)

[Figure panels: window length = 50, structures 45-50 bp length; window length = 45, structures 40-45 bp length; window length = 40, structures 35-40 bp length]
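The scan-window procedure outlined on these slides (slide a window along each RNA, assign each window a local structure, group similar structures, then tabulate cluster sizes) can be sketched as below. Several points are assumptions: the structure predictor is passed in as a callable `fold`, because the talk does not name the folding method, and windows are clustered only when their structure strings are identical, which is a simplification of "similar structures". The `toy_fold` function is a placeholder for demonstration and is not a folding algorithm.

```python
from collections import Counter


def windows(seq, length, step=1):
    """Yield every window of the given length along the sequence."""
    for start in range(0, len(seq) - length + 1, step):
        yield seq[start:start + length]


def cluster_structures(seqs, length, fold):
    """Count windows per structure string; identical strings form one cluster."""
    clusters = Counter()
    for seq in seqs:
        for window in windows(seq, length):
            clusters[fold(window)] += 1
    return clusters


def size_distribution(clusters):
    """Map cluster size to the number of clusters having that size."""
    return Counter(clusters.values())


def toy_fold(window):
    """Placeholder 'structure': marks G/C positions. NOT a real RNA folding method."""
    return "".join("(" if base in "GC" else "." for base in window)


if __name__ == "__main__":
    clusters = cluster_structures(["ACGUAC", "AAAAA"], 4, toy_fold)
    print(size_distribution(clusters))
```

Running the same tabulation on a positive (bound) and a negative (unbound) transcript set, for each window length, would give the cluster size distributions that the slides compare.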

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))


HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to local RNA structures in positive (HuR-binding) versus negative (non-binding) transcripts; density 2673 versus 810; 1482 sequences and 2818 sequences]

Signal to noise decays with decreasing structure size
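The "~3 times" statement can be checked against the two density figures printed on this slide. The sketch below assumes that 2673 and 810 are the structure densities of the positive and negative sets, in the same units; that reading is an assumption, and the function name is hypothetical. The ratio does not depend on where a decimal point may have been lost, as long as both values are scaled alike.

```python
def fold_enrichment(density_positive, density_negative):
    """Ratio of structure density in the positive set to that in the negative set."""
    if density_negative == 0:
        raise ValueError("negative-set density must be non-zero")
    return density_positive / density_negative


if __name__ == "__main__":
    # Density values as printed on the slide (units not recoverable).
    print(round(fold_enrichment(2673, 810), 1))
```

The resulting ratio of about 3.3 agrees with the slide's "~3 times more" wording.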

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 / let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many “Colors and Flavors” of LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed and localized in cell types, tissues and biological processes

3. ncRNAs play functional roles in processes such as development, stress response and disease

4. ncRNAs have unique information coding and processing capabilities: density, range and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia, Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A


capabilities density range and flexibility

5 Therefore in mammalian cells the combinatorial space of RNA ndash

protein interactions likely functions as a molecular supercomputer

impacting the great majority of pathways and cellular functions

Project Team Members Mo Heydarian Dennis Vorobiev Dmitry Schtokalo Sergey Nechkin based in Novosibirsk Russia Andrey Polyanov

Collaborators Mohammad Faghihi Scripps Florida Claes Wahlestedt Scripps Florida Rob Reenan Brown University Tim McCaffrey GWU

Acknowledgements

59

Q amp A

ncRNA features an enhanced Shannon entropy to thermal entropy ratio

$$\frac{\text{Information Content}}{\text{Thermal Dissipation}} = \frac{\sum_i p_i \ln p_i}{\sum \frac{1}{w} \ln \frac{1}{w}}$$
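The ratio on this slide can be illustrated numerically. The sketch below is a minimal example, assuming that p_i are the probabilities of the observed states and that w is the number of equally likely states in the reference (uniform) case; both readings are assumptions, and the nucleotide frequencies are hypothetical values, not data from the talk.

```python
import math


def sum_p_ln_p(probs):
    """Return sum_i p_i * ln(p_i), skipping zero-probability states."""
    return sum(p * math.log(p) for p in probs if p > 0)


def entropy_ratio(probs):
    """Ratio of sum p_i ln p_i to the same sum for a uniform
    distribution over w = len(probs) states (assumed reference)."""
    w = len(probs)
    if w < 2:
        raise ValueError("need at least two states; ln(1/1) is zero")
    uniform = [1.0 / w] * w
    return sum_p_ln_p(probs) / sum_p_ln_p(uniform)


# Hypothetical nucleotide frequencies (A, C, G, U).
skewed = [0.7, 0.1, 0.1, 0.1]
flat = [0.25, 0.25, 0.25, 0.25]

print(entropy_ratio(flat))    # the uniform case gives a ratio of 1
print(entropy_ratio(skewed))  # a skewed composition gives a ratio below 1
```

Under these assumptions the ratio reaches 1 only when every state is equally likely and falls as the distribution becomes more skewed.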

ncRNAs can regulate the early pathogenesis of complex disease

β-secretase = BACE-1 (β-site APP cleaving enzyme)

Gurney et al., Nature 402:533 (1999)

Rate limiting for Aβ 1-42 generation

Finely tuned

Stress responsive

Up-regulated in Alzheimer's disease

Inhibitor: disease-modifying therapy

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

[Figure: RNA transcript level (% of control average, axis 0-250) for BACE-1 and BACE-1-AS, Control vs AD. Human brain samples (group 1: 10 AD, 10 control/region); hippocampus (n=40 each).]

Faghihi et al., Nature Medicine 14:723 (2008)

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

siRNA Screen for NAT modulation of cell viability

700 NATs - 2000 siRNAs

[Figure: histogram for Coding vs NC (non-coding), vertical axis 0.00-60.00, over the bins 0-0.005, 0.005-0.01, 0.01-0.02, 0.02-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.5, 0.5-1 and 1.]

ncNATs score almost as high as coding NATs in cell viability screen

[Figure: percentage of probes with p < 0.05 (axis 0-6) for Singlets, Duplets, Triplets and Total; Coding vs Non Coding.]

Validation in approx. 60% of hits
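The two summaries shown for the screen (a histogram of p-values over fixed bins, and the percentage of probes below a significance threshold) can be sketched as follows. This is an illustration only: the bin edges are read from the histogram axis labels, the separate axis label "1" is folded into the last bin, and the p-values are hypothetical, not results from the screen.

```python
from bisect import bisect_left

# Upper bin edges read from the histogram axis labels.
BIN_EDGES = [0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]


def bin_p_values(p_values):
    """Count p-values per bin; bin i holds values up to and including BIN_EDGES[i]."""
    counts = [0] * len(BIN_EDGES)
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        counts[bisect_left(BIN_EDGES, p)] += 1
    return counts


def percent_significant(p_values, alpha=0.05):
    """Percentage of probes whose p-value is below alpha."""
    if not p_values:
        raise ValueError("no p-values supplied")
    hits = sum(1 for p in p_values if p < alpha)
    return 100.0 * hits / len(p_values)


# Hypothetical p-values for a handful of coding and non-coding NAT probes.
coding = [0.001, 0.03, 0.2, 0.6, 0.9]
non_coding = [0.004, 0.07, 0.3, 0.8]

print(bin_p_values(coding))
print(percent_significant(coding), percent_significant(non_coding))
```

Running both functions on the coding and non-coding probe sets separately gives the side-by-side comparison the slide describes.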

Proteome Wide Prediction of RNA Binding Affinity

[Figure: RNA-binding proteins (axis 0-120) per bin of 1000, from 1-1000 through 20001-20826.]

RNA Protein Complex Hi Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA-Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets, Network construction

Cryogenic Immunoprecipitation Technique

Cells:
• PC12 cells
• primary cells
• neuronal progenitors

Treatments: Depolarization, Stress, Inflammation, Cytokines, Drugs

[Diagram labels: Cells; Freezing; Create cell "grindate"; Immunoprecipitation; 30 min; Proteomics, RNA-Seq or ChIP-Seq Analysis; ENTROPY]

Difficult Timescales for RNP Immunoprecipitation

[Figure: curves for Specific binding and Nonspecific binding, axis 10-100; label: In vivo state.]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions
• Rearrangement is not a significant problem
• Yields of > 90% for bait protein and associated RNA
• Does not depend on a particular protein tag
• No cross-linking necessary
• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA-Protein Complexes

[Gel image: molecular weight markers at 250, 150, 100, 75, 50, 37.5 and 25 kDa; label: Transfected with]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Sample Preparation → HeliScope™ Single Molecule Sequencer → Bioinformatic Analysis Engine

Output:

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Capacity = 10 billion nucleotides/run

High-throughput tools for ncRNA Systems Biology
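As a rough illustration of what the example output on this slide implies, the sketch below parses the eight reads shown and relates their length to the stated capacity. The assumption that every read in a run has the same length as these examples is hypothetical and used only for the arithmetic.

```python
# The eight example reads shown on the slide, one per line.
READS_TEXT = """\
>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG
"""

CAPACITY_NT = 10_000_000_000  # "10 billion nucleotides/run" from the slide


def parse_reads(text):
    """Return the sequences of all lines that start with '>'."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


reads = parse_reads(READS_TEXT)
read_lengths = {len(r) for r in reads}
print(len(reads), read_lengths)

# Hypothetical estimate: reads per run if every read had the example length.
example_length = len(reads[0])
reads_per_run = CAPACITY_NT // example_length
print(reads_per_run)
```

Real runs produce reads of varying length, so the per-run figure is an order-of-magnitude guide at best.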

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes
• Illumina Deep Sequencing = 6 million total sequence tags
• Coding and non-coding represented in top 3000
• Natural Antisense RNAs such as HIF1α-AS represented in top 100
• 60% overlap between top 3000 sequence tags and Affy
• Helicos comparison pending (permits very small sample sizes)
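The overlap between the top-ranked sequence tags and the array calls can be computed as a simple set intersection. The sketch below is a minimal example with hypothetical gene identifiers; how the talk matched tags to probes is not stated, so matching by a shared identifier is an assumption.

```python
def top_n_overlap(ranked_tags, array_calls, n):
    """Percentage of the top-n ranked identifiers that also appear in array_calls."""
    top = set(ranked_tags[:n])
    if not top:
        raise ValueError("no ranked identifiers supplied")
    return 100.0 * len(top & set(array_calls)) / len(top)


# Hypothetical identifiers: tags ranked by abundance, and array-called genes.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
called = {"g1", "g3", "g5", "g9"}

print(top_n_overlap(ranked, called, n=5))
```

With real data, `ranked` would hold the identifiers ordered by tag count and `n` would be 3000.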

RNA Motif 1 for HuR association

[Motif diagram: UGUG, U; loop 3-8 bp]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Found the Gorospe motif in 4536 of 11150 sequences ... (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69
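A mononucleotide shuffled control keeps each sequence's base composition but destroys its order, so motif hits in the shuffled set estimate the chance background. The sketch below is a minimal illustration: the pattern `UGUG` is only a stand-in and not the actual Gorospe motif (which also has a structural component), and the sequences are hypothetical.

```python
import random
import re


def mononucleotide_shuffle(seq, rng):
    """Return a random permutation of seq, preserving single-base composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def count_sequences_with_motif(seqs, pattern):
    """Number of sequences containing at least one match to the regex pattern."""
    regex = re.compile(pattern)
    return sum(1 for s in seqs if regex.search(s))


# Hypothetical sequences and a stand-in pattern (assumptions, not from the talk).
PATTERN = r"UGUG"
sequences = ["AAUGUGCCUUU", "GGGCCCAAAUU", "UGUGUGUGAAC", "ACACACGUGUA"]

rng = random.Random(0)  # fixed seed so the control is reproducible
shuffled = [mononucleotide_shuffle(s, rng) for s in sequences]

observed = count_sequences_with_motif(sequences, PATTERN)
control = count_sequences_with_motif(shuffled, PATTERN)
print(observed, control)
```

A dinucleotide shuffle, as used for the second motif, additionally preserves neighbouring-base frequencies and needs a more involved algorithm that is not shown here.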

RNA Motif 2 for HuR association

[Motif diagram: A U UUUU A]

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences ... (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the result suggests that HuR responds to a wider range of information signals
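The slides do not say how the Z-scores were computed. One common choice, used here purely as an assumption, is a binomial approximation in which the shuffled control supplies the expected hit rate. The counts below are the ones quoted on the two motif slides.

```python
import math


def enrichment_z(observed, control, n):
    """Z-score of observed hits against a control rate, binomial approximation.

    The expected count is taken to be the control count, and the standard
    deviation is sqrt(n * p * (1 - p)) with p = control / n.
    """
    p = control / n
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - control) / sd


# Counts quoted on the slides: hits in real vs shuffled sequences, out of 11150.
z_motif1 = enrichment_z(4536, 3521, 11150)
z_motif2 = enrichment_z(2230, 1267, 11150)
print(round(z_motif1, 2), round(z_motif2, 2))
```

Under this assumption the first comparison comes out near 20.7 and the second near 28.7. The talk's own figures may rest on a different variance estimate, for example one taken from repeated shuffles.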

Deciphering information content defining HuR interactions with the Transcriptome

[Diagram labels: RNA; scan window]
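A scan window of this kind can be sketched as a generator that slides a fixed-length window along an RNA sequence. The step size and the example sequence are assumptions. In the analysis each window would presumably then be folded with a structure-prediction tool, which is not shown because the talk does not name one.

```python
def scan_windows(seq, window, step=1):
    """Yield (start, subsequence) for every full-length window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Hypothetical short RNA and window length, chosen only for display.
rna = "ACGUACGU"
for start, sub in scan_windows(rna, window=5):
    print(start, sub)
```

The following slides use window lengths of 50, 45 and 40, which correspond to `window=50`, `window=45` and `window=40` here.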

Determine clusters of similar structures

[Diagram labels: positive; negative]

Calculate clusters size distribution (for scanning window length = 50, 45 and 40)

[Figure panels: window length = 50 (structures 45-50 bp length); window length = 45 (structures 40-45 bp length); window length = 40 (structures 35-40 bp length)]
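A cluster size distribution can be sketched with two counting passes. For simplicity this example treats a cluster as a group of identical structure strings, whereas the talk speaks of "similar" structures and does not give its similarity measure. The structures are hypothetical.

```python
from collections import Counter


def cluster_sizes(structures):
    """Map each distinct structure to the number of times it occurs."""
    return Counter(structures)


def cluster_size_distribution(structures):
    """Map each cluster size to the number of clusters having that size."""
    return Counter(cluster_sizes(structures).values())


# Hypothetical predicted structures for a set of scan windows.
structures = ["(())", "(())", "((()))", "()", "()", "()"]

print(cluster_sizes(structures).most_common(1))  # the biggest cluster
print(sorted(cluster_size_distribution(structures).items()))
```

Computing the distribution separately for the positive and negative sets, and for each window length, gives the kind of comparison the panels show.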

Structures (length 45-50) which constitute biggest clusters of positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))
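Bracket strings like those listed above can be summarised with a single pass that checks the pairing is balanced, counts hairpins (an opening run followed directly by a closing run) and records the deepest nesting. This is a sketch that also accepts the `.` of standard dot-bracket notation for unpaired bases; the example strings are constructed for illustration rather than copied from the slide.

```python
def summarize_structure(structure):
    """Return (hairpin_count, max_depth) for a dot-bracket string.

    Raises ValueError if the brackets are unbalanced or an unknown
    character is found.
    """
    depth = 0
    max_depth = 0
    hairpins = 0
    previous_bracket = None
    for ch in structure:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
            previous_bracket = ch
        elif ch == ")":
            if depth == 0:
                raise ValueError("closing bracket without a partner")
            if previous_bracket == "(":
                hairpins += 1  # innermost pair of a new hairpin
            depth -= 1
            previous_bracket = ch
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if depth != 0:
        raise ValueError("opening bracket without a partner")
    return hairpins, max_depth


# Constructed examples: two 7-pair hairpins side by side, and one 12-pair hairpin.
double_hairpin = "(" * 7 + ")" * 7 + "(" * 7 + ")" * 7
single_hairpin = "(" * 12 + ")" * 12

print(summarize_structure(double_hairpin))
print(summarize_structure(single_hairpin))
```

Both shapes, paired hairpins and single long stems, appear among the structures listed for the positive set.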

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Diagram: HuR on RNA, positive and negative sets; density values 2673 and 810]

Signal to noise decays with decreasing structure size

1482 sequences

2818 sequences

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 - let-7 interactions

Cytoplasmic P Bodies - Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al. (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al. (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA-protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed and localized in cell types, tissues and biological processes

3. ncRNAs play functional roles in processes such as development, stress response and disease

4. ncRNAs have unique information coding and processing capabilities: density, range and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA-protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia, Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A

Page 21: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

BACE1 (beta-site APP cleaving enzyme)

BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1

and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

0

50

100

150

200

250

BACE-1

BACE-1-AS

Control AD

Human brain samples (group 1 10 AD 10 controlregion)

RN

A t

ran

scri

pt

( o

f co

ntr

ol a

vera

ge)

BACE1-AS is elevated in Alzheimer‟s disease

brain (as is BACE1 itself)

Hippocampus (n=40 each)

Faghihi et al Nature Medicine 14723 2008

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

000

1000

2000

3000

4000

5000

6000

0 -0005 0005 -001

001 - 002 002 - 005 005 - 01 01 - 02 02 - 05 05 - 1 1

Coding

NC

siRNA Screen for NAT modulation of cell viability

700 NATs ndash 2000 siRNAs

0

1

2

3

4

5

6

Singlets Duplets Triplets Total

Coding

Non Coding

Percentage of probes with p lt 005

ncNATs score almost as high as coding NATs in

cell viability screen

Validation in approx 60 of hits

0

20

40

60

80

100

120

1 -

100

0

100

1 -

200

0

200

1 -

300

0

300

1 -

40

00

40

01

-50

00

500

1 -

60

00

60

01

-70

00

700

1 -

80

00

80

01

-9

00

0

90

01

-10

00

0

100

01

-11

00

0

110

01

-12

00

0

120

01

-13

00

0

130

01

-14

00

0

140

01

-15

00

0

150

01

-16

00

0

160

01

-17

00

0

170

01

-18

00

0

180

01

-19

00

0

190

01

-20

00

0

200

01

-20

826

Proteome Wide Prediction of RNA Binding Affinity

RNAbinding proteins

31

RNA Protein Complex Hi Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA ndash Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets Network construction

32

Cryogenic ImmunoprecipitationTechnique

ENTROPY

Cells

Proteomics RNA-Seq or Chip-SeqAnalysis

Freezing

Immunoprecipitation

bullPC12 cells

bullprimary cells

bullneuronal progenitors

Treatments

Depolarization

Stress

Inflammation

Cytokines

Drugs

Create cell

bdquogrindate‟

30 min

ENTROPY

33

Difficult Timescales for RNP Immunoprecipitation

In vivostate

100

90

80

70

60

50

40

30

20

10

Specific binding

Nonspecific binding

34

Features of Cryogenic IP for RNP Studies

bullRapid technique prevents RNA degradation and loss of

transient macromolecular interactions

bull Rearrangement is not a significant problem

bull Yields of gt 90 for bait protein and associated RNA

bull Does not depend on a particular protein tag

bull No cross-linking necessary

bull Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Transfected

with

150 kDa

100 kDa

75 kDa

50 kDa

375 kDa

25 kDa

250 kDa

Cryogenic Immunoprecipitation of RNA ndash Protein Complexes

NF90 = 90 kDa

HuR = 37kDa

36

Helicos single molecule sequencing

36

SamplePreparation

HeliScopetradeSingle Molecule

Sequencer

BioinformaticAnalysisEngine

gtGATAGCTAGCTAGCTACACAGAGAT gtGATAGACACACACACACACAGCGCA gtGTACTACACACAGCGACACAGTCTA gtGTCGAACACACATGAACACATGAGC gtGTGTCACACACGACTACACATGCAT gtTAGTGACACACGTAGACACGACAGT gtTCTCGACACACTATCACACGACTCAgtTGCACACACACTCGTACACGAGACG

Output

Capacity = 10 billion nucleotides run

High-throughput tools for ncRNA Systems Biology

37

HuR Associated Transcriptome isolated by Cryo-IP

bullAffy All Exon Array = 11155 called probes

bullIllumina Deep Sequencing = 6 million total sequence tags

bullCoding and non-coding represented in top 3000

bullNatural Antisense RNAs such as HIF1α-AS represented in

top 100

bull60 overlap between top 3000 sequence tags and Affy

bullHelicos comparison pending (permits very small sample

sizes)

38

RNA Motif 1 for HuR association

UGUG

U

Lopez de Silanes et al (2004) PNAS ldquoIdentification of a target RNA motif for RNA-binding protein HuRrdquo MyriamGorospersquos Lab+

Found the Gorospe motif in 4536 of 11150 sequences hellip hellip (versus 3521 in a mononucleotide shuffled control)

Z-score =~ 2069

Loop 3-8 bp

39

Ma et al (1996) JBC ldquoCloning and Characterization of HuR a Ubiquitously Expressed Elav-like Proteinrdquo

A U UUUU A

RNA Motif 2 for HuR association

Found the Ma motif in 2230 of 11150 sequences hellip hellip (versus 1267 in a dinucleotide shuffled control)

Z-score =~ 2972

Both motifs informative but suggests HuR responds to a wider range of information signals

40

Deciphering information content defining HuR

interactions withthe Transcriptome

RNA

scan window

Determine clusters of similar structures

[Figure: clusters of structures in the positive and negative sets]

Calculate cluster size distribution (for scanning window length = 50, 45 and 40)

[Figure: three panels: window length = 50 (structures 45-50 bp length), window length = 45 (structures 40-45 bp length), window length = 40 (structures 35-40 bp length)]
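The cluster size distribution can be sketched as follows. The slides do not say how structural similarity was defined, so this example clusters by exact string identity as a stand-in; the function name and the toy structures are hypothetical.

```python
from collections import Counter


def cluster_size_distribution(structures):
    """Map cluster size -> number of clusters of that size.

    Structures are clustered by exact string identity, a stand-in for
    whatever similarity measure was actually used.
    """
    cluster_sizes = Counter(structures)      # structure -> cluster size
    return Counter(cluster_sizes.values())   # size -> number of clusters


# Toy dot-bracket structures, for illustration only.
toy = ["((..))", "((..))", "(....)", "((..))", "(....)", "......"]
dist = cluster_size_distribution(toy)
print(sorted(dist.items()))
```

Running the same function on the positive and negative sets separately would give the two distributions that the panels compare.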

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))
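The structures listed above are written in bracket notation, where matching parentheses denote base pairs. A minimal sketch for summarizing such strings, assuming standard dot-bracket conventions (dots for unpaired positions); the function name `stem_lengths` is hypothetical:

```python
def stem_lengths(structure):
    """Return the number of base pairs enclosed by each top-level
    (outermost) helix of a dot-bracket string."""
    stems, depth, pairs = [], 0, 0
    for ch in structure:
        if ch == "(":
            depth += 1
            pairs += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            if depth == 0:
                # Closed an outermost helix: record its pair count.
                stems.append(pairs)
                pairs = 0
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    return stems


# Two adjacent hairpins built explicitly, so the pair counts are known.
two_hairpins = "(" * 7 + ")" * 7 + "(" * 6 + ")" * 6
print(stem_lengths(two_hairpins))
```

A summary like this separates the double-hairpin structures at the top of the list from the single long hairpins at the bottom.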

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to RNA; structure density in the positive and negative sets (values 2673 and 810); set sizes 1482 and 2818 sequences]

Signal to noise decays with decreasing structure size

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 – let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed and localized in cell types, tissues and biological processes

3. ncRNAs play functional roles in processes such as development, stress response and disease

4. ncRNAs have unique information coding and processing capabilities: density, range and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin (based in Novosibirsk, Russia), Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A


BACE1 Genomic Locus

miR-485-5p binding site is located in the overlapping region of BACE1 and BACE1-AS transcripts

FISH images show a nuclear enrichment pattern

BACE1-AS is elevated in Alzheimer's disease brain (as is BACE1 itself)

[Figure: bar chart of BACE-1 and BACE-1-AS in Control vs AD; y-axis: RNA transcript (% of control average), 0-250; human brain samples (group 1: 10 AD, 10 control/region); Hippocampus (n=40 each)]

Faghihi et al., Nature Medicine 14:723 (2008)
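The y-axis unit used here, transcript level as a percentage of the control average, corresponds to a simple normalization. The sketch below shows that arithmetic only; the function name and all numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def percent_of_control(values, control_values):
    """Express each value as a percentage of the control-group mean."""
    baseline = mean(control_values)
    if baseline == 0:
        raise ValueError("control mean is zero")
    return [100.0 * v / baseline for v in values]


# Hypothetical expression levels (arbitrary units), not data from the study.
control = [0.8, 1.0, 1.2]
ad = [1.5, 2.0, 2.5]
ad_pct = percent_of_control(ad, control)
print([round(x) for x in ad_pct])
```

By construction the control group itself averages 100 on this scale, so bars above 100 indicate elevation relative to controls.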

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: histogram of probes by p-value bin (0-0.005, 0.005-0.01, 0.01-0.02, 0.02-0.05, 0.05-0.1, 0.1-0.2, 0.2-0.5, 0.5-1, 1) for Coding and NC; y-axis: 0.00-60.00]

[Figure: percentage of probes with p < 0.05 for Singlets, Duplets, Triplets and Total, Coding vs Non Coding; y-axis: 0-6]

ncNATs score almost as high as coding NATs in cell viability screen

Validation in approx. 60% of hits
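The quantity plotted in the second chart, the percentage of probes with p < 0.05 per class, can be computed as in the following sketch. The p-values are invented for illustration and the function name is hypothetical; the statistical test that produced the p-values in the screen is not described on the slides.

```python
def percent_significant(p_values, alpha=0.05):
    """Percentage of probes whose p-value falls below alpha."""
    if not p_values:
        return 0.0
    return 100.0 * sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical p-values, for illustration only.
screen = {
    "Coding": [0.001, 0.20, 0.04, 0.70, 0.90],
    "Non Coding": [0.03, 0.50, 0.60, 0.80, 0.95],
}
summary = {group: percent_significant(p) for group, p in screen.items()}
print(summary)
```

With several thousand siRNAs tested, a multiple-testing correction would normally accompany a threshold like this; whether one was applied in the screen is not stated.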

Proteome Wide Prediction of RNA Binding Affinity

[Figure: histogram of RNA binding proteins; x-axis: bins of 1000 from 1-1000 to 20001-20826; y-axis: 0-120]

RNA–Protein Complex High-Throughput Pipeline

Proteome-wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA–Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets; Network construction

Cryogenic Immunoprecipitation Technique

Cells:

• PC12 cells

• primary cells

• neuronal progenitors

Treatments: Depolarization, Stress, Inflammation, Cytokines, Drugs

[Figure: workflow diagram with stages Cells, Freezing, Create cell "grindate", Immunoprecipitation, and Proteomics / RNA-Seq or ChIP-Seq Analysis; labels: ENTROPY, 30 min]

Difficult Timescales for RNP Immunoprecipitation

[Figure: plot starting from the in vivo state; y-axis: 10-100; curves: Specific binding, Nonspecific binding]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions

• Rearrangement is not a significant problem

• Yields of > 90% for bait protein and associated RNA

• Does not depend on a particular protein tag

• No cross-linking necessary

• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes




Page 25: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

Dual and synergistic BACE1 regulation by BACE1-AS

miR-485-5p

000

1000

2000

3000

4000

5000

6000

0 -0005 0005 -001

001 - 002 002 - 005 005 - 01 01 - 02 02 - 05 05 - 1 1

Coding

NC

siRNA Screen for NAT modulation of cell viability

700 NATs ndash 2000 siRNAs

0

1

2

3

4

5

6

Singlets Duplets Triplets Total

Coding

Non Coding

Percentage of probes with p lt 005

ncNATs score almost as high as coding NATs in

cell viability screen

Validation in approx 60 of hits

0

20

40

60

80

100

120

1 -

100

0

100

1 -

200

0

200

1 -

300

0

300

1 -

40

00

40

01

-50

00

500

1 -

60

00

60

01

-70

00

700

1 -

80

00

80

01

-9

00

0

90

01

-10

00

0

100

01

-11

00

0

110

01

-12

00

0

120

01

-13

00

0

130

01

-14

00

0

140

01

-15

00

0

150

01

-16

00

0

160

01

-17

00

0

170

01

-18

00

0

180

01

-19

00

0

190

01

-20

00

0

200

01

-20

826

Proteome Wide Prediction of RNA Binding Affinity

RNAbinding proteins

31

RNA Protein Complex Hi Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA ndash Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets Network construction

32

Cryogenic ImmunoprecipitationTechnique

ENTROPY

Cells

Proteomics RNA-Seq or Chip-SeqAnalysis

Freezing

Immunoprecipitation

bullPC12 cells

bullprimary cells

bullneuronal progenitors

Treatments

Depolarization

Stress

Inflammation

Cytokines

Drugs

Create cell

bdquogrindate‟

30 min

ENTROPY

33

Difficult Timescales for RNP Immunoprecipitation

In vivostate

100

90

80

70

60

50

40

30

20

10

Specific binding

Nonspecific binding

34

Features of Cryogenic IP for RNP Studies

bullRapid technique prevents RNA degradation and loss of

transient macromolecular interactions

bull Rearrangement is not a significant problem

bull Yields of gt 90 for bait protein and associated RNA

bull Does not depend on a particular protein tag

bull No cross-linking necessary

bull Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Transfected

with

150 kDa

100 kDa

75 kDa

50 kDa

375 kDa

25 kDa

250 kDa

Cryogenic Immunoprecipitation of RNA ndash Protein Complexes

NF90 = 90 kDa

HuR = 37kDa

36

Helicos single molecule sequencing

36

SamplePreparation

HeliScopetradeSingle Molecule

Sequencer

BioinformaticAnalysisEngine

gtGATAGCTAGCTAGCTACACAGAGAT gtGATAGACACACACACACACAGCGCA gtGTACTACACACAGCGACACAGTCTA gtGTCGAACACACATGAACACATGAGC gtGTGTCACACACGACTACACATGCAT gtTAGTGACACACGTAGACACGACAGT gtTCTCGACACACTATCACACGACTCAgtTGCACACACACTCGTACACGAGACG

Output

Capacity = 10 billion nucleotides run

High-throughput tools for ncRNA Systems Biology

37

HuR Associated Transcriptome isolated by Cryo-IP

bullAffy All Exon Array = 11155 called probes

bullIllumina Deep Sequencing = 6 million total sequence tags

bullCoding and non-coding represented in top 3000

bullNatural Antisense RNAs such as HIF1α-AS represented in

top 100

bull60 overlap between top 3000 sequence tags and Affy

bullHelicos comparison pending (permits very small sample

sizes)

38

RNA Motif 1 for HuR association

UGUG

U

Lopez de Silanes et al (2004) PNAS ldquoIdentification of a target RNA motif for RNA-binding protein HuRrdquo MyriamGorospersquos Lab+

Found the Gorospe motif in 4536 of 11150 sequences hellip hellip (versus 3521 in a mononucleotide shuffled control)

Z-score =~ 2069

Loop 3-8 bp

39

Ma et al (1996) JBC ldquoCloning and Characterization of HuR a Ubiquitously Expressed Elav-like Proteinrdquo

A U UUUU A

RNA Motif 2 for HuR association

Found the Ma motif in 2230 of 11150 sequences hellip hellip (versus 1267 in a dinucleotide shuffled control)

Z-score =~ 2972

Both motifs informative but suggests HuR responds to a wider range of information signals

40

Deciphering information content defining HuR

interactions withthe Transcriptome

RNA

scan window

41

Determine clusters of similar structures

hellip

hellip

po

sitiv

e

ne

ga

tive

helliphellip

42

Calculate clusters size distribution

(for scanning window length = 50 45 and 40)

window length = 50 window length = 45 window length = 40Structures 45-50 bp length Structures 40-45 bp length Structures 35-40 bp length

43

Structures (length 4550) which constitute

biggest clusters of positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))

44

HUR-binding transcripts have ~3 times more special local

secondary structures than HuR non-binding transcripts

bull Z-score ge 38

HUR

HUR

RNA

HUR HUR

RNA

po

sitiv

en

eg

ative

po

sitiv

e

ne

ga

tive

2673 810density

Signal to noise decays with decreasing structure size

1482 sequences

2818 sequences

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50 of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 – let-7 interactions

Cytoplasmic P Bodies – Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed, and localized in cell types, tissues, and biological processes

3. ncRNAs play functional roles in processes such as development, stress response, and disease

4. ncRNAs have unique information coding and processing capabilities: density, range, and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia; Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A


[Figure: Coding vs. NC series across value bins from 0 to 1; y-axis from 0 to 6000]

siRNA Screen for NAT modulation of cell viability

700 NATs – 2000 siRNAs

[Figure: percentage of probes with p < 0.05 for Coding and Non-Coding NATs, grouped as Singlets, Duplets, Triplets, and Total]

ncNATs score almost as high as coding NATs in the cell viability screen

Validation in approx. 60% of hits

[Figure: bar chart over bins 1-1000, 1001-2000, and so on up to 20001-20826; y-axis from 0 to 120]

Proteome-Wide Prediction of RNA Binding Affinity

[Figure label: RNA-binding proteins]

RNA-Protein Complex High-Throughput Pipeline

Proteome-wide Prediction of RNA Binding Regions

Endogenous Flag Tag of Predicted RNA Binding Proteins

Cryogenic Flag IP of RNA–Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and Proteins

Systems Biology Analysis of Datasets, Network Construction

Cryogenic Immunoprecipitation Technique

Cells:

• PC12 cells

• primary cells

• neuronal progenitors

Treatments: Depolarization, Stress, Inflammation, Cytokines, Drugs

[Figure: workflow from Cells through Freezing, creation of a cell "grindate", and Immunoprecipitation to Proteomics, RNA-Seq, or ChIP-Seq Analysis; additional labels: ENTROPY, 30 min]

Difficult Timescales for RNP Immunoprecipitation

[Figure: specific binding and nonspecific binding, on a scale of 10 to 100, relative to the in vivo state]

Features of Cryogenic IP for RNP Studies

• Rapid technique prevents RNA degradation and loss of transient macromolecular interactions

• Rearrangement is not a significant problem

• Yields of > 90% for bait protein and associated RNA

• Does not depend on a particular protein tag

• No cross-linking necessary

• Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Cryogenic Immunoprecipitation of RNA–Protein Complexes

[Figure: blot with lanes labeled "Transfected with" and molecular weight markers at 250, 150, 100, 75, 50, 37.5, and 25 kDa]

NF90 = 90 kDa

HuR = 37 kDa

Helicos single molecule sequencing

Stages shown: Sample Preparation, HeliScope™ Single Molecule Sequencer, Bioinformatic Analysis Engine, Output

>GATAGCTAGCTAGCTACACAGAGAT
>GATAGACACACACACACACAGCGCA
>GTACTACACACAGCGACACAGTCTA
>GTCGAACACACATGAACACATGAGC
>GTGTCACACACGACTACACATGCAT
>TAGTGACACACGTAGACACGACAGT
>TCTCGACACACTATCACACGACTCA
>TGCACACACACTCGTACACGAGACG

Capacity = 10 billion nucleotides per run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes

• Illumina Deep Sequencing = 6 million total sequence tags

• Coding and non-coding represented in top 3000

• Natural Antisense RNAs such as HIF1α-AS represented in top 100

• 60% overlap between top 3000 sequence tags and Affy

• Helicos comparison pending (permits very small sample sizes)
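
The overlap between the top-ranked sequence tags and the array calls can be expressed as a simple set calculation. The sketch below assumes that both platforms have already been mapped to a shared identifier (for example a gene or transcript ID); that mapping, the function name `overlap_fraction`, and the example identifiers are all assumptions for illustration.

```python
from typing import Iterable, Sequence


def overlap_fraction(ranked_ids: Sequence[str],
                     reference_ids: Iterable[str],
                     top_n: int) -> float:
    """Fraction of the top_n ranked identifiers that also occur in the reference set."""
    top = ranked_ids[:top_n]
    if not top:
        return 0.0
    reference = set(reference_ids)
    return sum(1 for ident in top if ident in reference) / len(top)


# Toy example with made-up identifiers: two of the top three tags are also on the array.
example = overlap_fraction(["t1", "t2", "t3", "t4"], {"t2", "t3", "t9"}, top_n=3)
```

The result depends on how identifiers are matched across platforms, so the same data can give different overlap values under different mapping rules.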

RNA Motif 1 for HuR association

[Motif figure: UGUG U]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Page 30: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

31

RNA Protein Complex Hi Throughput Pipeline

Proteome wide Prediction of RNA Binding regions

Endogenous Flag Tag of Predicted RNA Binding proteins

Cryogenic Flag IP of RNA ndash Protein Complexes

RNA Deep Sequencing and Peptide Mass Spec

Bioinformatics Identification of RNAs and proteins

Systems Biology Analysis of Datasets Network construction

32

Cryogenic ImmunoprecipitationTechnique

ENTROPY

Cells

Proteomics RNA-Seq or Chip-SeqAnalysis

Freezing

Immunoprecipitation

bullPC12 cells

bullprimary cells

bullneuronal progenitors

Treatments

Depolarization

Stress

Inflammation

Cytokines

Drugs

Create cell

bdquogrindate‟

30 min

ENTROPY

33

Difficult Timescales for RNP Immunoprecipitation

In vivostate

100

90

80

70

60

50

40

30

20

10

Specific binding

Nonspecific binding

34

Features of Cryogenic IP for RNP Studies

bullRapid technique prevents RNA degradation and loss of

transient macromolecular interactions

bull Rearrangement is not a significant problem

bull Yields of gt 90 for bait protein and associated RNA

bull Does not depend on a particular protein tag

bull No cross-linking necessary

bull Able to capture weak interactions

An ideal technique for studying maturing RNP complexes

Transfected

with

150 kDa

100 kDa

75 kDa

50 kDa

375 kDa

25 kDa

250 kDa

Cryogenic Immunoprecipitation of RNA ndash Protein Complexes

NF90 = 90 kDa

HuR = 37kDa

36

Helicos single molecule sequencing

36

SamplePreparation

HeliScopetradeSingle Molecule

Sequencer

BioinformaticAnalysisEngine

gtGATAGCTAGCTAGCTACACAGAGAT gtGATAGACACACACACACACAGCGCA gtGTACTACACACAGCGACACAGTCTA gtGTCGAACACACATGAACACATGAGC gtGTGTCACACACGACTACACATGCAT gtTAGTGACACACGTAGACACGACAGT gtTCTCGACACACTATCACACGACTCAgtTGCACACACACTCGTACACGAGACG

Output

Capacity = 10 billion nucleotides run

High-throughput tools for ncRNA Systems Biology

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes

• Illumina Deep Sequencing = 6 million total sequence tags

• Coding and non-coding represented in top 3000

• Natural Antisense RNAs such as HIF1α-AS represented in top 100

• 60% overlap between top 3000 sequence tags and Affy

• Helicos comparison pending (permits very small sample sizes)
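The overlap between the top-ranked sequence tags and the array calls can be illustrated with a set intersection. The following is a minimal sketch, assuming each platform yields a list of transcript identifiers and that the sequencing list is ranked by tag count; the function name `top_overlap` and the toy identifiers are hypothetical and do not come from the talk.

```python
def top_overlap(ranked_tags, array_calls, top_n=3000):
    """Fraction of the top_n ranked sequencing IDs that were also called on the array."""
    top = set(ranked_tags[:top_n])
    if not top:
        return 0.0
    return len(top & set(array_calls)) / len(top)


# Toy data (hypothetical identifiers), ranked by tag count.
ranked = ["T1", "T2", "T3", "T4", "T5"]
array = ["T2", "T4", "T5", "T9"]
fraction = top_overlap(ranked, array, top_n=5)
print(f"overlap = {fraction:.0%}")
```

With real data, `ranked` would hold the sequencing identifiers sorted by abundance and `top_n` would stay at 3000.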

RNA Motif 1 for HuR association

[Motif figure labels: UGUG, U; Loop 3-8 bp]

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Found the Gorospe motif in 4536 of 11150 sequences (versus 3521 in a mononucleotide-shuffled control)

Z-score ≈ 20.69
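A comparison against a mononucleotide-shuffled control can be sketched as follows. This is only an illustration under stated assumptions: the regular expression `PATTERN` is a hypothetical stand-in (the motif in the talk is a structural stem-loop, which a plain sequence pattern does not capture), and the input sequences are random toy data rather than the HuR-associated set.

```python
import random
import re


def mono_shuffle(seq, rng):
    """Shuffle single nucleotides, preserving base composition only."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def count_hits(seqs, pattern):
    """Number of sequences containing at least one match to the pattern."""
    return sum(1 for s in seqs if pattern.search(s))


# Hypothetical stand-in pattern; not the motif definition used in the talk.
PATTERN = re.compile(r"UGUG[ACGU]{3,8}U")

rng = random.Random(0)
seqs = ["".join(rng.choice("ACGU") for _ in range(200)) for _ in range(500)]
observed = count_hits(seqs, PATTERN)
control = count_hits([mono_shuffle(s, rng) for s in seqs], PATTERN)
print(observed, control)
```

Because the toy sequences are random, the observed and control counts should be similar here; an enrichment would only be expected in real bound transcripts.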

RNA Motif 2 for HuR association

[Motif figure labels: A U UUUU A]

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences (versus 1267 in a dinucleotide-shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
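The motif counts on the two slides above can be turned into a Z-score in several ways, and the talk does not say which was used. The sketch below assumes one simple option: a binomial null in which each of the 11150 sequences contains the motif with the rate seen in the shuffled control. The function name `binomial_z` is hypothetical.

```python
import math


def binomial_z(observed, expected, n):
    """Z-score of an observed hit count under a binomial null with rate expected/n."""
    p = expected / n
    return (observed - expected) / math.sqrt(n * p * (1.0 - p))


# Counts quoted on the slides: motif hits in real vs shuffled sequences.
z_motif1 = binomial_z(4536, 3521, 11150)
z_motif2 = binomial_z(2230, 1267, 11150)
print(round(z_motif1, 1), round(z_motif2, 1))
```

Under this assumption the two motifs give Z-scores of roughly 20.7 and 28.7. A different null model, for example one based on the spread across many independent shuffles, would give somewhat different values.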

Deciphering information content defining HuR interactions with the Transcriptome

[Figure labels: RNA, scan window]
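The scan-window idea can be sketched as a generator that slides a fixed-length window along a transcript. This is a minimal illustration, assuming a window of 50 nt and a user-chosen step; the function name `scan_windows` and the step size are hypothetical, and the structure prediction that would presumably be run on each window is not shown.

```python
def scan_windows(seq, window=50, step=1):
    """Yield (start, subsequence) for every full-length window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


rna = "AUGC" * 15  # 60-nt toy sequence
windows = list(scan_windows(rna, window=50, step=5))
print(len(windows), windows[0][0], windows[-1][0])
```

Sequences shorter than the window yield nothing, which avoids scoring truncated fragments.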

Determine clusters of similar structures

[Figure: clusters of structures in the positive and negative sets]

Calculate cluster size distribution (for scanning window length = 50, 45 and 40)

[Figure panels: window length = 50, structures 45-50 bp length; window length = 45, structures 40-45 bp length; window length = 40, structures 35-40 bp length]
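A cluster size distribution can be tallied with two passes of a counter. The sketch below makes a simplifying assumption that should be stated loudly: structures are clustered only when their dot-bracket strings are identical, whereas the talk speaks of "similar" structures and does not give its similarity measure. The function name and toy structures are hypothetical.

```python
from collections import Counter


def cluster_size_distribution(structures):
    """Map cluster size -> number of clusters, grouping identical strings together."""
    cluster_sizes = Counter(structures)      # structure -> members in its cluster
    return Counter(cluster_sizes.values())   # size -> how many clusters have that size


toy = ["((..))", "((..))", "((..))", "(....)", "(....)", "......"]
dist = cluster_size_distribution(toy)
print(sorted(dist.items()))
```

Running the same function on the positive and negative sets separately would give the two distributions to compare.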

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))
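Bracket strings like those listed above can be summarized by counting the base pairs in each top-level helix. The following is a minimal sketch, assuming standard dot-bracket notation in which `(` and `)` mark paired positions and `.` marks unpaired ones; the transcript shows brackets only, so unpaired positions may have been lost in extraction. The function name `hairpin_stems` is hypothetical.

```python
def hairpin_stems(structure):
    """Return the number of base pairs in each top-level helix of a dot-bracket string."""
    stems, depth, pairs = [], 0, 0
    for ch in structure:
        if ch == "(":
            depth += 1
            pairs += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            if depth == 0:  # just closed a top-level helix
                stems.append(pairs)
                pairs = 0
    if depth != 0:
        raise ValueError("unbalanced structure")
    return stems


# Toy structure: a 7-bp hairpin followed by a 6-bp hairpin.
example = "(" * 7 + "...." + ")" * 7 + "(" * 6 + "..." + ")" * 6
print(hairpin_stems(example))
```

A two-element result indicates a double-hairpin arrangement and a one-element result a single long hairpin, the two shapes that appear in the list above.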

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

[Figure: HuR bound to RNA, positive and negative sets; density values 2673 and 810; set sizes 1482 sequences and 2818 sequences]

Signal to noise decays with decreasing structure size

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

Another example: lin-28 / let-7 interactions

Cytoplasmic P Bodies - Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many "Colors and Flavors" of LTP and LTD

Dinger et al. (2008)

RNA as an intercellular communicator

Sid2 Expression in Mammalian Brain

Dinger et al. (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA-protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution

2. ncRNAs are differentially expressed, processed and localized in cell types, tissues and biological processes

3. ncRNAs play functional roles in processes such as development, stress response and disease

4. ncRNAs have unique information coding and processing capabilities: density, range and flexibility

5. Therefore, in mammalian cells, the combinatorial space of RNA-protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin (based in Novosibirsk, Russia), Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A

Page 36: Computational mechanisms and information coding by the non ...conf.nsc.ru/files/conferences/BGRSSB2010/29261/Georges St. Laurent III.pdf · Computational mechanisms and information

37

HuR Associated Transcriptome isolated by Cryo-IP

• Affy All Exon Array = 11155 called probes

• Illumina Deep Sequencing = 6 million total sequence tags

• Coding and non-coding represented in top 3000

• Natural Antisense RNAs such as HIF1α-AS represented in top 100

• 60% overlap between top 3000 sequence tags and Affy

• Helicos comparison pending (permits very small sample sizes)
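One way to read the overlap figure is as the fraction of top-ranked sequencing hits that are also called on the array. A minimal sketch, assuming each platform's hits have already been reduced to transcript identifiers; the function name and the toy identifiers are hypothetical and not from the talk:

```python
def overlap_fraction(top_tags, array_calls):
    """Fraction of distinct top-ranked sequence-tag transcripts also called on the array."""
    top = set(top_tags)
    if not top:
        raise ValueError("top_tags must not be empty")
    return len(top & set(array_calls)) / len(top)


# Toy identifiers for illustration only (not data from the talk).
top_tags = ["tx1", "tx2", "tx3", "tx4", "tx5"]
array_calls = ["tx2", "tx3", "tx5", "tx9"]
print(overlap_fraction(top_tags, array_calls))  # 0.6
```

The denominator here is the sequencing list; using the array calls as the denominator instead would give a different number, and the slide does not say which was intended.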

RNA Motif 1 for HuR association

Figure: motif diagram with labels "UGUG" and "U"; loop of 3-8 bp

Lopez de Silanes et al. (2004) PNAS, "Identification of a target RNA motif for RNA-binding protein HuR" (Myriam Gorospe's Lab)

Found the Gorospe motif in 4536 of 11150 sequences (versus 3521 in a mononucleotide shuffled control)

Z-score ≈ 20.69
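A mononucleotide shuffled control keeps each sequence's base composition but scrambles the order, so any motif hits that survive are attributable to composition alone. A minimal sketch of that comparison follows. The regular expression is a placeholder (a run of U's), not the motif from the slide, and the toy sequences and function names are assumptions for illustration:

```python
import random
import re


def mononucleotide_shuffle(seq, rng):
    """Return a random permutation of seq, preserving single-base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def count_sequences_with_motif(seqs, pattern):
    """Count sequences containing at least one match of the compiled pattern."""
    return sum(1 for s in seqs if pattern.search(s))


# Placeholder motif (four or more consecutive U's); NOT the motif from the slide.
PLACEHOLDER_MOTIF = re.compile(r"U{4,}")

rng = random.Random(0)  # fixed seed so the control is reproducible
seqs = ["GGAUUUUUACC", "ACGUACGUACGU", "CCUUUUGGAUUU"]
control = [mononucleotide_shuffle(s, rng) for s in seqs]

observed = count_sequences_with_motif(seqs, PLACEHOLDER_MOTIF)
shuffled = count_sequences_with_motif(control, PLACEHOLDER_MOTIF)
print(observed, shuffled)
```

A dinucleotide shuffled control additionally preserves adjacent-base frequencies and needs a more involved algorithm; it is not shown here.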

RNA Motif 2 for HuR association

Figure: motif diagram with labels "A U UUUU A"

Ma et al. (1996) JBC, "Cloning and Characterization of HuR, a Ubiquitously Expressed Elav-like Protein"

Found the Ma motif in 2230 of 11150 sequences (versus 1267 in a dinucleotide shuffled control)

Z-score ≈ 29.72

Both motifs are informative, but the results suggest that HuR responds to a wider range of information signals
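The talk does not state how its Z-scores were computed. One common choice, assumed here, is a binomial null in which the shuffled control supplies the expected hit rate; the function name is hypothetical. Applied to the counts quoted for the two motifs:

```python
import math


def enrichment_z(observed_hits, control_hits, n_sequences):
    """Z-score of observed hits against a binomial null whose rate is the control hit rate."""
    p0 = control_hits / n_sequences          # hit rate in the shuffled control
    expected = n_sequences * p0              # expected hits under the null
    sd = math.sqrt(n_sequences * p0 * (1.0 - p0))
    return (observed_hits - expected) / sd


# Counts as quoted on the slides: hits in real sequences, hits in control, total sequences.
z_motif1 = enrichment_z(4536, 3521, 11150)
z_motif2 = enrichment_z(2230, 1267, 11150)
print(round(z_motif1, 2), round(z_motif2, 2))
```

Under this assumption the first motif comes out near 20.7 and the second near 28.7. Since the original procedure is not given (for example, it may have averaged over many shuffles), agreement with the reported values is not guaranteed.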

Deciphering information content defining HuR interactions with the Transcriptome

Figure: a scan window moving along the RNA
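The scan-window idea can be sketched as a generator that walks a fixed-length window along each transcript. The step size and the example sequence are assumptions for illustration, and the structure-prediction step that would be applied to each window is omitted:

```python
def scan_windows(seq, window, step=1):
    """Yield (start, subsequence) for each full-length window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Arbitrary example sequence, not taken from the talk.
rna = "AUGGCUACGUUUUAGCCAUGGCAUCGAUUUUUACGCAUGCUAGCUAGGCUAAUCGAUCG"
for start, sub in scan_windows(rna, window=50, step=5):
    print(start, len(sub))
```

Sequences shorter than the window yield nothing, which keeps short transcripts from contributing truncated windows.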

Determine clusters of similar structures

Figure: structures grouped into clusters for the positive and negative sets

Calculate cluster size distribution

(for scanning window length = 50, 45, and 40)

Figure panels: window length = 50 (structures 45-50 bp in length); window length = 45 (structures 40-45 bp in length); window length = 40 (structures 35-40 bp in length)
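A minimal sketch of clustering structures and tabulating cluster sizes, assuming structures are given in dot-bracket notation and that two structures belong to the same cluster when their top-level hairpins have the same base-pair counts. This exact-signature grouping is a stand-in; the talk does not say which similarity measure was used, and the function names and example structures are hypothetical:

```python
from collections import Counter


def stem_signature(dot_bracket):
    """Return the number of base pairs in each top-level hairpin of a dot-bracket string."""
    signature, depth, pairs = [], 0, 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
            pairs += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            if depth == 0:          # a top-level unit just closed
                signature.append(pairs)
                pairs = 0
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    return tuple(signature)


def cluster_size_distribution(structures):
    """Map cluster size -> number of clusters of that size."""
    cluster_sizes = Counter(stem_signature(s) for s in structures)
    return Counter(cluster_sizes.values())


# Toy structures for illustration only.
structures = [
    "(((((...)))))..((((....))))",
    "(((((....)))))((((...))))..",
    "((((((.....))))))",
]
print(cluster_size_distribution(structures))
```

Running this separately on the positive and negative sets, once per window length, would give the kind of per-panel distributions the slide describes.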

Structures (length 45-50) which constitute the biggest clusters of the positive set

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))(((((((())))))))

(((((((())))))))(((((((())))))))

(((((((((())))))))))((((()))))

((((((()))))))((((((()))))))

(((((((())))))))((((((()))))))

((((((()))))))((((((()))))))

(((((())))))(((((((())))))))

(((((((((((())))))))))))

((((((((((((()))))))))))))

((((((((((()))))))))))

((((((((()))))))))

(((((((((((((())))))))))))))

(((((((((())))))))))

(((((((((((())))))))))))

(((((((((((())))))))))))

((((((((((()))))))))))

HuR-binding transcripts have ~3 times more special local secondary structures than HuR non-binding transcripts

• Z-score ≥ 38

Figure: HuR bound to RNA; density for the positive and negative sets (values shown: 2673, 810)

Signal to noise decays with decreasing structure size

1482 sequences

2818 sequences

HuR Associated Transcriptome isolated by Cryo-IP

Affy All Exon Array yields 11155 called probes

Antisense transcripts comprise 50% of associated RNAs

Multiplexed Computation of Gene Expression

47

Cytoplasmic P Bodies ndash Supercomputing Warehouse for RNA

Scaffolding Machineries regulate synaptic translation

Bramham and Wells (2007)

ncRNAs modulate synaptic translation machineries

Information content supplied from a range of ncRNAs may modulate these machineries to produce many ldquoColors and Flavorsrdquoof LTP and LTD

Dinger et al (2008)

RNA as an intercellular communicator

Sid2 Expression in

Mammalian Brain

Dinger et al (2008)

Editing may play an active role in the computational matrix

The Transcriptome as a computational Matrix

ADAR participates in ncRNA information processing

ADAR participates in Inflammation

Cascade Feedback Loops

ncRNA–protein machineries mediate two-way information flow

Conclusions

1. Non-coding regions directly correlate with organismal complexity across evolution.

2. ncRNAs are differentially expressed, processed, and localized in cell types, tissues, and biological processes.

3. ncRNAs play functional roles in processes such as development, stress response, and disease.

4. ncRNAs have unique information coding and processing capabilities: density, range, and flexibility.

5. Therefore, in mammalian cells, the combinatorial space of RNA–protein interactions likely functions as a molecular supercomputer, impacting the great majority of pathways and cellular functions.

Acknowledgements

Project Team Members: Mo Heydarian, Dennis Vorobiev, Dmitry Schtokalo, Sergey Nechkin, based in Novosibirsk, Russia, Andrey Polyanov

Collaborators: Mohammad Faghihi (Scripps Florida), Claes Wahlestedt (Scripps Florida), Rob Reenan (Brown University), Tim McCaffrey (GWU)

Q & A
