big 2018 slides - lpantano.github.io · clean lex neb nf1 nf2 nf3 nf4 peb qia som tri tru ts1 ts2...

12-06-2018

miRNA wars THE isomiRS menace

Lorena Pantano, P.h. D https://lpantano.github.io

@lopantano

https://www.vinylinfo.org/news/population-growth-sustainability-and-our-future

150 human liver samples with small RNAseq data

miRNAsGene regulation by imperfect complementary between seed region in miRNA and 3’UTR in the targeted RNA molecule.

https://www.sciencedirect.com/science/article/pii/S002228361300154X

isomiRs https://www.flickr.com/photos/ajc1/4401411500

What is the best way to analyze this data?

Main projects: - a format - a way to compare

By Kieran O’Neill

Format derived from GFF3: mirGFF3

Python API: mirtop

What is the best way to analyze this data?

Samples suitable for benchmarking

Giraldez et al.

Mestdagh et al.

Dan-Dascot et al.

+ 6 synthetic RNAs

Tewari synthetic

Tewari custom

Different protocols

Protocols

https://rnaseq.uoregon.edu/

Variations

3 adapter 5 adapter PEG

TrueSeq

NEBNext Yes

NextFlex 4N 4N Yes

CleanTag Methylphosphonate 2’Ome

CATS poly-T TSO with rX

SMARTer poly-T TSO

Methods and Metrics

Sample

Library size = 20

Sample

miRNA 2 = 10

miRNA 1 = 10

Library size = 20

Sample

miRNA 2 = x10 isomiR 2.1 = x3 (match perfectly spike-in)

isomiR 2.2 = x5 (NOT match perfectly spike-in)


miRNA 1 = x10 isomiR 1.1 = x7 (match perfectly spike-in)


Library size = 20

Sample

Library size = 20

There are 20 reads in total (lines), but 5 unique sequences (colors)

Sample

Library size = 20

isomiR 1.1 is 7/20=35% of the reads and 1/5=20% of the sequences

miRNA 1 = 10x 2 unique isomiRs

miRNA 2= 10x 3 unique isomiRs

Sample

Library size = 20

isomiR 1.1 is 1/2=50% of the miRNA1 sequences

isomiR 1.1 is 7/10=70% of the miRNA1 reads

miRNA 2 = 10 3 isomiRs


Sample

Library size = 20

PCT = isomiR 1.1 is 1/5=20% of the sequences

IMPORTANCE = isomiR 1.1 is 7/10=70% of the miRNA1 reads



Pipeline razers3 + mirtop

Reference Trimmed Data

Razers3

Mirtop GFF3

Mirtop tabular

snakemake

Data analysis - Filters

Human?

Is the spike-in detected?

Any sequence in group mapped to

other miRNA?

NO YES

NO YES

NO YES

All sequences in a miRNA

Tewari-synthetic: library size

56K99K

200K

86K

113K

196K

53K

88K

81K

77K

84K

30K

85K

166K

188K

134K

153K160K

127K

12K

91K

0e+00

2e+07

4e+07

6e+07

clean_ta

g_lab

5

neb_next_

lab1

neb_next_

lab3

neb_next_

lab4

neb_next_

lab5

neb_next_

lab9

tru_seq_la

b1

tru_seq_la

b2

tru_seq_la

b3

tru_seq_la

b5

tru_seq_la

b6

tru_seq_la

b8

tru_seq_la

b9

x4n_a_lab

5

x4n_b_lab

2

x4n_b_lab

4

x4n_b_lab

6

x4n_c_lab

6

x4n_d_lab

1

x4n_nex_tfle

x_lab

8

x4n_xu_la

b5

sample

reads

protocol clean neb tru x4n

40 million reads and 85K different sequences.

Tewari-synthetic: spike-ins are the top sequence?

667

626 735659 712

752 885712

691 690 701645

692

741 792

768

804

860

751

578 701

0

20

40

60

80

clean_tag_lab5

neb_next_lab1

neb_next_lab3

neb_next_lab4

neb_next_lab5

neb_next_lab9

tru_seq_lab1

tru_seq_lab2

tru_seq_lab3

tru_seq_lab5

tru_seq_lab6

tru_seq_lab8

tru_seq_lab9

x4n_a_lab5

x4n_b_lab2

x4n_b_lab4

x4n_b_lab6

x4n_c_lab6

x4n_d_lab1

x4n_nex_tflex_lab8

x4n_xu_lab5

sample

pct


70% (692 being total detected miRNAs) of miRNAs with top sequence to be the expected one.

Most abundant type of isomiR for each miRNA without the spike-in as the most abundant

0

200

400

600

clean

_tag_

lab5

neb_

next_

lab1

neb_

next_

lab3

neb_

next_

lab4

neb_

next_

lab5

neb_

next_

lab9

tru_s

eq_la

b1

tru_s

eq_la

b2

tru_s

eq_la

b3

tru_s

eq_la

b5

tru_s

eq_la

b6

tru_s

eq_la

b8

tru_s

eq_la

b9

x4n_

a_lab

5

x4n_

b_lab

2

x4n_

b_lab

4

x4n_

b_lab

6

x4n_

c_lab

6

x4n_

d_lab

1

x4n_

nex_

tflex_

lab8

x4n_

xu_la

b5

sample

# of m

iRNA

s

IMPORTANCE add3padd3p + other

shift3pshift5p

shift5p shift3psnp

snp + other

Tewari-synthetic: Importance of the ‘isomiRs’ detected

snp snp + other

shift5p shift5p shift3p

reference shift3p

add3p add3p + other

clean

_tag

_lab

5ne

b_ne

xt_l

ab1

neb_

next

_lab

3ne

b_ne

xt_l

ab4

neb_

next

_lab

5ne

b_ne

xt_l

ab9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5x4

n_b_

lab2

x4n_

b_la

b4x4

n_b_

lab6

x4n_

c_la

b6x4

n_d_

lab1

x4n_

nex_

tflex

_lab

8x4

n_xu

_lab

5

clean

_tag

_lab

5ne

b_ne

xt_l

ab1

neb_

next

_lab

3ne

b_ne

xt_l

ab4

neb_

next

_lab

5ne

b_ne

xt_l

ab9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5x4

n_b_

lab2

x4n_

b_la

b4x4

n_b_

lab6

x4n_

c_la

b6x4

n_d_

lab1

x4n_

nex_

tflex

_lab

8x4

n_xu

_lab

5

0

1

2

3

4

5

0

1

2

3

0

1

2

3

0

20

40

60

0.0

0.5

1.0

1.5

0

1

2

3

4

0

1

2

3

4

0

20

40

60

sample

PCT

IMPORTANCE<0.1

0.1−1

1−5

5−10

10−20

20−50

>50

isomiRs importance by type

4200 5361 60765583 5752 5780

48224209 4315 4188 4107

3940

4015 4135 4045 4297 4681 5025 4040

2980

3948

0

30

60

90

clean

_tag

_lab

5

neb_

next

_lab

1

neb_

next

_lab

3

neb_

next

_lab

4

neb_

next

_lab

5

neb_

next

_lab

9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5

x4n_

b_la

b2

x4n_

b_la

b4

x4n_

b_la

b6

x4n_

c_la

b6

x4n_

d_la

b1

x4n_

nex_

tflex

_lab

8

x4n_

xu_l

ab5

sample

% re

mov

ed


From 85K to 4K unique sequences. 90% of sequences are removed.

snp snp + other

shift5p shift5p shift3p

reference shift3p

add3p add3p + other

clean

_tag

_lab

5ne

b_ne

xt_l

ab1

neb_

next

_lab

3ne

b_ne

xt_l

ab4

neb_

next

_lab

5ne

b_ne

xt_l

ab9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5x4

n_b_

lab2

x4n_

b_la

b4x4

n_b_

lab6

x4n_

c_la

b6x4

n_d_

lab1

x4n_

nex_

tflex

_lab

8x4

n_xu

_lab

5

clean

_tag

_lab

5ne

b_ne

xt_l

ab1

neb_

next

_lab

3ne

b_ne

xt_l

ab4

neb_

next

_lab

5ne

b_ne

xt_l

ab9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5x4

n_b_

lab2

x4n_

b_la

b4x4

n_b_

lab6

x4n_

c_la

b6x4

n_d_

lab1

x4n_

nex_

tflex

_lab

8x4

n_xu

_lab

5

0

1

2

3

0

5

10

15

0

1

2

3

0

5

10

15

0.0

0.5

1.0

1.5

0

5

10

15

010203040

0

20

40

60

sample

PCT

IMPORTANCE 1−5 5−10 10−20 20−50 >50

isomiRs importance by type with pct > 1

snp snp + other

shift3p shift5p shift5p shift3p

add3p add3p + other reference

clean neb tru x4n

clean neb tru x4n

clean neb tru x4n

0

5

10

15

0.0

0.5

1.0

1.5

0

1

2

0

5

10

15

20

0

4

8

12

0.00

0.25

0.50

0.75

0

3

6

9

12

0

20

40

protocol

PCT

IMPORTANCE 1−5 5−10 10−20 20−50 >50

Tewari-synthetic: Other tools

shift3p shift5p snp

bcbiomirG

erazer3

clean neb tru x4n

clean neb tru x4n

clean neb tru x4n

0

20

40

0

20

40

0

20

40

protocol

PCT

IMPORTANCE 1−5 5−10 10−20 20−50

Tewari-synthetic: Other data

shift3p shift5p snp

tewaritewari_custom

vandijk

clean neb

nf1

nf2

nf3

nf4 tru ts1

ts2

ts3

ts4

ts5

x4n

clean neb

nf1

nf2

nf3

nf4 tru ts1

ts2

ts3

ts4

ts5

x4n

clean neb

nf1

nf2

nf3

nf4 tru ts1

ts2

ts3

ts4

ts5

x4n

0

20

40

60

0

20

40

60

0

20

40

60

protocol

PCT

IMPORTANCE 1−5 5−10 10−20 20−50

Tewari: synthetic vs human plasma

shift3p shift5p snp

human plasm

atewari

clean neb tru x4n

clean neb tru x4n

clean neb tru x4n

0

20

40

0

20

40

protocol

PCT

IMPORTANCE 1−5 5−10 10−20 20−50

Tewari-synthetic: Shift type

●

●

●●●●

●

●

●

●

●

●

●●● ●

●

●

●●●● ●●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●●

●●

●

●

●

●

● ●

●●

●

●●

●● ●●●●

●

●

1 0 2 0 3 0 4 0

0 1 0 2 0 3 0 4

0 −1 0 −2 0 −3 0 −4

−1 0 −2 0 −3 0 −4 0cl

ean

neb tru x4n

clea

n

neb tru x4n

clea

n

neb tru x4n

clea

n

neb tru x4n

0

5

10

0

5

10

0

5

10

0

5

10

protocol

% o

f iso

miR

s w

ith s

hifti

ng e

vent

s

pct_cat<0.1

0.1−1

1−5

5−10

10−20

20−50

shift type

0

25

50

75

100

clea

n_ta

g_la

b5

neb_

next

_lab

1

neb_

next

_lab

3

neb_

next

_lab

4

neb_

next

_lab

5

neb_

next

_lab

9

tru_s

eq_l

ab1

tru_s

eq_l

ab2

tru_s

eq_l

ab3

tru_s

eq_l

ab5

tru_s

eq_l

ab6

tru_s

eq_l

ab8

tru_s

eq_l

ab9

x4n_

a_la

b5

x4n_

b_la

b2

x4n_

b_la

b4

x4n_

b_la

b6

x4n_

c_la

b6

x4n_

d_la

b1

x4n_

nex_

tflex

_lab

8

x4n_

xu_l

ab5

sample

% o

f miR

NAs

with

shi

ft ev

ents


Summary

✓ Each miRNA generates a diversity of isomiRs:

✓ 90% contribute to < 1% of the miRNA abundance

✓ 10% enrichment on truncations at both sides

✓ Independent of pipelines and data sets

✓ 90% of miRNAs affected

✓ Custom 4N protocols perform worst

https://github.com/miRTop/mirGFF3

https://github.com/miRTop/mirtop

https://github.com/miRTop/incubator/tree/master/projects/tewari_equimolar

https://github.com/miRTop/mirGFF3

https://github.com/miRTop/mirtop

Thomas Desvignes Phillipe Loher Karen EIlbeck Bastian Fromm Gianvito Urgese Isidore Rigoutsos Michael Hackenberg Ioannis S. Vlachos Marc K. Halushka

Jeffery Ma, Jason Sydes, Yin Lu, Ernesto Aparicio-Puerta, Shruthi Bandyadka, Victor Barrera, Peter Batzel,

Rafa Allis, Roderic Espin

Kieran O’Neill, Eric Londin, Aristeidis G. Telonis, Elisa Ficarra, John H. Postlethwait,

miRNA wars: THE NEW Hope

Summary

0

10

20

30

40

A C G Tnt

pct

position first last

●

●

●

●

●

●

● ●

●

●●

●

●

● ●

●

●

●● ●● ●●

● ●

●

●

●

●

●

●

●

●●

● ●

●●

−1 0 0 −1

<0.10.1−1

1−55−10

10−2020−50

clea

n

neb tru x4n

clea

n

neb tru x4n

05

101520

05

101520

05

101520

05

101520

05

101520

05

101520

protocol

% o

f iso

miR

s

nt a c g t

shift3p shift5p snp

dsrghum

an plasma

tewaritewari_custom

vandijk

clean le

xne

bnf

1nf

2nf

3nf

4pe

bqi

aso

m tri tru ts1

ts2

ts3

ts4

ts5

x4n

clean le

xne

bnf

1nf

2nf

3nf

4pe

bqi

aso

m tri tru ts1

ts2

ts3

ts4

ts5

x4n

clean le

xne

bnf

1nf

2nf

3nf

4pe

bqi

aso

m tri tru ts1

ts2

ts3

ts4

ts5

x4n

0

25

50

75

0

25

50

75

0

25

50

75

0

25

50

75

0

25

50

75

protocol

PCT

IMPORTANCE 1−5 5−10 10−20 20−50

big 2018 slides - lpantano.github.io · clean lex neb nf1 nf2 nf3 nf4 peb qia som tri tru ts1 ts2...

Documents