Random RNA interactions control protein expression in prokaryotes
Paul Gardner
University of Canterbury, Christchurch, New Zealand
Feel free to share what you hear
These slides are available at: http://www.slideshare.net/ppgardne/presentations
The hard work of Sinan Umu, Ant Poole & Ren Dobson
mRNA levels are imperfectly correlated with protein levels
Lu et al. (2007) Nature biotechnology.
Determinants of protein concentration
Protein concentration depends on mRNA concentration, translation and degradation rates
DNA [D] --(k_transcription)--> RNA [R] --(k_translation)--> Protein [P]
RNA is lost at rate k_mRNA degradation; protein is lost at rate k_protein degradation
de Sousa Abreu, Penalva, Marcotte & Vogel (2009) Global signatures of protein and mRNA expression levels. Molecular BioSystems.
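The dependence described on this slide can be made concrete with a minimal first-order kinetic model. The rate equations in the docstring are a common textbook assumption rather than something stated on the slides, and the rate constants are made-up illustrative numbers.

```python
def steady_state(dna, k_transcription, k_translation,
                 k_mrna_degradation, k_protein_degradation):
    """Steady state of an assumed first-order model:

    d[R]/dt = k_transcription * [D] - k_mrna_degradation * [R]
    d[P]/dt = k_translation   * [R] - k_protein_degradation * [P]
    """
    mrna = k_transcription * dna / k_mrna_degradation
    protein = k_translation * mrna / k_protein_degradation
    return mrna, protein


# Two hypothetical genes with the same mRNA level but a 10-fold
# difference in translation rate (all numbers are illustrative).
mrna_a, protein_a = steady_state(1.0, 2.0, 5.0, 0.5, 0.1)
mrna_b, protein_b = steady_state(1.0, 2.0, 50.0, 0.5, 0.1)
print(mrna_a, protein_a)
print(mrna_b, protein_b)
```

Under these assumptions, two genes with identical [mRNA] can differ in [protein] purely through their translation or degradation rates, which is one way to read the imperfect mRNA:protein correlation.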
Two general models describe variation in translation rate
1. Codon usage (Ikemura, 1981)
2. mRNA structure (Pelletier & Sonenberg, 1987)
Figures from: Tuller & Zur (2015) Nucl. Acids Res.
We think we have a third general model...
http://dx.doi.org/10.7554/eLife.13479
http://dx.doi.org/10.7554/eLife.20686
Non-coding RNAs are abundant
[Figure: log10(mean read depth), axis 0 to 5, for core ncRNA genes and core protein coding genes]
Lindgreen, Umu et al. (2014) PLOS Computational Biology.
Bacterial non-coding RNA function
[Figure: four sRNA mechanisms acting on an mRNA (elements shown: Hfq, sRNA, SD, AUG, ribosome, RNase E)]
- Sequestration of ribosome binding site
- Induction of mRNA decay (RNase E recruitment)
- Anti-antisense mechanism
- Selective mRNA stabilisation
SD = Shine-Dalgarno sequence
Figure by Bethany Jose
Checking for mRNA:ncRNA interactions
- Regulatory interactions are specific and few in number; off-target interactions are non-specific and numerous
- Compare the 5′ ends of CDSs & ncRNAs
- Looking for a bump on the left...
[Figure: density of binding energy (kcal/mol, −15 to 0) for native and shuffled sequences; Native vs Shuffled, P = 7.69x10^-52]
Checking negative controls!
[Figure: density of binding energy (kcal/mol, −15 to 0) for native sequences and five negative controls]
Native compared with:
- Shuffled (P = 7.69x10^-52)
- Different phylum (P = 0)
- Downstream (P = 2.66x10^-124)
- Rev. complement (P = 6.51x10^-57)
- Intergenic (P = 6.16x10^-93)
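A shuffled negative control of this kind can be sketched as follows. This is only an illustration: the slides do not say which shuffling scheme or statistical test was used, so the mononucleotide shuffle and the one-sided permutation test below are assumptions, and the binding energies are hypothetical numbers that would in practice come from an RNA:RNA interaction predictor.

```python
import random


def shuffle_sequence(seq, rng):
    """Return a mononucleotide shuffle of seq (same composition, new order)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def permutation_p_value(native, control, n_perm=10000, rng=None):
    """One-sided permutation test on the difference of means.

    Asks whether native energies are shifted towards weaker binding
    (less negative values) relative to the control energies.
    """
    rng = rng or random.Random(0)
    n = len(native)
    observed = sum(native) / n - sum(control) / len(control)
    pooled = list(native) + list(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        # Small tolerance so floating-point rounding does not hide ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


rng = random.Random(1)
print(shuffle_sequence("AUGGCUAGCAAAGGCGAAGAA", rng))

# Hypothetical binding energies in kcal/mol.
native_energies = [-1.0, -1.2, -0.8, -1.1, -0.9]
shuffled_energies = [-5.0, -5.5, -4.8, -5.2, -5.1]
print(permutation_p_value(native_energies, shuffled_energies, n_perm=2000))
```

The same comparison can be repeated for each control set (different phylum, downstream, reverse complement, intergenic) by swapping in the corresponding energies.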
Do ubiquitous and abundant RNAs influence translation?
- Given that ncRNAs are among the most abundant RNAs in the cell ([ncRNA] >> [mRNA])
- AND that RNAs frequently hybridise
- THEN maybe stochastic interactions with mRNAs inhibit translation
Corley & Laederach (2016) Bioinformatics: Selecting against accidental RNA interactions. eLife.
How can this hypothesis be tested?
- We predict that:
  1. There is selection against mRNA:ncRNA interactions
  2. Stochastic mRNA:ncRNA interactions influence [protein]:[mRNA] ratios
- For consistency: focus on 6 ncRNA families & 114 mRNAs/proteins that are highly conserved & expressed, and on the first 21 nts of the CDS
- Tested 1,582 bacterial & 118 archaeal genomes
Are mRNA:ncRNA interactions selected against?
[Figure: density difference between native and shuffled interactions versus binding energy (kcal/mol, −15 to 0), one curve per group; annotations: "More stable interactions", "Native interactions", "Shuffled interactions". Inset: −log10 P for Act, Bac, Chl, Cya, Fir, Pro, Spi, Arc]
- Actinobacteria (n:163) P = 9.8x10^-69
- Bacteroidetes (n:60) P = 8.7x10^-148
- Chlamydiae (n:38) P = 1.4x10^-193
- Cyanobacteria (n:40) P = 3.8x10^-11
- Firmicutes (n:378) P = 0
- Proteobacteria (n:756) P = 0
- Spirochaetes (n:38) P = 1.6x10^-98
- Archaea (n:118) P = 4.2x10^-177
- Background (n:100)
Do mRNA:ncRNA interactions influence protein expression?
[Figure: log10(fluorescence) (2.0 to 4.0) versus avoidance (kcal/mol, −300 to −150); Rs = 0.65]
Expression data from: Kudla et al. (2009) Science.
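The Rs value quoted for this plot is a Spearman rank correlation. A minimal pure-Python sketch of that statistic is shown below; the avoidance and fluorescence values are hypothetical stand-ins, not the Kudla et al. data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical avoidance scores (kcal/mol) and log10 fluorescence values.
avoidance = [-290.0, -260.0, -240.0, -200.0, -170.0]
fluorescence = [2.1, 2.9, 2.6, 3.4, 3.8]
print(spearman_rho(avoidance, fluorescence))
```

Because it works on ranks, the statistic only asks whether expression tends to rise as avoidance rises, without assuming a linear relationship.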
Do mRNA:ncRNA interactions influence protein expression?
- Testing the relationship between protein abundance estimates and avoidance, mRNA secondary structure, codon usage and mRNA abundance
[Figure: correlation coefficients (−0.2 to 0.6) for avoidance, secondary structure, codon usage and [mRNA] in each dataset; * P < 0.05]
GFP datasets:
- E. coli (n=52), GFP/qPCR
- E. coli (n=154), GFP/Northern
- E. coli (n=14,234), mCherry/RNAseq
Mass-Spec datasets:
- E. coli (n=389), MS/microarray
- E. coli (n=3,301), MS/microarray
- P. aeruginosa (n=5,479), MS/microarray
- P. aeruginosa (n=1,148), MS/microarray
Testing the extremes of expression
[Figure A: frequency histogram of log10([Protein]/[mRNA]), with the low expression (n=10) and high expression (n=10) extremes marked]
[Figure B: Z scores (−2 to 2) for avoidance, codon usage and secondary structure, with a null, for low expression (n=10) and high expression (n=10) genes; two comparisons are starred]
- E. coli genes (n = 389)
Designing mRNAs
- The 239 aa GFP can be encoded by 7.62x10^111 synonymous mRNAs
- Extremes of avoidance have a stronger effect than codon usage or secondary structure
[Figure: log10(fluorescence) (4.2 to 4.7) versus CAI (Rs = 0.29), folding energy in kcal/mol (Rs = 0.34) and binding energy in kcal/mol (Rs = 0.56); designs labelled Avoid, Fold, Codon and Optimal, each at hi and low extremes]
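The size of the design space comes from multiplying the number of synonymous codons at each position. The sketch below counts synonymous mRNAs for any protein using the codon degeneracies of the standard genetic code; the peptide is a hypothetical example, and whether the stop codon was included in the figure quoted on the slide is not stated, so it is left as an option here.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODONS = 3  # UAA, UAG, UGA


def count_synonymous_mrnas(protein, include_stop=False):
    """Count the coding sequences that translate to the given protein."""
    total = 1
    for amino_acid in protein:
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    if include_stop:
        total *= STOP_CODONS
    return total


# Hypothetical 5-residue peptide: 1 * 6 * 2 * 4 * 2 = 96 encodings.
print(count_synonymous_mrnas("MSKGE"))
```

Python integers are arbitrary precision, so the same function can be applied to a full-length protein without overflow.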
Avoidance in 3D on the ribosome
- Protein binds to regions with low avoidance (green) while exposed regions have high avoidance (blue): P = 9.3x10^-15, Fisher's exact test
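Fisher's exact test compares the counts in a 2x2 table against the hypergeometric distribution. The sketch below is a minimal two-sided version; the slide does not give the underlying counts, so the table (protein-bound or exposed regions against low or high avoidance) is a hypothetical example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / total


# Hypothetical counts: rows = protein-bound / exposed regions,
# columns = low avoidance / high avoidance.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

Working with integer weights and dividing once at the end avoids floating-point comparisons between nearly equal probabilities.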
Further Work
- Further work:
- Testing adaptation with experimental evolution experiments
- Do mRNA:ncRNA interactions influence eukaryotic gene expression?
- Number of possible interactions increases quadratically with number of genes. May require spatial & temporal separation of genes
- Does avoidance drive compartmentalisation and increases in nucleotide binding proteins?
- Do mRNA:ncRNA interactions influence viral infection, hybridisation, HGT & transformation expts?
- Are protein, DNA and protein:nucleotide interactions also avoided?
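The quadratic growth mentioned above can be illustrated with a small calculation. The sketch counts unordered pairs among n RNA species, n(n-1)/2; the transcript counts are hypothetical round numbers, not figures from the talk.

```python
def pairwise_interactions(n_rnas):
    """Number of distinct unordered pairs among n_rnas RNA species."""
    return n_rnas * (n_rnas - 1) // 2


# Hypothetical transcript counts: doubling n roughly quadruples the pairs.
for n in (1000, 2000, 4000):
    print(n, pairwise_interactions(n))
```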
And now for something completely different...
Bioinformaticians are horrible!
- Bioinformaticians are bad, impatient & intolerant
- Build a phylogenetic tree: which of the 172 methods do you use?
Parsimony: ARB, Bionumerics, BIRCH, Bosque, BPAnalysis, CAFCA, CRANN, DAMBE, EMBOSS, TNT, FootPrinter, Freqpars, Gambit, GAPars, GelCompar-II, GeneTree, gmaes, Hennig86, IDEA, LVB, MALIGN, MEGA, Mesquite, Murka, Network, NimbleTree, NONA, Notung, Parsimov, PAST, PAUP*, PAUPRat, PaupUp, phangorn, PHYLIP, PhyloNet, Phylo_win, POY, PRAP, PSODA, RA, SeaView, SeqState, Simplot, sog, TCS
Maximum Likelihood: ALIFRITZ, aLRT, ARB, Bio++, Bionumerics, BIRCH, BootPHYML, Bosque, CodeAxe, CoMET, Concaterpillar, CONSEL, Crux, DAMBE, DART, Darwin, dnarates, DPRML, DT-ModSel, EMBOSS, EREM, fastDNAml, fastDNAmlRev, FASTML, FastTree, GARLI, GZ-Gamma, HY-PHY, IQPNNI, Kakusan4, Leaphy, Mac5, McRate, Mesquite, MetaPIGA, MixtureTree, Modelfit, ModelGenerator, MOLPHY, MrAIC, MrModeltest, MrMTgui, MultiPhyl, NEPAL, NHML, nhPhyML, NimbleTree, p4, PAL, PAML, PARAT, PARBOOT, PASSML, PAUP*, PAUPRat, PaupUp, phangorn, PHYLLAB, PhyloCoCo, Phylo_win, PHYML, PhyML-Multi, PhyNav, PHYSIG, PLATO, Porn*, PRAP, PROCOV, ProtTest, PTP, r8s-bootstrap, Rate4Site, Rate-evolution, RAxML, raxmlGUI, RevDNArates, rRNA-phylogeny, SeaView, Segminator, SEMPHY, SeqPup, SeqState, SIMMAP, Simplot, SLR, Spectronet, Spectrum, SplitsTree, SSA, TipDate, Treefinder, TREE-PUZZLE, Vanilla
Bayesian: MBIORE, ANC-GENE, BAli-Phy, BAMBE, BayesPhylogenies, BEAST, BEST, Bio++, bms_runner, burntrees, Cadence, Crux, IMa2, Mesquite, MrBayes, MrBayesPlugin, MrBayes-tree-scanners, Multidivtime, p4, SIMMAP, PAL, tracer, PAML, Vanilla, PHASE, PHYLLAB, PhyloBayes
How can we choose software?
- Which methods do you use?
Approach software like a scientist
- Are any good controls available?
  - Positive: databases, publications, simulation, ...
  - Negative: randomized, select relevant negative data, ...
- Some common accuracy metrics:
  - Sensitivity (true positive rate)
  - Specificity (true negative rate)
  - Matthews correlation coefficient
  - Area under an ROC curve
[Figure: ROC curves (true positive rate versus false positive rate, both 0.0 to 1.0) for DBS, Pfam; DBS, Treefam; DBS, Custom; PROVEAN; Polyphen-2; SIFT; FATHMM, weighted; FATHMM, unweighted]
Wheeler et al. (2016) A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes. Bioinformatics.
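The accuracy metrics listed on this slide can all be computed from a set of labelled positive and negative controls. The sketch below is a minimal pure-Python version; the confusion-matrix counts and prediction scores are hypothetical, and the AUC is computed from pairwise score comparisons rather than by drawing the curve.

```python
from math import sqrt


def sensitivity(tp, fn):
    """True positive rate: the fraction of real positives that are found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: the fraction of real negatives that are rejected."""
    return tn / (tn + fp)


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient, from -1 (inverse) to +1 (perfect)."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a margin is empty
    return (tp * tn - fp * fn) / denominator


def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical benchmark outcome for one tool.
tp, fp, tn, fn = 90, 20, 80, 10
print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, fp, tn, fn))
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Sensitivity and specificity depend on a score threshold, whereas the AUC summarises performance across all thresholds, which is why benchmarks often report both.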
Benchmarks are useful, and fun...
Is there really a relationship between software speed & accuracy?
- Can we run a meta-analysis of bioinformatic benchmarks?
- If speed isn't related to accuracy, then what is?
- Some possibilities:
  - Software age
  - Journal "impact" (IF & Google Scholar H5)
  - Number of citations
  - Corresponding author's H-index & M-index
After some literature mining...
I found 43 matching articles.
- 102 benchmarks
- Accuracy & speed ranks for 243 bioinformatic software tools
- Manually extracted IF, H, age, ...
- 65 journals (Bioinformatics, NAR, Genome Research, ...)
- 151 author Google Scholar profiles
abyss antepiseeker apg barry bellerophontes bfast bismark biss boost bowtie bowtie2 bowtiestar bratbw bsmap
bsmooth bsseeker buckycon buckymrbayes buckymrbayesspa buckypop buckyraxml builder bwa bwasw caml camp carma
ce celera clark clc clustalomega clustalw comus coprarna coral cosine crisp cro cromwell cufflinks cwt dali
de dexseq dialign dialign22 dialignt dialigntx diffsplice diginormvelvet dima djigsaw downhillsimplex dsgseq
ebi echo edenanonstrict edenastrict edit epimode ericscript erpin fa fasta fasttree fisherexacttest
fusioncatcher fusionmap gassst gatk genometa gojobori goldman gossamer gottcha greedyft gsnap heidge hitec
hmmer hshrec idbaud igtpduplossft inchworm infernal intarna jaffa kalign kbsps kraken kthse leidnl limpic
lmat lms lofreq lsqman mafft mafftfftns mafftfftns2 mafftlinsi mapsplice maq mats megan metaphlan metaphyler
methylkit methylsig mgrast minia mira mirdeep mireap mirena mirexpress mlclustalw mlclustalwquicktree mlmafft
mlmafftparttree mlmuscle mlopal mlprankgt modellerv mosaik motu mpest mpjclustalw mpsclustalw mrfast mrpml
mrpmp mrsfast msinspect multalin muscle musclemaxiters mzmine nbc ncbiblast nest newbler nfuse novoalign
oases onecodex openms pairfold paralign pass perm phylonetft phylopythias phymmbl piler poa poy poystar
pragcz probalign probcons probtree process pso pt qiime qsra quake raiphy ravenna raxml raxmllimited
rdiffparam repeatfinder repeatgluer repeatscout reptile rmap rnacofold rnaduplex rnahybrid rnaplex rnaup
rsearch rsmatch sam sate scro scwrl scwrlcons segemehl segmodencad seqgsea seqman seqmap sga sharcgs shrimp
simulatedannealing sl smalt snap snpruler snver soap soap2 soapdenovo soapec soapstar spades sparse
sparseassembler spcomp specarray spt srmapper ssaha ssake ssap ssearch ssm sst st starbeast strcutal
swissmodel taipan targetrna targetrna2 taxatortk tcoffee team tmap tophatfusion transabyss trinity upmes
varscan vcake velvet wmrpmp woodhams wublast xalign xcmswithcorrection xcmswithoutretentiontime zema
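Pooling ranks across benchmarks with different numbers of tools requires putting them on a common scale. One way to do that is sketched below; the [0, 1] normalisation is an assumption made for illustration, not a description of the method used in the meta-analysis, and the tool names and scores are hypothetical.

```python
def normalised_ranks(scores, higher_is_better=True):
    """Map each tool's score in one benchmark to [0, 1], where 1 is best.

    Ties are broken arbitrarily by sort order in this simple sketch.
    """
    # Sort from worst to best so that the best tool gets the last index.
    tools = sorted(scores, key=scores.get, reverse=not higher_is_better)
    if len(tools) == 1:
        return {tools[0]: 1.0}
    return {tool: i / (len(tools) - 1) for i, tool in enumerate(tools)}


# Hypothetical benchmark: accuracy (higher is better), runtime (lower is better).
accuracy = {"toolA": 0.90, "toolB": 0.70, "toolC": 0.80}
runtime_seconds = {"toolA": 120.0, "toolB": 15.0, "toolC": 60.0}
print(normalised_ranks(accuracy))
print(normalised_ranks(runtime_seconds, higher_is_better=False))
```

With both quantities on the same scale, accuracy and speed ranks from different benchmarks can be pooled and correlated.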
Nothing is correlated with accuracy!
[Figure: pairwise correlation matrix of Rel. age, Year, Accuracy, Speed, JH5, JIF, Cites, Rel. cites, H-index and M-index; many cells are marked X]
Correlates with accuracy rank
[Figure: Spearman's rho (axis −0.2 to 0.2) between accuracy rank and each candidate correlate]
[Figure, panels A and B: speed versus accuracy, with a Z-score scale (−3 to 3), frequency histograms and a Spearman's rho scale (−1 to 1)]
Conclusions
- Speed is NOT reflective of accuracy
- Neither is author/journal reputation, software age & # citations
- The only reasonable way to select software is by benchmarking
- Publication bias is influencing software accuracy
- It doesn't matter how famous you are, you can still write great software!
Thanks!
- Avoidance: Sinan Umu, Anthony Poole & Renwick Dobson
- Meta-benchmark: James Paterson, Fatemeh Ashari Ghomi, Sinan Umu, Stephanie McGimpsey, Aleksandra Pawlik
Umu, Poole, Dobson & Gardner (2016) Avoidance of stochastic RNA interactions can be harnessed to control protein expression levels in bacteria and archaea. eLife.
Gardner et al. (2017) A meta-analysis of bioinformatics software benchmarks reveals that publication-bias influences software accuracy. In preparation.