comparative ribosome profiling reveals extensive ... · comparative ribosome profiling reveals...

15
Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages Juan-Jose ´ Vasquez 1 , Chung-Chau Hon 2,3 , Jens T. Vanselow 4 , Andreas Schlosser 4 and T. Nicolai Siegel 1, * 1 Research Center for Infectious Diseases, University of Wuerzburg, Wuerzburg 97080, Germany, 2 De ´ partement Biologie cellulaire et infection, Institut Pasteur, Unite ´ Biologie Cellulaire du Parasitisme, Paris 75015, France, 3 INSERM U786, Paris 75015, France and 4 Rudolf Virchow Center, University of Wuerzburg, Wuerzburg 97080, Germany Received November 7, 2013; Revised December 17, 2013; Accepted December 18, 2013 ABSTRACT While gene expression is a fundamental and tightly controlled cellular process that is regulated at multiple steps, the exact contribution of each step remains unknown in any organism. The absence of transcription initiation regulation for RNA polymer- ase II in the protozoan parasite Trypanosoma brucei greatly simplifies the task of elucidating the contri- bution of translation to global gene expression. Therefore, we have sequenced ribosome-protected mRNA fragments in T. brucei, permitting the genome-wide analysis of RNA translation and trans- lational efficiency. We find that the latter varies greatly between life cycle stages of the parasite and 100-fold between genes, thus contributing to gene expression to a similar extent as RNA stability. The ability to map ribosome positions at sub-codon resolution revealed extensive translation from upstream open reading frames located within 5 0 UTRs and enabled the identification of hundreds of previously un-annotated putative coding sequences (CDSs). Evaluation of existing proteomics and genome-wide RNAi data confirmed the translation of previously un-annotated CDSs and suggested an important role for >200 of those CDSs in parasite survival, especially in the form that is infective to mammals. Overall our data show that translational control plays a prevalent and important role in different parasite life cycle stages of T. brucei. INTRODUCTION Eukaryotic gene expression is regulated at multiple levels, including extensive and elaborate post-transcriptional regulation that affects RNA stability, protein translation and protein turnover (1). How much these individual levels contribute to the final outcome, the steady state level of a protein, is not known for any organism. The need for regulation beyond the level of transcription and RNA stability, for example at the level of translation or protein stability, has become evident from transcriptome and proteome studies that have revealed only weak correlations between mRNA and protein levels in various organisms, including yeast, humans and the proto- zoan parasite Leishmania major (2–6). Furthermore, mRNA translation has recently been found to be the most extensively regulated step in mammals (7). L. major and the related kinetoplastids Trypanosoma cruzi and Trypanosoma brucei are the causative agents of leishmaniasis, Chagas disease and sleeping sickness, which are significant human diseases leading to the death of tens of thousands of people worldwide every year (8,9). Interestingly, the genome organization of kinetoplastids differs from that of other eukaryotes in that most protein-coding genes are transcribed from long polycistronic transcription units (PTUs). These PTUs can encompass >100 mostly functionally unrelated genes (10–13). Maturation of polycistronic RNA precursors into functional mRNA transcripts involves trans-splicing of a 39 nt leader sequence to the 5 0 end and polyadenyla- tion of the 3 0 end (14–16). The absence of promoter sequence motifs, the organization of genes in PTUs and an apparently conserved open chromatin structure *To whom correspondence should be addressed. Tel: +49 931 31 88 499; Fax:+49 931 318 25 78; Email: [email protected] Published online 17 January 2014 Nucleic Acids Research, 2014, Vol. 42, No. 6 3623–3637 doi:10.1093/nar/gkt1386 ß The Author(s) 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: nguyennhi

Post on 12-Aug-2019

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

Comparative ribosome profiling reveals extensivetranslational complexity in different Trypanosomabrucei life cycle stagesJuan-Jose Vasquez1 Chung-Chau Hon23 Jens T Vanselow4 Andreas Schlosser4 and

T Nicolai Siegel1

1Research Center for Infectious Diseases University of Wuerzburg Wuerzburg 97080 Germany2Departement Biologie cellulaire et infection Institut Pasteur Unite Biologie Cellulaire du ParasitismeParis 75015 France 3INSERM U786 Paris 75015 France and 4Rudolf Virchow Center University ofWuerzburg Wuerzburg 97080 Germany

Received November 7 2013 Revised December 17 2013 Accepted December 18 2013

ABSTRACT

While gene expression is a fundamental and tightlycontrolled cellular process that is regulated atmultiple steps the exact contribution of each stepremains unknown in any organism The absence oftranscription initiation regulation for RNA polymer-ase II in the protozoan parasite Trypanosoma bruceigreatly simplifies the task of elucidating the contri-bution of translation to global gene expressionTherefore we have sequenced ribosome-protectedmRNA fragments in T brucei permitting thegenome-wide analysis of RNA translation and trans-lational efficiency We find that the latter variesgreatly between life cycle stages of the parasiteand 100-fold between genes thus contributing togene expression to a similar extent as RNA stabilityThe ability to map ribosome positions at sub-codonresolution revealed extensive translation fromupstream open reading frames located within 50

UTRs and enabled the identification of hundreds ofpreviously un-annotated putative coding sequences(CDSs) Evaluation of existing proteomics andgenome-wide RNAi data confirmed the translationof previously un-annotated CDSs and suggestedan important role for gt200 of those CDSs inparasite survival especially in the form that isinfective to mammals Overall our data show thattranslational control plays a prevalent and importantrole in different parasite life cycle stages ofT brucei

INTRODUCTION

Eukaryotic gene expression is regulated at multiple levelsincluding extensive and elaborate post-transcriptionalregulation that affects RNA stability protein translationand protein turnover (1) How much these individuallevels contribute to the final outcome the steady statelevel of a protein is not known for any organism Theneed for regulation beyond the level of transcription andRNA stability for example at the level of translation orprotein stability has become evident from transcriptomeand proteome studies that have revealed only weakcorrelations between mRNA and protein levels invarious organisms including yeast humans and the proto-zoan parasite Leishmania major (2ndash6) FurthermoremRNA translation has recently been found to be themost extensively regulated step in mammals (7)L major and the related kinetoplastids Trypanosoma

cruzi and Trypanosoma brucei are the causative agents ofleishmaniasis Chagas disease and sleeping sickness whichare significant human diseases leading to the death oftens of thousands of people worldwide every year (89)Interestingly the genome organization of kinetoplastidsdiffers from that of other eukaryotes in that mostprotein-coding genes are transcribed from longpolycistronic transcription units (PTUs) These PTUscan encompass gt100 mostly functionally unrelated genes(10ndash13) Maturation of polycistronic RNA precursorsinto functional mRNA transcripts involves trans-splicingof a 39 nt leader sequence to the 50 end and polyadenyla-tion of the 30 end (14ndash16) The absence of promotersequence motifs the organization of genes in PTUs andan apparently conserved open chromatin structure

To whom correspondence should be addressed Tel +49 931 31 88 499 Fax +49 931 318 25 78 Email nicolaisiegeluni-wuerzburgde

Published online 17 January 2014 Nucleic Acids Research 2014 Vol 42 No 6 3623ndash3637doi101093nargkt1386

The Author(s) 2014 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby30) whichpermits unrestricted reuse distribution and reproduction in any medium provided the original work is properly cited

surrounding RNA polymerase II (pol II) transcriptionstart sites (1217) strongly indicate a lack of regulationof gene expression at the level of pol II transcriptioninitiation This lack of transcriptional regulation shouldgreatly facilitate the effort to quantify the extent towhich different post-transcriptional mechanisms contrib-ute to gene expressionDespite the lack of transcriptional control L major

T cruzi and T brucei are capable of tightly regulatinggene expression throughout their complex life cycleswhich requires the adaptation of the parasites to thelarge environmental differences found between the insectvectors and mammalian hosts (1819) In T brucei thesedifferences include among others a change in tempera-ture from 37C in the mammalian host to 27C in theinsect vector and a change in the availability of glucosethe preferred energy source of the parasite (2021)Consequently significant effort has been invested inelucidating the post-transcriptional mechanisms of genecontrol in these parasites Work from several laboratorieshas led to the identification of sequence motifs capable ofmodulating RNA stability or translational efficiency in alife cycle-specific manner (22ndash27) a bias in codon usagesuggested to affect translational efficiency (28) and agenome-wide analysis of mRNA stability has found half-lives among transcripts to range over two orders of mag-nitude in T brucei (29)A comparable study to evaluate the degree of regulation

at the level of protein translation has not been performedeven though multiple observations suggest translationalcontrol as an important regulator of gene expressionFor example kinetoplastid genomes encode moreproteins involved in translation initiation control thanthose of yeast or many metazoans including two tothree homologues for poly(A) binding proteins as well asfour isoforms of the translation initiation factor (eIF4E)and five eIF4G isoforms (30ndash34) Moreover RNA-sequencing analyses performed in T brucei revealed wide-spread alternative trans-splicing resulting in multipletranscripts with varying 50 untranslated region (UTR)lengths for a particular gene (133536) These findingssuggest that variations in 50 UTR lengths may lead tothe in- or exclusion of regulatory elements that influencetranslational efficiency (37) Such regulatory elementscould be small so-called upstream open reading frames(uORFs) or micro ORFs which are ORFs located in the50 UTR of mRNA In eukaryotes translation initiationinvolves the recruitment of a pre-initiation complex tothe 50 end of mRNA followed by scanning of the UTRto locate the first AUG start codon (38) uORFs can affectthe scanning process and have been shown to influencetranslation of the downstream coding sequence (CDS)by affecting both the rate and the site of translation initi-ation (3940) Based on RNA-sequencing data in T brucei20 of 50 UTRs contain at least one uORF (3536) Inaddition it has been shown for a luciferase reporterthat the removal of an upstream start-codon can lead toa 7-fold increase in protein levels (41)In the past approaches to identify and quantify

translated RNA transcripts have focused on the isolationof RNA transcripts from polyribosome fractions (42)

While this approach has been successfully used to demon-strate differential translation of RNA in T brucei (4344)we have adapted a higher resolution approach termedribosome profiling that was recently developed (45) Thisapproach is based on high-throughput sequencing ofribosome-protected RNA fragments so-called ribosomelsquofootprintsrsquo RNA not protected by ribosomes isremoved by nuclease digestion ribosome footprints areconverted into libraries of DNA molecules suitable forhigh-throughput sequencing and the abundance of indi-vidual footprints is determined by deep sequencing Inyeast mice and Escherichia coli measurements of theaverage ribosome occupancy (ribosome density profiles)have been successfully used to estimate the rates of trans-lation (rates of protein synthesis) (45ndash47) Ribosomedensity while not perfect has been shown to be a muchbetter predictor of protein levels than measurements ofmRNA levels (45)

Importantly this approach not only provides quantita-tive information on the number of RNA moleculesassociated with ribosomes it also yields position-specificinformation regarding the location of ribosomes onmRNA transcripts This information is critical as associ-ation of an RNA transcript with a ribosome does notnecessarily mean that the transcript is being translatedThe ribosome could be associated with the 50 UTR or itcould be stalled failing to produce functional protein Inaddition ribosome positions can be used to accurately mapthe correct CDSs and help identify novel CDSs that havebeen missed in previous genome annotations (48)

Here we report the adaptation of a ribosome profilingapproach to T brucei Comparative ribosome profilinganalyses of the bloodstream and procyclic parasite formsallowed us to generate a genome-wide picture of transla-tion We observed a 100-fold range in translationalefficiency among genes and life cycle-specific differencesin translational efficiency for a large number of genes Inaddition our ribosome profiling data enabled the identi-fication of hundreds of previously un-annotated CDSsand incorrectly annotated translation initiation sites andsuggest a regulatory role of uORFs

MATERIALS AND METHODS

T brucei culture and cell harvest

The procyclic form (PF) of T brucei strain Lister 427 werecultured at 27C in SDM-79 supplemented with 10 fetalbovine serum and hemin (75mgl) medium (21) up to adensity of 107 cellsml Wild-type bloodstream form (BF)of Lister 427 (MITat 12 clone 221) were cultured at 37Cin HMI-11 up to a cell density of 15 106 cellsml Twominutes before harvest cycloheximide was added to a finalconcentration of 100 mgml Cells were collected by centri-fugation at 3000 g 4C for 5min washed with polysomelysis buffer (10mM Tris-HCl pH 74 300mM KCl and10mM MgCl2) transferred to a 15-ml microcentrifugetube and pelleted To lyse the cells 360ml polysome lysisbuffer 40 ml of 10 n-octylglycoside and 20 units ofTURBO DNaseI (Ambion) were added per 109 cells andincubated for 30min on ice The lysate was centrifuged at

3624 Nucleic Acids Research 2014 Vol 42 No 6

16 000 g 4C for 10min the supernatant transferred to anew microcentrifuge tube and the OD260 was determinedusing a Nanodrop 2000

Preparation of RNA footprint and mRNA sequencinglibraries

For both BF and PF 200ml aliquots of the lysate(OD260=40) were digested with RNase I (Ambion) atRT (1200 units) or on ice (1600 units) After 1 h the diges-tions were stopped by adding 100 units of SUPERaseInRNase inhibitor (Ambion) to the RNase-treated samplesIn parallel 100 units of SUPERaseIn RNase inhibitorwere added to a 200 ml aliquot of lysate not containingRNase I (undigested control) Monosomes were purifiedusing sucrose gradients as described previously (45)

Total RNA from undigested lysates and the footprintscollected in the monosome fractions were purified usinghot acid PhenolndashChloroformndashIsoamyl alcohol (vvv25241) at 65C (49) To generate mRNA sequencinglibraries undigested RNA was polyA-enriched using aDynabeads mRNA Purification Kit (Ambion) accordingthe manufacturerrsquos instructions The polyA-enrichedRNA was fragmented by incubation with an RNAFragmentation Reagent (Ambion) at 70C for 30minSuccessful fragmentation was monitored on a 15denaturing-PAGE gel Both ribosome footprints andfragmented mRNA (26ndash34 nt) were size-selected by elec-trophoresis using a 15 denaturing-PAGE gel and twocustom-made (IDT) synthetic RNA markers [50-AUGUACACGGAGUCGAGCUCAACCCGCAACGCGA-(Phos)-30] and [50-AUGUACACGGAGUCGACCCAACGCGA-(Phos)-30]

Sequencing libraries were prepared as described (50)except for the omission of an rRNA depletion stepduring the footprint library preparation and the use ofKAPA HiFi polymerase (Kapa Biosystems) for the finalamplification step

Pre-processing and mapping of read

Reads from all libraries were processed and mapped usingthe same parameters Adapter sequence (ie CTGTAGGCACCATCAAT) was trimmed from the 30 end of readsusing cutadapt (httpcodegooglecompcutadapt) andreads shorter than 20 nt after trimming were removedTrimmed reads were mapped to the reference genomeusing bowtie-2 with default lsquolocal-sensitiversquo mode (51)Genome as well as gene annotations of strain 927version 42 were downloaded from EuPathDB (52) andused as the reference in all analyses

Calculation of abundance translational efficiency and read50-end periodicity

A read is considered to map to a region when its midpointfalls within the annotated range Abundance of a region ina library was defined as reads mapped per kilobase permillion non-structural RNA reads (rpkm) Non-structuralRNA refers to the reads mapped to the genome excludingannotated rRNA and tRNA regions Translational effi-ciency of a region was defined as its ratio of abundancein the ribosome profiling library to that of the control

RNA library of the same life cycle stage To investigatethe distinct 3-nt periodicity of the ribosome profilinglibraries 50 ends of the mapped reads were piled up andthe piled-up coverage of each position from each gene(using either start or stop codon as the reference point)were summed up in meta-gene analyses Genes with lessthan 10 reads within the plotted region were excluded

Definition of upstream ORFs and novel CDSs

A uORF was defined as any ORF 9 nt that was locatedwithin an annotated 50 UTR and was on the same strandas the annotated ORF Novel CDSs are defined as anytranslated ORF 30 nt located at least 20 nt away fromany existing annotations In cases in which multipleoverlapping novel ORFs were found on different framesof the same strand only the longest were retained AnORF is defined as translated when 70 of its CDS isbeing covered by the pooled ribosome profiling reads (iefour libraries) with 2 reads per nucleotide

Calculation of ribosome release score

Ribosome release scores (RRSs) of all ORFs includingannotated CDSs putative CDSs and uORFs werecalculated as described with slight modifications (53)Briefly a pseudo 30 UTR region (pUTR3) was firstassigned to all ORFs pUTR3 of an ORF was defined asthe region between its stop codon and a downstream startcodon in any of the three frames on the same strandReads mapped within the CDSs and pUTR3 perkilobase (ie CDSrpk and pUTR3rpk) in both ribosomeprofiling and RNA-sequencing control libraries werecalculated for each ORF The ratio of CDSrpk topUTR3rpk in both ribosome profiling and RNA-sequencing control libraries (ie riboratio and RNAratio)were calculated RRS of each ORF was then defined asthe ratio of riboratio to RNAratio

Identification of novel putative CDSs important forparasitersquos fitness

We reanalysed the data of a published genome-wide RNAinterference fitnessndashcosts association study (54) with ournewly defined CDSs Briefly short read data were down-loaded from European Nucleotide Archive under acces-sion number ERP000431 Reads were then mapped tothe genome using bowtie2 with default lsquolocal-sensitiversquomode The number of reads mapping to the annotatedand putative CDSs was counted To identify thoseputative CDSs that have significantly less reads mappedin the RNA interference libraries which potentially con-tribute to the parasitersquos fitness we performed differentialread count analyses on the pair-wise library comparisons(listed below) using three softwares including DESeq1(55) DESeq2 (httpwwwbioconductororgpackagesreleasebiochtmlDESeq2html) and EdgeR (56) withdefault settings CDSs that were identified to have signifi-cantly less reads in the RNA interference libraries by allthree algorithms (Plt 005 in each case) linear fold change 5 were considered to be potentially related to the loss offitness The pair-wise library comparisons include BFuninduced versus BF induced for 6 days BF uninduced

Nucleic Acids Research 2014 Vol 42 No 6 3625

versus differentiated cells induced and BF uninducedversus PF induced

Analysis of proteomics data

All mass spectrometric raw data files from Butter et al(57) were processed as described except that the data weresearched additionally with the previously un-annotatedCDSs listed in Supplementary Tables S4 and S5 BrieflyMaxQuant 1412 including the Andromeda search engine(5859) was used for processing raw data and databasesearching The experimental design template used was asdescribed before (57) The search was performed against acombination of three databases a T brucei database con-taining 19 103 annotated protein entries for strains 427and 927 (version 50 Tb427 and Tb927 downloadedfrom httptritrypdborg) a list of uORFs and putativeCDSs identified in this study (Supplementary Tables S4and S5) and a database containing typical contaminantsEnzyme search specificity was TrypsinP with threemiscleavages for tryptic digests and LysC with twomiscleavages for digests with lysyl endopeptidaseCarbamidomethylation on cysteines was set as fixed modi-fication methionine oxidation and protein N-terminalacetylation were considered as variable modificationsMass accuracy tolerances (after recalibration) were6 ppm for precursor ions and 05Da for CID MSMSand 20 ppm for HCD MSMS spectra respectivelyFalse discovery rate was fixed at 1 at peptide andprotein level with posterior error probability (PEP) assorting criterion Identifications were matched betweenruns within a 2-min window Protein groups identified inthe potential CDS database had at least one uniquepeptide and a PEPlt 00005

RESULTS

Ribosome footprints reveal ribosome position at sub-codonresolution

To obtain genome-wide information on protein synthesiswe adapted a ribosome profiling protocol published inyeast and mice (4546) for use in T brucei We sequencedribosome footprints and fragmented mRNA from the twotrypanosome stages that have been adapted to in vitroculture the PF of the tsetse fly midgut and theproliferating BF that lives in the blood of the mammalianhost Cultured cells were treated with cycloheximide toarrest translating ribosomes and harvested Becausecycloheximide has been shown to stabilize RNA tran-scripts in T brucei (60) the treatment was limited to2min and applied to both cells used for ribosomeprofililng and cells used for RNA-sequencing analysesTo maximize the accuracy of the ribosome footprintswe optimized the digestion efficiency by testing differentnuclease concentrations and reaction temperatures(lsquoMaterials and Methodsrsquo section and SupplementaryFigure S1) Next to map translated regions of RNA andto exclude RNase-protected regions that are scanned bythe 43S pre-initiation complex we used sucrose gradientsto specifically enrich for monosomes (SupplementaryFigure S1) The high reproducibility between libraries

prepared under different conditions demonstrated the ro-bustness of the ribosome profiling approach despite theelaborate library preparation protocol (SupplementaryFigure S2 BF digestion at 4C versus digestion at roomtemperature R2=09755)

To ensure that nuclease-resistant RNA fragments rep-resent ribosome footprints and not short RNA fragmentsprotected by other RNA-binding proteins we comparedthe genome-wide distribution of sequence reads obtainedfrom ribosome profiling and RNA-sequencing librariesAlignment of ribosome profiling reads revealed that thenuclease-protected reads predominantly mapped alongCDSs with very few reads aligning to intergenic regions(Figure 1A) The 50 nucleotide of the ribosome profilingreads aligned from 12 nt upstream of the ATG codon to18 nt upstream of the stop codon (Figure 1B) Given afootprint length of 28ndash30 nt the ribosome protects mRNAfrom 12 nt upstream of the start codon to 9ndash11 nt down-stream of the stop codon In contrast to ribosome foot-print reads RNA-sequencing reads were aligned along theCDSs and UTRs as expected (Figure 1A) In additionto the enrichment across CDSs the alignment ofribosome profiling reads revealed a distinct 3-nt period-icity with 68 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 1B and C andSupplementary Figure S3) No such 3-nt periodicity wasobserved for the mRNA sequence reads

Only two T brucei genes have introns one encoding apoly(A) polymerase and the other encoding a DNARNAhelicase (6162) Surprisingly there is almost no decreasein sequence reads detectable by RNA-sequencing acrossthe introns while the ribosome profiling data clearly showthat the intronic RNA is not translated (Figure 2A) Thesedata indicate that the process of cis-splicing must be un-usually inefficient in trypanosomes but that a tightcontrol mechanism ensures that only mature mRNAsenter the translational machinery

The strong enrichment of ribosome profiling readsacross CDSs the lack of ribosome profiling reads acrossintrons and the observed 3-nt periodicity are characteristicof ribosome-protected RNA footprints and argue againstunspecific RNA protection (45) Thus ribosome profilingreads indeed represent ribosome footprints and providea means to measure protein synthesis and in additionto identify both previously unannotated CDSs as well asnon-translated transcripts previously annotated as coding(for example Figure 1A Tb927102360) The latter maybe either incorrectly annotated or not translated in thetwo life cycle stages examined

Identification of translation initiation sites

Genome-wide mapping of spliced leader acceptor sites(SASs) ie the 50 end of the 50 UTR to which thespliced leader RNA is trans-spliced revealed hundredsof instances in which SASs mapped within annotatedCDSs or far upstream of annotated CDSs (133536)These findings indicated that for many genes the trueCDS was shorter than the annotated CDS while forothers it allowed the possibility of translation initiationat an AUG upstream of the annotated translation

3626 Nucleic Acids Research 2014 Vol 42 No 6

initiation site However while RNA-sequencing allowedthe unambiguous identification of mis-annotation formany genes translation initiation does not always occurat the first AUG downstream of the SAS Thus the use-fulness of RNA-sequencing to identify the true translationinitiation sites is limited and depends on the genomiccontext

In contrast to RNA-sequencing ribosome profilingallows the determination of the translational landscapeThus it allows the identification of CDS mis-annotationsand the determination of the most probable translationinitiation sites Exemplary for numerous genes forTb92751990 we detected no translation between thefirst and the second AUG (green bars) indicating thatthis region is not translated in the BF and PF and thattranslation initiates at the second AUG (Figure 2Bleft panel) For Tb927102720 we observed strong trans-lation upstream of the annotated CDS The fact thattranslation starts at an upstream AUG which is inframe with the annotated translation initiation sitesuggests that the CDS of Tb927102720 is longer thanannotated (Figure 2B right panel) These examplesshow that ribosome profiling data can be used for there-annotation of incorrectly annotated CDSs

Ribosome profiling reveals extensive translational control

While the role of RNA stability has been investigated on agenome-wide scale in T brucei (29) global regulation atthe levels of protein translation has not been analysedTherefore we decided to apply the ribosome profilingapproach to evaluate the degree of regulation occurringat the level of translation in trypanosomesTo estimate the rate of protein synthesis in trypano-

somes we determined the ribosome footprint density

AUG

28 nt - ribosome footprint

CDS position (nt from start)

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

P s

iteA

site

UAA

CDS position (nt from stop)

P s

iteA

site

ribosome footprint

frame

perc

enta

ge o

f rea

ds

0

20

0

40

60

80

1 2 0 1 2

mRNA

-24 -18 -12 -6 0 6 12 18 24 -24 -18 -12 -6 0 6 12

foot

prin

t (R

PM

)m

RN

A (

RP

M)

mR

NA

(R

PM

)

C

B

A

0

2

4

0

2

4

6

0

2

4mRNA density in PF

ribosome footprint density in PF

mRNA denstiy in BF

foot

prin

t (R

PM

)

0

0

02

04

1

2

3

600000

Tb927102320

Tb927102330

Tb927102340

Tb927102350

Tb927102370

Tb927102360

Chr X 602000 604000 606000 608000

ribosome footprint density in BF

Figure 1 Ribosome footprints reveal coding sequences at sub-codonresolution (A) Ribosome footprints are strongly enriched acrossCDSs mRNA densities and ribosome densities are shown as readsper nucleotide per million reads (RPM) to normalize for differencesin library size (B) Alignment of the 50 nucleotide from ribosome foot-print reads that map close to translation start or translation termin-ation sites Blue boxes mark the approximate size of the ribosomefootprint (C) Percentage of position of sequence reads relative toreading frame

00

10

20

813000Chr III Chr VIII

814000

815000

816000

00010203

0

2

4

490000

492000

494000

00

10

20

0

2

4

0

2

4

00

10

20

615500Chr V Chr X

616000

616500

617000

699000

700000

701000

00

02

04

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

Tb92751990 Tb927102720

Tb92733160 Tb92781510

B

A

Figure 2 Ribosome footprints reveal translated regions (A) Ribosomeprofiles for the two intron-containing genes Introns are represented asa dashed line (B) Ribosome profiles of two possibly mis-annotatedCDSs Black bars mark annotated CDSs Grey bars mark CDSs pre-dicted based on ribosome profiles Green boxes represent ATG codonsand red boxes represent stop codons

Nucleic Acids Research 2014 Vol 42 No 6 3627

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 2: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

surrounding RNA polymerase II (pol II) transcriptionstart sites (1217) strongly indicate a lack of regulationof gene expression at the level of pol II transcriptioninitiation This lack of transcriptional regulation shouldgreatly facilitate the effort to quantify the extent towhich different post-transcriptional mechanisms contrib-ute to gene expressionDespite the lack of transcriptional control L major

T cruzi and T brucei are capable of tightly regulatinggene expression throughout their complex life cycleswhich requires the adaptation of the parasites to thelarge environmental differences found between the insectvectors and mammalian hosts (1819) In T brucei thesedifferences include among others a change in tempera-ture from 37C in the mammalian host to 27C in theinsect vector and a change in the availability of glucosethe preferred energy source of the parasite (2021)Consequently significant effort has been invested inelucidating the post-transcriptional mechanisms of genecontrol in these parasites Work from several laboratorieshas led to the identification of sequence motifs capable ofmodulating RNA stability or translational efficiency in alife cycle-specific manner (22ndash27) a bias in codon usagesuggested to affect translational efficiency (28) and agenome-wide analysis of mRNA stability has found half-lives among transcripts to range over two orders of mag-nitude in T brucei (29)A comparable study to evaluate the degree of regulation

at the level of protein translation has not been performedeven though multiple observations suggest translationalcontrol as an important regulator of gene expressionFor example kinetoplastid genomes encode moreproteins involved in translation initiation control thanthose of yeast or many metazoans including two tothree homologues for poly(A) binding proteins as well asfour isoforms of the translation initiation factor (eIF4E)and five eIF4G isoforms (30ndash34) Moreover RNA-sequencing analyses performed in T brucei revealed wide-spread alternative trans-splicing resulting in multipletranscripts with varying 50 untranslated region (UTR)lengths for a particular gene (133536) These findingssuggest that variations in 50 UTR lengths may lead tothe in- or exclusion of regulatory elements that influencetranslational efficiency (37) Such regulatory elementscould be small so-called upstream open reading frames(uORFs) or micro ORFs which are ORFs located in the50 UTR of mRNA In eukaryotes translation initiationinvolves the recruitment of a pre-initiation complex tothe 50 end of mRNA followed by scanning of the UTRto locate the first AUG start codon (38) uORFs can affectthe scanning process and have been shown to influencetranslation of the downstream coding sequence (CDS)by affecting both the rate and the site of translation initi-ation (3940) Based on RNA-sequencing data in T brucei20 of 50 UTRs contain at least one uORF (3536) Inaddition it has been shown for a luciferase reporterthat the removal of an upstream start-codon can lead toa 7-fold increase in protein levels (41)In the past approaches to identify and quantify

translated RNA transcripts have focused on the isolationof RNA transcripts from polyribosome fractions (42)

While this approach has been successfully used to demon-strate differential translation of RNA in T brucei (4344)we have adapted a higher resolution approach termedribosome profiling that was recently developed (45) Thisapproach is based on high-throughput sequencing ofribosome-protected RNA fragments so-called ribosomelsquofootprintsrsquo RNA not protected by ribosomes isremoved by nuclease digestion ribosome footprints areconverted into libraries of DNA molecules suitable forhigh-throughput sequencing and the abundance of indi-vidual footprints is determined by deep sequencing Inyeast mice and Escherichia coli measurements of theaverage ribosome occupancy (ribosome density profiles)have been successfully used to estimate the rates of trans-lation (rates of protein synthesis) (45ndash47) Ribosomedensity while not perfect has been shown to be a muchbetter predictor of protein levels than measurements ofmRNA levels (45)

Importantly this approach not only provides quantita-tive information on the number of RNA moleculesassociated with ribosomes it also yields position-specificinformation regarding the location of ribosomes onmRNA transcripts This information is critical as associ-ation of an RNA transcript with a ribosome does notnecessarily mean that the transcript is being translatedThe ribosome could be associated with the 50 UTR or itcould be stalled failing to produce functional protein Inaddition ribosome positions can be used to accurately mapthe correct CDSs and help identify novel CDSs that havebeen missed in previous genome annotations (48)

Here we report the adaptation of a ribosome profilingapproach to T brucei Comparative ribosome profilinganalyses of the bloodstream and procyclic parasite formsallowed us to generate a genome-wide picture of transla-tion We observed a 100-fold range in translationalefficiency among genes and life cycle-specific differencesin translational efficiency for a large number of genes Inaddition our ribosome profiling data enabled the identi-fication of hundreds of previously un-annotated CDSsand incorrectly annotated translation initiation sites andsuggest a regulatory role of uORFs

MATERIALS AND METHODS

T brucei culture and cell harvest

The procyclic form (PF) of T brucei strain Lister 427 werecultured at 27C in SDM-79 supplemented with 10 fetalbovine serum and hemin (75mgl) medium (21) up to adensity of 107 cellsml Wild-type bloodstream form (BF)of Lister 427 (MITat 12 clone 221) were cultured at 37Cin HMI-11 up to a cell density of 15 106 cellsml Twominutes before harvest cycloheximide was added to a finalconcentration of 100 mgml Cells were collected by centri-fugation at 3000 g 4C for 5min washed with polysomelysis buffer (10mM Tris-HCl pH 74 300mM KCl and10mM MgCl2) transferred to a 15-ml microcentrifugetube and pelleted To lyse the cells 360ml polysome lysisbuffer 40 ml of 10 n-octylglycoside and 20 units ofTURBO DNaseI (Ambion) were added per 109 cells andincubated for 30min on ice The lysate was centrifuged at

3624 Nucleic Acids Research 2014 Vol 42 No 6

16 000 g 4C for 10min the supernatant transferred to anew microcentrifuge tube and the OD260 was determinedusing a Nanodrop 2000

Preparation of RNA footprint and mRNA sequencinglibraries

For both BF and PF 200ml aliquots of the lysate(OD260=40) were digested with RNase I (Ambion) atRT (1200 units) or on ice (1600 units) After 1 h the diges-tions were stopped by adding 100 units of SUPERaseInRNase inhibitor (Ambion) to the RNase-treated samplesIn parallel 100 units of SUPERaseIn RNase inhibitorwere added to a 200 ml aliquot of lysate not containingRNase I (undigested control) Monosomes were purifiedusing sucrose gradients as described previously (45)

Total RNA from undigested lysates and the footprintscollected in the monosome fractions were purified usinghot acid PhenolndashChloroformndashIsoamyl alcohol (vvv25241) at 65C (49) To generate mRNA sequencinglibraries undigested RNA was polyA-enriched using aDynabeads mRNA Purification Kit (Ambion) accordingthe manufacturerrsquos instructions The polyA-enrichedRNA was fragmented by incubation with an RNAFragmentation Reagent (Ambion) at 70C for 30minSuccessful fragmentation was monitored on a 15denaturing-PAGE gel Both ribosome footprints andfragmented mRNA (26ndash34 nt) were size-selected by elec-trophoresis using a 15 denaturing-PAGE gel and twocustom-made (IDT) synthetic RNA markers [50-AUGUACACGGAGUCGAGCUCAACCCGCAACGCGA-(Phos)-30] and [50-AUGUACACGGAGUCGACCCAACGCGA-(Phos)-30]

Sequencing libraries were prepared as described (50)except for the omission of an rRNA depletion stepduring the footprint library preparation and the use ofKAPA HiFi polymerase (Kapa Biosystems) for the finalamplification step

Pre-processing and mapping of read

Reads from all libraries were processed and mapped usingthe same parameters Adapter sequence (ie CTGTAGGCACCATCAAT) was trimmed from the 30 end of readsusing cutadapt (httpcodegooglecompcutadapt) andreads shorter than 20 nt after trimming were removedTrimmed reads were mapped to the reference genomeusing bowtie-2 with default lsquolocal-sensitiversquo mode (51)Genome as well as gene annotations of strain 927version 42 were downloaded from EuPathDB (52) andused as the reference in all analyses

Calculation of abundance translational efficiency and read50-end periodicity

A read is considered to map to a region when its midpointfalls within the annotated range Abundance of a region ina library was defined as reads mapped per kilobase permillion non-structural RNA reads (rpkm) Non-structuralRNA refers to the reads mapped to the genome excludingannotated rRNA and tRNA regions Translational effi-ciency of a region was defined as its ratio of abundancein the ribosome profiling library to that of the control

RNA library of the same life cycle stage To investigatethe distinct 3-nt periodicity of the ribosome profilinglibraries 50 ends of the mapped reads were piled up andthe piled-up coverage of each position from each gene(using either start or stop codon as the reference point)were summed up in meta-gene analyses Genes with lessthan 10 reads within the plotted region were excluded

Definition of upstream ORFs and novel CDSs

A uORF was defined as any ORF 9 nt that was locatedwithin an annotated 50 UTR and was on the same strandas the annotated ORF Novel CDSs are defined as anytranslated ORF 30 nt located at least 20 nt away fromany existing annotations In cases in which multipleoverlapping novel ORFs were found on different framesof the same strand only the longest were retained AnORF is defined as translated when 70 of its CDS isbeing covered by the pooled ribosome profiling reads (iefour libraries) with 2 reads per nucleotide

Calculation of ribosome release score

Ribosome release scores (RRSs) of all ORFs includingannotated CDSs putative CDSs and uORFs werecalculated as described with slight modifications (53)Briefly a pseudo 30 UTR region (pUTR3) was firstassigned to all ORFs pUTR3 of an ORF was defined asthe region between its stop codon and a downstream startcodon in any of the three frames on the same strandReads mapped within the CDSs and pUTR3 perkilobase (ie CDSrpk and pUTR3rpk) in both ribosomeprofiling and RNA-sequencing control libraries werecalculated for each ORF The ratio of CDSrpk topUTR3rpk in both ribosome profiling and RNA-sequencing control libraries (ie riboratio and RNAratio)were calculated RRS of each ORF was then defined asthe ratio of riboratio to RNAratio

Identification of novel putative CDSs important forparasitersquos fitness

We reanalysed the data of a published genome-wide RNAinterference fitnessndashcosts association study (54) with ournewly defined CDSs Briefly short read data were down-loaded from European Nucleotide Archive under acces-sion number ERP000431 Reads were then mapped tothe genome using bowtie2 with default lsquolocal-sensitiversquomode The number of reads mapping to the annotatedand putative CDSs was counted To identify thoseputative CDSs that have significantly less reads mappedin the RNA interference libraries which potentially con-tribute to the parasitersquos fitness we performed differentialread count analyses on the pair-wise library comparisons(listed below) using three softwares including DESeq1(55) DESeq2 (httpwwwbioconductororgpackagesreleasebiochtmlDESeq2html) and EdgeR (56) withdefault settings CDSs that were identified to have signifi-cantly less reads in the RNA interference libraries by allthree algorithms (Plt 005 in each case) linear fold change 5 were considered to be potentially related to the loss offitness The pair-wise library comparisons include BFuninduced versus BF induced for 6 days BF uninduced

Nucleic Acids Research 2014 Vol 42 No 6 3625

versus differentiated cells induced and BF uninducedversus PF induced

Analysis of proteomics data

All mass spectrometric raw data files from Butter et al(57) were processed as described except that the data weresearched additionally with the previously un-annotatedCDSs listed in Supplementary Tables S4 and S5 BrieflyMaxQuant 1412 including the Andromeda search engine(5859) was used for processing raw data and databasesearching The experimental design template used was asdescribed before (57) The search was performed against acombination of three databases a T brucei database con-taining 19 103 annotated protein entries for strains 427and 927 (version 50 Tb427 and Tb927 downloadedfrom httptritrypdborg) a list of uORFs and putativeCDSs identified in this study (Supplementary Tables S4and S5) and a database containing typical contaminantsEnzyme search specificity was TrypsinP with threemiscleavages for tryptic digests and LysC with twomiscleavages for digests with lysyl endopeptidaseCarbamidomethylation on cysteines was set as fixed modi-fication methionine oxidation and protein N-terminalacetylation were considered as variable modificationsMass accuracy tolerances (after recalibration) were6 ppm for precursor ions and 05Da for CID MSMSand 20 ppm for HCD MSMS spectra respectivelyFalse discovery rate was fixed at 1 at peptide andprotein level with posterior error probability (PEP) assorting criterion Identifications were matched betweenruns within a 2-min window Protein groups identified inthe potential CDS database had at least one uniquepeptide and a PEPlt 00005

RESULTS

Ribosome footprints reveal ribosome position at sub-codonresolution

To obtain genome-wide information on protein synthesiswe adapted a ribosome profiling protocol published inyeast and mice (4546) for use in T brucei We sequencedribosome footprints and fragmented mRNA from the twotrypanosome stages that have been adapted to in vitroculture the PF of the tsetse fly midgut and theproliferating BF that lives in the blood of the mammalianhost Cultured cells were treated with cycloheximide toarrest translating ribosomes and harvested Becausecycloheximide has been shown to stabilize RNA tran-scripts in T brucei (60) the treatment was limited to2min and applied to both cells used for ribosomeprofililng and cells used for RNA-sequencing analysesTo maximize the accuracy of the ribosome footprintswe optimized the digestion efficiency by testing differentnuclease concentrations and reaction temperatures(lsquoMaterials and Methodsrsquo section and SupplementaryFigure S1) Next to map translated regions of RNA andto exclude RNase-protected regions that are scanned bythe 43S pre-initiation complex we used sucrose gradientsto specifically enrich for monosomes (SupplementaryFigure S1) The high reproducibility between libraries

prepared under different conditions demonstrated the ro-bustness of the ribosome profiling approach despite theelaborate library preparation protocol (SupplementaryFigure S2 BF digestion at 4C versus digestion at roomtemperature R2=09755)

To ensure that nuclease-resistant RNA fragments rep-resent ribosome footprints and not short RNA fragmentsprotected by other RNA-binding proteins we comparedthe genome-wide distribution of sequence reads obtainedfrom ribosome profiling and RNA-sequencing librariesAlignment of ribosome profiling reads revealed that thenuclease-protected reads predominantly mapped alongCDSs with very few reads aligning to intergenic regions(Figure 1A) The 50 nucleotide of the ribosome profilingreads aligned from 12 nt upstream of the ATG codon to18 nt upstream of the stop codon (Figure 1B) Given afootprint length of 28ndash30 nt the ribosome protects mRNAfrom 12 nt upstream of the start codon to 9ndash11 nt down-stream of the stop codon In contrast to ribosome foot-print reads RNA-sequencing reads were aligned along theCDSs and UTRs as expected (Figure 1A) In additionto the enrichment across CDSs the alignment ofribosome profiling reads revealed a distinct 3-nt period-icity with 68 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 1B and C andSupplementary Figure S3) No such 3-nt periodicity wasobserved for the mRNA sequence reads

Only two T brucei genes have introns one encoding apoly(A) polymerase and the other encoding a DNARNAhelicase (6162) Surprisingly there is almost no decreasein sequence reads detectable by RNA-sequencing acrossthe introns while the ribosome profiling data clearly showthat the intronic RNA is not translated (Figure 2A) Thesedata indicate that the process of cis-splicing must be un-usually inefficient in trypanosomes but that a tightcontrol mechanism ensures that only mature mRNAsenter the translational machinery

The strong enrichment of ribosome profiling readsacross CDSs the lack of ribosome profiling reads acrossintrons and the observed 3-nt periodicity are characteristicof ribosome-protected RNA footprints and argue againstunspecific RNA protection (45) Thus ribosome profilingreads indeed represent ribosome footprints and providea means to measure protein synthesis and in additionto identify both previously unannotated CDSs as well asnon-translated transcripts previously annotated as coding(for example Figure 1A Tb927102360) The latter maybe either incorrectly annotated or not translated in thetwo life cycle stages examined

Identification of translation initiation sites

Genome-wide mapping of spliced leader acceptor sites(SASs) ie the 50 end of the 50 UTR to which thespliced leader RNA is trans-spliced revealed hundredsof instances in which SASs mapped within annotatedCDSs or far upstream of annotated CDSs (133536)These findings indicated that for many genes the trueCDS was shorter than the annotated CDS while forothers it allowed the possibility of translation initiationat an AUG upstream of the annotated translation

3626 Nucleic Acids Research 2014 Vol 42 No 6

initiation site However while RNA-sequencing allowedthe unambiguous identification of mis-annotation formany genes translation initiation does not always occurat the first AUG downstream of the SAS Thus the use-fulness of RNA-sequencing to identify the true translationinitiation sites is limited and depends on the genomiccontext

In contrast to RNA-sequencing ribosome profilingallows the determination of the translational landscapeThus it allows the identification of CDS mis-annotationsand the determination of the most probable translationinitiation sites Exemplary for numerous genes forTb92751990 we detected no translation between thefirst and the second AUG (green bars) indicating thatthis region is not translated in the BF and PF and thattranslation initiates at the second AUG (Figure 2Bleft panel) For Tb927102720 we observed strong trans-lation upstream of the annotated CDS The fact thattranslation starts at an upstream AUG which is inframe with the annotated translation initiation sitesuggests that the CDS of Tb927102720 is longer thanannotated (Figure 2B right panel) These examplesshow that ribosome profiling data can be used for there-annotation of incorrectly annotated CDSs

Ribosome profiling reveals extensive translational control

While the role of RNA stability has been investigated on agenome-wide scale in T brucei (29) global regulation atthe levels of protein translation has not been analysedTherefore we decided to apply the ribosome profilingapproach to evaluate the degree of regulation occurringat the level of translation in trypanosomesTo estimate the rate of protein synthesis in trypano-

somes we determined the ribosome footprint density

AUG

28 nt - ribosome footprint

CDS position (nt from start)

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

P s

iteA

site

UAA

CDS position (nt from stop)

P s

iteA

site

ribosome footprint

frame

perc

enta

ge o

f rea

ds

0

20

0

40

60

80

1 2 0 1 2

mRNA

-24 -18 -12 -6 0 6 12 18 24 -24 -18 -12 -6 0 6 12

foot

prin

t (R

PM

)m

RN

A (

RP

M)

mR

NA

(R

PM

)

C

B

A

0

2

4

0

2

4

6

0

2

4mRNA density in PF

ribosome footprint density in PF

mRNA denstiy in BF

foot

prin

t (R

PM

)

0

0

02

04

1

2

3

600000

Tb927102320

Tb927102330

Tb927102340

Tb927102350

Tb927102370

Tb927102360

Chr X 602000 604000 606000 608000

ribosome footprint density in BF

Figure 1 Ribosome footprints reveal coding sequences at sub-codonresolution (A) Ribosome footprints are strongly enriched acrossCDSs mRNA densities and ribosome densities are shown as readsper nucleotide per million reads (RPM) to normalize for differencesin library size (B) Alignment of the 50 nucleotide from ribosome foot-print reads that map close to translation start or translation termin-ation sites Blue boxes mark the approximate size of the ribosomefootprint (C) Percentage of position of sequence reads relative toreading frame

00

10

20

813000Chr III Chr VIII

814000

815000

816000

00010203

0

2

4

490000

492000

494000

00

10

20

0

2

4

0

2

4

00

10

20

615500Chr V Chr X

616000

616500

617000

699000

700000

701000

00

02

04

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

Tb92751990 Tb927102720

Tb92733160 Tb92781510

B

A

Figure 2 Ribosome footprints reveal translated regions (A) Ribosomeprofiles for the two intron-containing genes Introns are represented asa dashed line (B) Ribosome profiles of two possibly mis-annotatedCDSs Black bars mark annotated CDSs Grey bars mark CDSs pre-dicted based on ribosome profiles Green boxes represent ATG codonsand red boxes represent stop codons

Nucleic Acids Research 2014 Vol 42 No 6 3627

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 3: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

16 000 g 4C for 10min the supernatant transferred to anew microcentrifuge tube and the OD260 was determinedusing a Nanodrop 2000

Preparation of RNA footprint and mRNA sequencinglibraries

For both BF and PF 200ml aliquots of the lysate(OD260=40) were digested with RNase I (Ambion) atRT (1200 units) or on ice (1600 units) After 1 h the diges-tions were stopped by adding 100 units of SUPERaseInRNase inhibitor (Ambion) to the RNase-treated samplesIn parallel 100 units of SUPERaseIn RNase inhibitorwere added to a 200 ml aliquot of lysate not containingRNase I (undigested control) Monosomes were purifiedusing sucrose gradients as described previously (45)

Total RNA from undigested lysates and the footprintscollected in the monosome fractions were purified usinghot acid PhenolndashChloroformndashIsoamyl alcohol (vvv25241) at 65C (49) To generate mRNA sequencinglibraries undigested RNA was polyA-enriched using aDynabeads mRNA Purification Kit (Ambion) accordingthe manufacturerrsquos instructions The polyA-enrichedRNA was fragmented by incubation with an RNAFragmentation Reagent (Ambion) at 70C for 30minSuccessful fragmentation was monitored on a 15denaturing-PAGE gel Both ribosome footprints andfragmented mRNA (26ndash34 nt) were size-selected by elec-trophoresis using a 15 denaturing-PAGE gel and twocustom-made (IDT) synthetic RNA markers [50-AUGUACACGGAGUCGAGCUCAACCCGCAACGCGA-(Phos)-30] and [50-AUGUACACGGAGUCGACCCAACGCGA-(Phos)-30]

Sequencing libraries were prepared as described (50)except for the omission of an rRNA depletion stepduring the footprint library preparation and the use ofKAPA HiFi polymerase (Kapa Biosystems) for the finalamplification step

Pre-processing and mapping of read

Reads from all libraries were processed and mapped usingthe same parameters Adapter sequence (ie CTGTAGGCACCATCAAT) was trimmed from the 30 end of readsusing cutadapt (httpcodegooglecompcutadapt) andreads shorter than 20 nt after trimming were removedTrimmed reads were mapped to the reference genomeusing bowtie-2 with default lsquolocal-sensitiversquo mode (51)Genome as well as gene annotations of strain 927version 42 were downloaded from EuPathDB (52) andused as the reference in all analyses

Calculation of abundance translational efficiency and read50-end periodicity

A read is considered to map to a region when its midpointfalls within the annotated range Abundance of a region ina library was defined as reads mapped per kilobase permillion non-structural RNA reads (rpkm) Non-structuralRNA refers to the reads mapped to the genome excludingannotated rRNA and tRNA regions Translational effi-ciency of a region was defined as its ratio of abundancein the ribosome profiling library to that of the control

RNA library of the same life cycle stage To investigatethe distinct 3-nt periodicity of the ribosome profilinglibraries 50 ends of the mapped reads were piled up andthe piled-up coverage of each position from each gene(using either start or stop codon as the reference point)were summed up in meta-gene analyses Genes with lessthan 10 reads within the plotted region were excluded

Definition of upstream ORFs and novel CDSs

A uORF was defined as any ORF 9 nt that was locatedwithin an annotated 50 UTR and was on the same strandas the annotated ORF Novel CDSs are defined as anytranslated ORF 30 nt located at least 20 nt away fromany existing annotations In cases in which multipleoverlapping novel ORFs were found on different framesof the same strand only the longest were retained AnORF is defined as translated when 70 of its CDS isbeing covered by the pooled ribosome profiling reads (iefour libraries) with 2 reads per nucleotide

Calculation of ribosome release score

Ribosome release scores (RRSs) of all ORFs includingannotated CDSs putative CDSs and uORFs werecalculated as described with slight modifications (53)Briefly a pseudo 30 UTR region (pUTR3) was firstassigned to all ORFs pUTR3 of an ORF was defined asthe region between its stop codon and a downstream startcodon in any of the three frames on the same strandReads mapped within the CDSs and pUTR3 perkilobase (ie CDSrpk and pUTR3rpk) in both ribosomeprofiling and RNA-sequencing control libraries werecalculated for each ORF The ratio of CDSrpk topUTR3rpk in both ribosome profiling and RNA-sequencing control libraries (ie riboratio and RNAratio)were calculated RRS of each ORF was then defined asthe ratio of riboratio to RNAratio

Identification of novel putative CDSs important forparasitersquos fitness

We reanalysed the data of a published genome-wide RNAinterference fitnessndashcosts association study (54) with ournewly defined CDSs Briefly short read data were down-loaded from European Nucleotide Archive under acces-sion number ERP000431 Reads were then mapped tothe genome using bowtie2 with default lsquolocal-sensitiversquomode The number of reads mapping to the annotatedand putative CDSs was counted To identify thoseputative CDSs that have significantly less reads mappedin the RNA interference libraries which potentially con-tribute to the parasitersquos fitness we performed differentialread count analyses on the pair-wise library comparisons(listed below) using three softwares including DESeq1(55) DESeq2 (httpwwwbioconductororgpackagesreleasebiochtmlDESeq2html) and EdgeR (56) withdefault settings CDSs that were identified to have signifi-cantly less reads in the RNA interference libraries by allthree algorithms (Plt 005 in each case) linear fold change 5 were considered to be potentially related to the loss offitness The pair-wise library comparisons include BFuninduced versus BF induced for 6 days BF uninduced

Nucleic Acids Research 2014 Vol 42 No 6 3625

versus differentiated cells induced and BF uninducedversus PF induced

Analysis of proteomics data

All mass spectrometric raw data files from Butter et al(57) were processed as described except that the data weresearched additionally with the previously un-annotatedCDSs listed in Supplementary Tables S4 and S5 BrieflyMaxQuant 1412 including the Andromeda search engine(5859) was used for processing raw data and databasesearching The experimental design template used was asdescribed before (57) The search was performed against acombination of three databases a T brucei database con-taining 19 103 annotated protein entries for strains 427and 927 (version 50 Tb427 and Tb927 downloadedfrom httptritrypdborg) a list of uORFs and putativeCDSs identified in this study (Supplementary Tables S4and S5) and a database containing typical contaminantsEnzyme search specificity was TrypsinP with threemiscleavages for tryptic digests and LysC with twomiscleavages for digests with lysyl endopeptidaseCarbamidomethylation on cysteines was set as fixed modi-fication methionine oxidation and protein N-terminalacetylation were considered as variable modificationsMass accuracy tolerances (after recalibration) were6 ppm for precursor ions and 05Da for CID MSMSand 20 ppm for HCD MSMS spectra respectivelyFalse discovery rate was fixed at 1 at peptide andprotein level with posterior error probability (PEP) assorting criterion Identifications were matched betweenruns within a 2-min window Protein groups identified inthe potential CDS database had at least one uniquepeptide and a PEPlt 00005

RESULTS

Ribosome footprints reveal ribosome position at sub-codonresolution

To obtain genome-wide information on protein synthesiswe adapted a ribosome profiling protocol published inyeast and mice (4546) for use in T brucei We sequencedribosome footprints and fragmented mRNA from the twotrypanosome stages that have been adapted to in vitroculture the PF of the tsetse fly midgut and theproliferating BF that lives in the blood of the mammalianhost Cultured cells were treated with cycloheximide toarrest translating ribosomes and harvested Becausecycloheximide has been shown to stabilize RNA tran-scripts in T brucei (60) the treatment was limited to2min and applied to both cells used for ribosomeprofililng and cells used for RNA-sequencing analysesTo maximize the accuracy of the ribosome footprintswe optimized the digestion efficiency by testing differentnuclease concentrations and reaction temperatures(lsquoMaterials and Methodsrsquo section and SupplementaryFigure S1) Next to map translated regions of RNA andto exclude RNase-protected regions that are scanned bythe 43S pre-initiation complex we used sucrose gradientsto specifically enrich for monosomes (SupplementaryFigure S1) The high reproducibility between libraries

prepared under different conditions demonstrated the ro-bustness of the ribosome profiling approach despite theelaborate library preparation protocol (SupplementaryFigure S2 BF digestion at 4C versus digestion at roomtemperature R2=09755)

To ensure that nuclease-resistant RNA fragments rep-resent ribosome footprints and not short RNA fragmentsprotected by other RNA-binding proteins we comparedthe genome-wide distribution of sequence reads obtainedfrom ribosome profiling and RNA-sequencing librariesAlignment of ribosome profiling reads revealed that thenuclease-protected reads predominantly mapped alongCDSs with very few reads aligning to intergenic regions(Figure 1A) The 50 nucleotide of the ribosome profilingreads aligned from 12 nt upstream of the ATG codon to18 nt upstream of the stop codon (Figure 1B) Given afootprint length of 28ndash30 nt the ribosome protects mRNAfrom 12 nt upstream of the start codon to 9ndash11 nt down-stream of the stop codon In contrast to ribosome foot-print reads RNA-sequencing reads were aligned along theCDSs and UTRs as expected (Figure 1A) In additionto the enrichment across CDSs the alignment ofribosome profiling reads revealed a distinct 3-nt period-icity with 68 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 1B and C andSupplementary Figure S3) No such 3-nt periodicity wasobserved for the mRNA sequence reads

Only two T brucei genes have introns one encoding apoly(A) polymerase and the other encoding a DNARNAhelicase (6162) Surprisingly there is almost no decreasein sequence reads detectable by RNA-sequencing acrossthe introns while the ribosome profiling data clearly showthat the intronic RNA is not translated (Figure 2A) Thesedata indicate that the process of cis-splicing must be un-usually inefficient in trypanosomes but that a tightcontrol mechanism ensures that only mature mRNAsenter the translational machinery

The strong enrichment of ribosome profiling readsacross CDSs the lack of ribosome profiling reads acrossintrons and the observed 3-nt periodicity are characteristicof ribosome-protected RNA footprints and argue againstunspecific RNA protection (45) Thus ribosome profilingreads indeed represent ribosome footprints and providea means to measure protein synthesis and in additionto identify both previously unannotated CDSs as well asnon-translated transcripts previously annotated as coding(for example Figure 1A Tb927102360) The latter maybe either incorrectly annotated or not translated in thetwo life cycle stages examined

Identification of translation initiation sites

Genome-wide mapping of spliced leader acceptor sites(SASs) ie the 50 end of the 50 UTR to which thespliced leader RNA is trans-spliced revealed hundredsof instances in which SASs mapped within annotatedCDSs or far upstream of annotated CDSs (133536)These findings indicated that for many genes the trueCDS was shorter than the annotated CDS while forothers it allowed the possibility of translation initiationat an AUG upstream of the annotated translation

3626 Nucleic Acids Research 2014 Vol 42 No 6

initiation site However while RNA-sequencing allowedthe unambiguous identification of mis-annotation formany genes translation initiation does not always occurat the first AUG downstream of the SAS Thus the use-fulness of RNA-sequencing to identify the true translationinitiation sites is limited and depends on the genomiccontext

In contrast to RNA-sequencing ribosome profilingallows the determination of the translational landscapeThus it allows the identification of CDS mis-annotationsand the determination of the most probable translationinitiation sites Exemplary for numerous genes forTb92751990 we detected no translation between thefirst and the second AUG (green bars) indicating thatthis region is not translated in the BF and PF and thattranslation initiates at the second AUG (Figure 2Bleft panel) For Tb927102720 we observed strong trans-lation upstream of the annotated CDS The fact thattranslation starts at an upstream AUG which is inframe with the annotated translation initiation sitesuggests that the CDS of Tb927102720 is longer thanannotated (Figure 2B right panel) These examplesshow that ribosome profiling data can be used for there-annotation of incorrectly annotated CDSs

Ribosome profiling reveals extensive translational control

While the role of RNA stability has been investigated on agenome-wide scale in T brucei (29) global regulation atthe levels of protein translation has not been analysedTherefore we decided to apply the ribosome profilingapproach to evaluate the degree of regulation occurringat the level of translation in trypanosomesTo estimate the rate of protein synthesis in trypano-

somes we determined the ribosome footprint density

AUG

28 nt - ribosome footprint

CDS position (nt from start)

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

P s

iteA

site

UAA

CDS position (nt from stop)

P s

iteA

site

ribosome footprint

frame

perc

enta

ge o

f rea

ds

0

20

0

40

60

80

1 2 0 1 2

mRNA

-24 -18 -12 -6 0 6 12 18 24 -24 -18 -12 -6 0 6 12

foot

prin

t (R

PM

)m

RN

A (

RP

M)

mR

NA

(R

PM

)

C

B

A

0

2

4

0

2

4

6

0

2

4mRNA density in PF

ribosome footprint density in PF

mRNA denstiy in BF

foot

prin

t (R

PM

)

0

0

02

04

1

2

3

600000

Tb927102320

Tb927102330

Tb927102340

Tb927102350

Tb927102370

Tb927102360

Chr X 602000 604000 606000 608000

ribosome footprint density in BF

Figure 1 Ribosome footprints reveal coding sequences at sub-codonresolution (A) Ribosome footprints are strongly enriched acrossCDSs mRNA densities and ribosome densities are shown as readsper nucleotide per million reads (RPM) to normalize for differencesin library size (B) Alignment of the 50 nucleotide from ribosome foot-print reads that map close to translation start or translation termin-ation sites Blue boxes mark the approximate size of the ribosomefootprint (C) Percentage of position of sequence reads relative toreading frame

00

10

20

813000Chr III Chr VIII

814000

815000

816000

00010203

0

2

4

490000

492000

494000

00

10

20

0

2

4

0

2

4

00

10

20

615500Chr V Chr X

616000

616500

617000

699000

700000

701000

00

02

04

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

Tb92751990 Tb927102720

Tb92733160 Tb92781510

B

A

Figure 2 Ribosome footprints reveal translated regions (A) Ribosomeprofiles for the two intron-containing genes Introns are represented asa dashed line (B) Ribosome profiles of two possibly mis-annotatedCDSs Black bars mark annotated CDSs Grey bars mark CDSs pre-dicted based on ribosome profiles Green boxes represent ATG codonsand red boxes represent stop codons

Nucleic Acids Research 2014 Vol 42 No 6 3627

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 4: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

versus differentiated cells induced and BF uninducedversus PF induced

Analysis of proteomics data

All mass spectrometric raw data files from Butter et al(57) were processed as described except that the data weresearched additionally with the previously un-annotatedCDSs listed in Supplementary Tables S4 and S5 BrieflyMaxQuant 1412 including the Andromeda search engine(5859) was used for processing raw data and databasesearching The experimental design template used was asdescribed before (57) The search was performed against acombination of three databases a T brucei database con-taining 19 103 annotated protein entries for strains 427and 927 (version 50 Tb427 and Tb927 downloadedfrom httptritrypdborg) a list of uORFs and putativeCDSs identified in this study (Supplementary Tables S4and S5) and a database containing typical contaminantsEnzyme search specificity was TrypsinP with threemiscleavages for tryptic digests and LysC with twomiscleavages for digests with lysyl endopeptidaseCarbamidomethylation on cysteines was set as fixed modi-fication methionine oxidation and protein N-terminalacetylation were considered as variable modificationsMass accuracy tolerances (after recalibration) were6 ppm for precursor ions and 05Da for CID MSMSand 20 ppm for HCD MSMS spectra respectivelyFalse discovery rate was fixed at 1 at peptide andprotein level with posterior error probability (PEP) assorting criterion Identifications were matched betweenruns within a 2-min window Protein groups identified inthe potential CDS database had at least one uniquepeptide and a PEPlt 00005

RESULTS

Ribosome footprints reveal ribosome position at sub-codonresolution

To obtain genome-wide information on protein synthesiswe adapted a ribosome profiling protocol published inyeast and mice (4546) for use in T brucei We sequencedribosome footprints and fragmented mRNA from the twotrypanosome stages that have been adapted to in vitroculture the PF of the tsetse fly midgut and theproliferating BF that lives in the blood of the mammalianhost Cultured cells were treated with cycloheximide toarrest translating ribosomes and harvested Becausecycloheximide has been shown to stabilize RNA tran-scripts in T brucei (60) the treatment was limited to2min and applied to both cells used for ribosomeprofililng and cells used for RNA-sequencing analysesTo maximize the accuracy of the ribosome footprintswe optimized the digestion efficiency by testing differentnuclease concentrations and reaction temperatures(lsquoMaterials and Methodsrsquo section and SupplementaryFigure S1) Next to map translated regions of RNA andto exclude RNase-protected regions that are scanned bythe 43S pre-initiation complex we used sucrose gradientsto specifically enrich for monosomes (SupplementaryFigure S1) The high reproducibility between libraries

prepared under different conditions demonstrated the ro-bustness of the ribosome profiling approach despite theelaborate library preparation protocol (SupplementaryFigure S2 BF digestion at 4C versus digestion at roomtemperature R2=09755)

To ensure that nuclease-resistant RNA fragments rep-resent ribosome footprints and not short RNA fragmentsprotected by other RNA-binding proteins we comparedthe genome-wide distribution of sequence reads obtainedfrom ribosome profiling and RNA-sequencing librariesAlignment of ribosome profiling reads revealed that thenuclease-protected reads predominantly mapped alongCDSs with very few reads aligning to intergenic regions(Figure 1A) The 50 nucleotide of the ribosome profilingreads aligned from 12 nt upstream of the ATG codon to18 nt upstream of the stop codon (Figure 1B) Given afootprint length of 28ndash30 nt the ribosome protects mRNAfrom 12 nt upstream of the start codon to 9ndash11 nt down-stream of the stop codon In contrast to ribosome foot-print reads RNA-sequencing reads were aligned along theCDSs and UTRs as expected (Figure 1A) In additionto the enrichment across CDSs the alignment ofribosome profiling reads revealed a distinct 3-nt period-icity with 68 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 1B and C andSupplementary Figure S3) No such 3-nt periodicity wasobserved for the mRNA sequence reads

Only two T brucei genes have introns one encoding apoly(A) polymerase and the other encoding a DNARNAhelicase (6162) Surprisingly there is almost no decreasein sequence reads detectable by RNA-sequencing acrossthe introns while the ribosome profiling data clearly showthat the intronic RNA is not translated (Figure 2A) Thesedata indicate that the process of cis-splicing must be un-usually inefficient in trypanosomes but that a tightcontrol mechanism ensures that only mature mRNAsenter the translational machinery

The strong enrichment of ribosome profiling readsacross CDSs the lack of ribosome profiling reads acrossintrons and the observed 3-nt periodicity are characteristicof ribosome-protected RNA footprints and argue againstunspecific RNA protection (45) Thus ribosome profilingreads indeed represent ribosome footprints and providea means to measure protein synthesis and in additionto identify both previously unannotated CDSs as well asnon-translated transcripts previously annotated as coding(for example Figure 1A Tb927102360) The latter maybe either incorrectly annotated or not translated in thetwo life cycle stages examined

Identification of translation initiation sites

Genome-wide mapping of spliced leader acceptor sites(SASs) ie the 50 end of the 50 UTR to which thespliced leader RNA is trans-spliced revealed hundredsof instances in which SASs mapped within annotatedCDSs or far upstream of annotated CDSs (133536)These findings indicated that for many genes the trueCDS was shorter than the annotated CDS while forothers it allowed the possibility of translation initiationat an AUG upstream of the annotated translation

3626 Nucleic Acids Research 2014 Vol 42 No 6

initiation site However while RNA-sequencing allowedthe unambiguous identification of mis-annotation formany genes translation initiation does not always occurat the first AUG downstream of the SAS Thus the use-fulness of RNA-sequencing to identify the true translationinitiation sites is limited and depends on the genomiccontext

In contrast to RNA-sequencing ribosome profilingallows the determination of the translational landscapeThus it allows the identification of CDS mis-annotationsand the determination of the most probable translationinitiation sites Exemplary for numerous genes forTb92751990 we detected no translation between thefirst and the second AUG (green bars) indicating thatthis region is not translated in the BF and PF and thattranslation initiates at the second AUG (Figure 2Bleft panel) For Tb927102720 we observed strong trans-lation upstream of the annotated CDS The fact thattranslation starts at an upstream AUG which is inframe with the annotated translation initiation sitesuggests that the CDS of Tb927102720 is longer thanannotated (Figure 2B right panel) These examplesshow that ribosome profiling data can be used for there-annotation of incorrectly annotated CDSs

Ribosome profiling reveals extensive translational control

While the role of RNA stability has been investigated on agenome-wide scale in T brucei (29) global regulation atthe levels of protein translation has not been analysedTherefore we decided to apply the ribosome profilingapproach to evaluate the degree of regulation occurringat the level of translation in trypanosomesTo estimate the rate of protein synthesis in trypano-

somes we determined the ribosome footprint density

AUG

28 nt - ribosome footprint

CDS position (nt from start)

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

P s

iteA

site

UAA

CDS position (nt from stop)

P s

iteA

site

ribosome footprint

frame

perc

enta

ge o

f rea

ds

0

20

0

40

60

80

1 2 0 1 2

mRNA

-24 -18 -12 -6 0 6 12 18 24 -24 -18 -12 -6 0 6 12

foot

prin

t (R

PM

)m

RN

A (

RP

M)

mR

NA

(R

PM

)

C

B

A

0

2

4

0

2

4

6

0

2

4mRNA density in PF

ribosome footprint density in PF

mRNA denstiy in BF

foot

prin

t (R

PM

)

0

0

02

04

1

2

3

600000

Tb927102320

Tb927102330

Tb927102340

Tb927102350

Tb927102370

Tb927102360

Chr X 602000 604000 606000 608000

ribosome footprint density in BF

Figure 1 Ribosome footprints reveal coding sequences at sub-codonresolution (A) Ribosome footprints are strongly enriched acrossCDSs mRNA densities and ribosome densities are shown as readsper nucleotide per million reads (RPM) to normalize for differencesin library size (B) Alignment of the 50 nucleotide from ribosome foot-print reads that map close to translation start or translation termin-ation sites Blue boxes mark the approximate size of the ribosomefootprint (C) Percentage of position of sequence reads relative toreading frame

00

10

20

813000Chr III Chr VIII

814000

815000

816000

00010203

0

2

4

490000

492000

494000

00

10

20

0

2

4

0

2

4

00

10

20

615500Chr V Chr X

616000

616500

617000

699000

700000

701000

00

02

04

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

Tb92751990 Tb927102720

Tb92733160 Tb92781510

B

A

Figure 2 Ribosome footprints reveal translated regions (A) Ribosomeprofiles for the two intron-containing genes Introns are represented asa dashed line (B) Ribosome profiles of two possibly mis-annotatedCDSs Black bars mark annotated CDSs Grey bars mark CDSs pre-dicted based on ribosome profiles Green boxes represent ATG codonsand red boxes represent stop codons

Nucleic Acids Research 2014 Vol 42 No 6 3627

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 5: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

initiation site However while RNA-sequencing allowedthe unambiguous identification of mis-annotation formany genes translation initiation does not always occurat the first AUG downstream of the SAS Thus the use-fulness of RNA-sequencing to identify the true translationinitiation sites is limited and depends on the genomiccontext

In contrast to RNA-sequencing ribosome profilingallows the determination of the translational landscapeThus it allows the identification of CDS mis-annotationsand the determination of the most probable translationinitiation sites Exemplary for numerous genes forTb92751990 we detected no translation between thefirst and the second AUG (green bars) indicating thatthis region is not translated in the BF and PF and thattranslation initiates at the second AUG (Figure 2Bleft panel) For Tb927102720 we observed strong trans-lation upstream of the annotated CDS The fact thattranslation starts at an upstream AUG which is inframe with the annotated translation initiation sitesuggests that the CDS of Tb927102720 is longer thanannotated (Figure 2B right panel) These examplesshow that ribosome profiling data can be used for there-annotation of incorrectly annotated CDSs

Ribosome profiling reveals extensive translational control

While the role of RNA stability has been investigated on agenome-wide scale in T brucei (29) global regulation atthe levels of protein translation has not been analysedTherefore we decided to apply the ribosome profilingapproach to evaluate the degree of regulation occurringat the level of translation in trypanosomesTo estimate the rate of protein synthesis in trypano-

somes we determined the ribosome footprint density

AUG

28 nt - ribosome footprint

CDS position (nt from start)

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

P s

iteA

site

UAA

CDS position (nt from stop)

P s

iteA

site

ribosome footprint

frame

perc

enta

ge o

f rea

ds

0

20

0

40

60

80

1 2 0 1 2

mRNA

-24 -18 -12 -6 0 6 12 18 24 -24 -18 -12 -6 0 6 12

foot

prin

t (R

PM

)m

RN

A (

RP

M)

mR

NA

(R

PM

)

C

B

A

0

2

4

0

2

4

6

0

2

4mRNA density in PF

ribosome footprint density in PF

mRNA denstiy in BF

foot

prin

t (R

PM

)

0

0

02

04

1

2

3

600000

Tb927102320

Tb927102330

Tb927102340

Tb927102350

Tb927102370

Tb927102360

Chr X 602000 604000 606000 608000

ribosome footprint density in BF

Figure 1 Ribosome footprints reveal coding sequences at sub-codonresolution (A) Ribosome footprints are strongly enriched acrossCDSs mRNA densities and ribosome densities are shown as readsper nucleotide per million reads (RPM) to normalize for differencesin library size (B) Alignment of the 50 nucleotide from ribosome foot-print reads that map close to translation start or translation termin-ation sites Blue boxes mark the approximate size of the ribosomefootprint (C) Percentage of position of sequence reads relative toreading frame

00

10

20

813000Chr III Chr VIII

814000

815000

816000

00010203

0

2

4

490000

492000

494000

00

10

20

0

2

4

0

2

4

00

10

20

615500Chr V Chr X

616000

616500

617000

699000

700000

701000

00

02

04

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

mR

NA

(R

PM

)

mR

NA

(R

PM

)

foot

prin

t (R

PM

)

foot

prin

t (R

PM

)

Tb92751990 Tb927102720

Tb92733160 Tb92781510

B

A

Figure 2 Ribosome footprints reveal translated regions (A) Ribosomeprofiles for the two intron-containing genes Introns are represented asa dashed line (B) Ribosome profiles of two possibly mis-annotatedCDSs Black bars mark annotated CDSs Grey bars mark CDSs pre-dicted based on ribosome profiles Green boxes represent ATG codonsand red boxes represent stop codons

Nucleic Acids Research 2014 Vol 42 No 6 3627

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 6: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

across CDSs [measured as reads per million reads per kb(rpkm) see lsquoMaterials andMethodsrsquo section] Of 77 millionPF and 44 million BF ribosome footprint sequence reads28 (PF) and 14 (BF) could be aligned to annotatedCDSs while 70 (PF) and 85 (BF) mapped to knownstructural non-coding RNA (ncRNA) The high percent-age of structural ncRNA in the footprint fraction has beenreported previously (45) and results from the large amountof rRNA in the monosome fraction The lower percentageof CDS reads in the BF sample compared with the PFsample may be due to libraries being generated from thewild-type 427 strain while reads were aligned to the better-annotated genome of the TREU 927 strain In the BFsome of the most highly expressed genes encoding forvariant surface glycoproteins differ in sequence betweenthe 427 and 927 strains and footprints from these geneswere not aligned in this analysisNevertheless for 8072 genes (82 of annotated CDSs)

we were able to detect 10 ribosome footprint reads athreshold that we defined as translation (SupplementaryTable S1) We found the rate of translation to differgreatly among proteins In the PF we observed a11 448-fold difference in ribosome density (3548-fold dif-ference in mRNA levels) between the 1 most highly andthe 1 most weakly translated proteins while in the BFwe observed a 1623-fold difference in ribosome densityand a 262-fold difference in mRNA levels (Figure 3A)These findings suggest translational efficiency to signifi-cantly contribute to the regulation of gene expressionTo determine the translational efficiency for individual

genes we normalized for differences in mRNA abundanceie we divided the ribosome footprint density by the levelsof mRNA abundance Our ribosome profiling dataindicate a relatively unbiased distribution of ribosomedensity across CDSs in the BF and a very minor increasein density towards the 50 end of CDSs in the PF(Supplementary Figure S3) Nevertheless wheneverdetermining the translational efficiency we excludedmRNA and footprint reads that mapped to the first 40 ntof a CDS to avoid possible artifacts from ribosome stallingat translation initiation sites Between the 1 most effi-ciently and the 1 least efficiently translated proteins weobserved in the PF a 117-fold and in the BF a 64-fold rangein translational efficiency ie in the amount of proteinsproduced per transcript (Figure 3B and SupplementaryTable S2) This range in translational efficiency is similarto what has been observed in yeast and roughly 10-foldhigher than what has been described for mice (4546)Interestingly we observed no correlation between RNAabundance and translational efficiency (R2=003) sug-gesting that translational efficiency is regulated independ-ently of RNA stability

Life cycle-specific regulation of translational efficiency

Next we addressed the question if translational efficiencyis regulated in a life cycle-specific manner Ribosomedensity measurements only allow meaningful comparisonsof the rate of protein synthesis when the speed of transla-tion is assumed to remain constant This assumption issupported by measurements in mouse embryonic stem

cells (4664) however the speed of translation may verywell not be identical in two life cycle stages that live atdifferent temperatures In addition our current protocoldoes not allow the absolute quantification of mRNA and

-4 -2 0 2 4 6 8 10 12 140

1000

2000

3000

0

500

1000

1500

-80

-60

-40

-20 0

02

04

0

-6 -3 0 3 6 9 12 150

1000

2000

3000

0

1

2

1459000Chr IX

1461000

1463000

1459000Chr IX

1461000

1463000

0

2

4

0

1

2

0

2

4

translational efficiency (log2)

PF rank in translational efficiencyBF

ran

k in

tran

slat

iona

l effi

cien

cy

log2 rpkm log2 rpkm

num

ber

of g

enes

num

ber

of g

enes

0

500

1000

1500

num

ber

of g

enes

num

ber

of g

enesmRNA

footprintsmRNAfootprints

BF

BF PF

PF

BF mRNA density PF mRNA density

BF footprint density PF footprint density

high

all genes N = 7782

PUF (Pumillio) N = 10

cytochrome oxidases N = 8

alternative oxidase

enzymes requiredfor glycolysis N = 10

A

B

D

C

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

mR

NA

(R

PM

)fo

otpr

int (

RP

M)

Tb092111070 Tb092111070

low

low

-80

-60

-40

-20 0

02

04

0

translational efficiency (log2)

Figure 3 Translational efficiency is regulated (A) Histograms ofmRNA abundance and ribosome density (rate of protein synthesis)for BF (left panel) and PF parasites (right panel) (B) Histogram oftranslational efficiency (ratio of ribosome footprint density to mRNAabundance) for BF (left panel) and PF parasites (right panel) (C) Pair-wise comparisons of translational efficiency in PF and BF CDSs wereranked based on translational efficiency (1=highest translational effi-ciency 7782= lowest translational efficiency) in BF and PF Ranksbetween life cycle stages show a correlation of R=07428 Genefamilies with developmentally regulated translational efficiencies arecolour coded Pumillio genes (red) cytochromes oxidase (blue) genesrequired for glycolysis (63) (green) and the alternative oxidase(Tb927107090 black) (D) Ribosome footprint profile of a gene withdevelopmentally regulated translational efficiency Green arrow indi-cates direction of transcription

3628 Nucleic Acids Research 2014 Vol 42 No 6

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 7: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

ribosome footprint levels in the two different life cyclestages Therefore we decided to compare the overallchanges in translational efficiency within each life cyclestage To this end we ranked the genes in each life cyclestage based on their relative translational efficiency andcompared their rank between the two life cycle stagesA pair-wise comparison indicated a generally positive cor-relation between translational efficiency in the two stages(Figure 3C Pearsonrsquos correlation coefficient=07428Plt 00001 95 CI 07327ndash07526) but it also revealeddistinct life cycle-specific differences for subsets of genesFor example 58 genes found in the lowest 25 in terms oftranslational efficiency in the PF were among the top 25most efficiently translated proteins in the BF Thesecontain various RNA binding proteins numerous phos-phatases and expression site-associated genes (Table 1 andSupplementary Table S3) For an example of a genemore efficiently translated in the BF than in the PF seeFigure 3D

Based on mRNA and protein level measurements differ-ential translational efficiency has been predicted foraconitase the cytochrome oxidases V and VI and procyclins(222565) For all of these genes our data indicate anincreased translational efficiency in the PF (Table 1) thusconfirming previous observations and validating our meas-urements For selected groups of proteins eg annotated ascytochrome oxidase or those required for glycolysis wenoticed the translational efficiency to correspond to the

changes that have been described for the glucosemetabolism(Figure 3C) Whereas the BF is entirely dependent on theglycolytic pathway to generate energy the PF living in theinsect midgut where glucose availability is uncommon orabsent has been shown to contain a much more elaborateenergy metabolism (66) In contrast the efficient translationof the group of PUF (Pumilio and FBF) proteins in PF wasunexpected (Figure 3C) PUF proteins regulate translationand mRNA stability by binding sequences in their targetRNAs and have been shown to affect gene expression inorganisms as divergent as Plasmodium falciparumT brucei humans and yeast (67ndash69) Interestingly wefound both poly(A) binding proteins all four isoforms ofeIF4Es and four of the five eIF4G isoforms all proteinsinvolved in control of translation initiation to be more effi-ciently translated in the PF than in the BF (SupplementaryTable S2)Given the large dynamic range and the life cycle-specific

regulation we propose that differences in translationalefficiency contribute substantially to the control of geneexpression in T brucei

uORFs are frequently translated and may regulatetranslational efficiency

The broad 100-fold range in translational efficiencyamong genes and the differences in translationalefficiencies between life cycle stages raise the question of

Table 1 Developmentally regulated translation

Gene ID Description BF rank PF rank Change inrank (PFndashBF)

Translation up-regulated in BFTb92773250 Expression site-associated gene 6 (ESAG6) protein putative 1372 7405 6033Tb92743980 Chaperone protein DNAj putative 1301 7248 5947Tb927104780 GPI inositol deacylase precursor (GPIdeAc) 390 5994 5604Tb92788140 Small GTP-binding rab protein putative 1217 6804 5587Tb92763480 RNA-binding protein putative (DRBD5) 1856 7262 5406Tb11014701 Membrane-bound acid phosphatase 1 precursor (MBAP1) 689 6065 5376Tb92714650 Cyclin-like F-box protein (CFB2) 10 5289 5279Tb92735660 UDP-Gal or UDP-GlcNAc-dependent glycosyltransferase putative 797 6041 5244Tb92726000 Glycosylphosphatidylinositol-specific phospholipase C (GPI-PLC) 1634 6793 5159Tb92745310 Serinethreonine-protein kinase a putative 1656 6776 5120

Translation up-regulated in PFTb9275440 trans-Sialidase putative 6785 418 6367Tb92776850 trans-Sialidase (TbTS) 7513 1445 6068Tb92777470 Receptor-type adenylate cyclase GRESAG 4 putative 7001 1268 5733Tb92712120 Calpain putative 6258 739 5519Tb9274360 12-Dihydroxy-3-keto-5-methylthiopentene dioxygenase putative 6587 1089 5498Tb091605550 Calpain-like cysteine peptidase putative 6936 1868 5068Tb92787690 Amino acid transporter (pseudogene) putative 6693 1731 4962Tb92781610 MSP-B putative 7199 2272 4927Tb11016650 Serinethreonine-protein kinase putative 5280 363 4917Tb92777110 Leucine-rich repeat protein (LRRP) putative 6501 1601 4900

Genes for which translational up-regulation in PF was previously shownTb92710280 Cytochrome oxidase subunit VI (COXVI) 2820 810 2010Tb091601820 Cytochrome oxidase subunit V (COXV) 935 291 644Tb9271014000 Aconitase (ACO) 657 31 626Tb9276510 GPEET2 procyclin precursor 3848 365 3483Tb9271010260 EP1 procyclin (EP1) 4682 34 4648Tb9271010220 Procyclin-associated gene 2 (PAG2) protein (PAG2) 6903 5375 1528Tb9275330 Receptor-type adenylate cyclase GRESAG 4 putative 3779 809 2970

List of genes with highest change in translational efficiency between PF and BF List does not include hypothetical genes

Nucleic Acids Research 2014 Vol 42 No 6 3629

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 8: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

how protein translation is regulated For a small numberof genes in T brucei sequence motifs in the 30 UTR havebeen linked to developmentally regulated changes in trans-lational efficiency but for the large majority of genes nosuch motifs have been identified (70) Numerous reportsfrom other eukaryotes demonstrated that uORFs canfunction as important regulators of gene expression (71)For example analysis of mRNA and protein levels fromgt10 000 mammalian genes revealed that the presence ofan uORF correlates with a significant reduction of expres-sion from the downstream CDS Furthermore usingreporter constructs it was estimated that the presence ofan uORF results in a decrease in gene expression of 30ndash80 with only a minor reduction of mRNA levels (72)No such analyses have been performed for T brucei but ithas been shown that the removal of a uATG leads to a 7-fold increase in protein levels of a luciferase reporter (41)In this manuscript we use the term ORF or uORF to referto regions of DNA sequences beginning with an ATG andending with a termination codon that may or may notencode for proteins The term coding sequence (CDS) isused to refer to ORFs that encode proteins MinimallyuORFs consist of 9 nt containing an upstream start codon(uATG) an additional sense codon and a terminationcodon whereby the termination codon may be locateddownstream of the start codon of the main CDS (73)Even though 50 UTRs have only been assigned for 4909

genes (927 version 42) applying the criteria listed abovewe identified 8310 uORFs and found 1092 (22) of 50

UTRs to contain at least one uORF (SupplementaryTable S4) Thus uORF are much less common inT brucei than in mammals where 40ndash50 of genescontain uORFs (7475) Nevertheless while uORFscannot explain the translational regulation for all tran-scripts they may represent one of several factors import-ant for the regulation of gene expression in T bruceiTo determine the degree to what uORFs are translated

a prerequisite for affecting translational efficiency of thedownstream CDS we calculated the ribosome densityacross uORFs and identified 1834 uORFs (22) with 2 read-coverage and 70 of the ORF coveredThe average length of these uORFs was 22 aa Just likefor annotated CDSs the alignment of ribosome profilingreads across these uORFs revealed a distinct 3-nt period-icity with 63 of BF ribosome profiling reads starting atthe first nucleotide of a codon (Figure 4A and B) Whilethe periodicity of ribosome footprints across uORFs wasslightly less pronounced than for annotated CDS ourdata suggest that a large proportion of uORFs istranslatedTo evaluate the regulatory potential of uORFs we

compared the ribosome densities between transcripts con-taining at least one uORF (N=1092) and those withoutuORFs (N=3817) In the BF median ribosome densityacross the CDSs of transcripts with uORFs was3031 rpkm compared with 4486 rpkm for genes withoutuORFs (Table 2) In the PF median ribosome densitieswere 2001 rpkm (genes with uORF) and 3650 rpkm(genes without uORF) Thus in both life cycle stages weobserved a higher ribosome density for genes withoutuORFs than for genes with uORFs (Plt 00001) While

we also observed a higher mRNA level for geneswithout uORFs compared with genes with uORFs(Plt 00001) the differences in mRNA levels were lowerthan the differences in ribosome density (Table 2) Thusour data indicate that median translational efficiencyis higher for genes without uORF than for genes withuORF suggesting that the presence of uORFs may nega-tively impact translation of downstream CDSs

Noteworthy while median ribosome density for CDSswas higher in the BF than in the PF (Table 2) we observedthe opposite for 50 UTRs (Plt 00001) For 50 UTRs wefound a higher ribosome density in the PF (median rpkm773) than in the BF (median rpkm 289 SupplementaryTable S4 for an example see Figure 4C) In addition wefound that in the BF 50 UTRs with uORF (408 rpkm)have a higher ribosome density than 50 UTRs withoutuORFs (254 rpkm Plt 00001) Unexpectedly thecontrary was true in PF In the PF we found 50 UTRswithout uORF (834 rpkm) to have a higher ribosomedensity than 50 UTRs with uORF (501 rpkmPlt 00001) For an example of a non-AUG uORF seeFigure 4C right panel Many of such non-AUG uORFshave been observed in yeast and mice embryonic stem cells(mESCs) Interestingly an increase in 50 UTR translationsimilar to what we observed in the PF compared with theBF was observed in yeast on starvation and a decreasein 50 UTR translation was observed on differentiation ofpluripotent mESCs into embryoid bodies (4546)

Genome-wide analyses to prove or disprove a directcorrelation between translation of uORFs and

00

04

08

411000Chr I

411500 412000

00

02

04

00

04

08

Chr I420000 420200 420400

0

2

4

foot

prin

t (R

PM

)fo

otpr

int (

RP

M)

BF BF

PF PF

Tb92711610 Tb92711670

-24 -18 -12 -6 0 6 12

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

uORF

uORF

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

A B

C

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

-24 -18 -12 -6 0 6

Figure 4 Ribosome footprints reveal translation of uORFs(A) Alignment of the 50 nucleotides from ribosome footprint readsthat map close to translation start or translation termination sites ofuORFs (B) Percentage of position of sequence reads relative to readingframe (C) Ribosome profiles of two genes with uORF (left panel) andwithout uORF (right panel) Narrow grey boxes represent 50 UTRsgreen box represents AUG-codon red box represents terminationcodon

3630 Nucleic Acids Research 2014 Vol 42 No 6

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 9: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

translational efficiency is complicated by the existence ofmultiple 50 UTR isoforms resulting from widespreaddifferential trans-splicing (353676) At the same timethe heterogeneity in UTR lengths raises the intriguingpossibility that inclusion or exclusion of an uORFduring alternative trans-splicing may represent a regula-tory mechanism to modulate translational efficiency

Ribosome footprints reveal the presence of hundreds ofpreviously un-annotated CDSs

Even though small proteins (lt200 aa) have been shown toplay major roles in plant and animal development (7778)the complexity of the short proteome remains largelyunexplored because algorithms to reliably predict shortCDSs are lacking (79) We observed that while almostall ribosome footprints aligned to annotated features inthe BF 101 197 reads (023) and in the PF 1 116 030reads (144) did not It is noteworthy that just as weobserved for 50 UTRs the percentage of reads not aligningto CDSs is higher in the PF than in the BF We suspectthat many of these reads originate from un-annotatedUTRs but some may stem from previously un-annotatedCDSs A previous RNA-sequencing-based analysisdetected 1114 new un-annotated transcripts in the PF1011 of which had the potential to encode one or morepeptides ( 25 aa) Using available proteomics data theauthors were able to confirm translation for 19 of the 1114transcripts (13) Given our ability to determine ribosomepositions at nucleotide resolution we looked for new pre-viously un-annotated CDSs

We mapped ribosome footprint reads from the BF andPF and determined the ribosome footprint density for allpotential CDSs at least 10 aa in length To avoid overlapwith annotated CDSs we only considered ORFs locatedat least 20 nt away from annotated features Thisapproach enabled us to identify a set of 2021 candidateCDSs with 2 read-coverage and 70 of the ORFcovered The candidate CDSs ranged in length from 10to 378 aa (average 30 aa Supplementary Table S5) For797 of the 1114 previously identified transcripts we foundthe average ribosome footprint read-coverage to be 2(Supplementary Table S6 Previously identified transcriptsfrom chromosome 10 were not considered because oursequencing reads were mapped to a newer version of the

chromosome 10 assembly and thus genomic coordinatesmight be incompatible)To determine whether our set of candidate CDSs is

translated we analysed published proteomics datasets(57) to search for protein products For 24 of the 2021candidate CDSs (average size 117 aa 13 kDa) proteinproducts could be identified however four of theseidentified peptides also matched annotated CDSs Inaddition we identified protein products for 31 uORFs(Supplementary Table S4) We suspect that the lownumber of identified protein products relative to thelarge number of candidate CDSs may be caused (i) bythe experimental set-up of the original proteomicsanalysis where proteins were separated by 1D-SDS-PAGE prior to in-gel protease digest and mass spectro-metric analysis whereby proteins with a size smaller than5ndash10 kDa typically run out of the gel (ii) by the limitedsensitivity of mass spectrometric analyses compared withDNA sequencing-based techniques and (iii) by the factthat not all ribosome-protected RNAs are translatedThe first point is supported by the fact that only 67

(3) of our candidate CDSs were 90 aa (10 kDa) butfor 28 of those large candidate CDSs we could identifypeptides Evidence supporting the possibility that not allribosome-protected RNAs are translated exists in micewhere even known long ncRNAs have been found to beassociated with ribosomes (46) To evaluate the codingpotential of a transcript and to separate long ncRNAfrom small coding genes a new metric was establishedthe so-called RRS (53) The RRS takes advantage of thefact that translating ribosomes are released on encounter-ing a stop codon This release results in a sharp decrease inribosome occupancy between protein-coding regions andthe subsequent 30 UTR Consequently the RRS wasdefined as the ratio of footprint reads in the putativeCDS to footprint reads in the corresponding 30 UTRdivided by the ratio of RNA reads in the putative CDSto RNA reads in the corresponding 30 UTR (seelsquoMaterials and Methodsrsquo section)Similarly to annotated CDSs and uORFs our averaged

ribosome footprint data indicated a well-defined 3-nt peri-odicity for sequence reads occurring near translationinitiation sites However we observed a less well-defineddrop in ribosome densities across translation terminationsites (Figure 5A and B) which may indicate a lack of

Table 2 Translational efficiency of genes with and without uORF

Transcripts withuORF (N=3817)

Transcripts withoutuORF (N=1092)

MannndashWhitney test

Bloodstream formRibosome density (median rpkm) 3031 4466 Plt 00001mRNA levels (median rpkm) 3083 3613 Plt 00001Translational efficiency (ribosome densitymRNA levels) 100 127 Plt 00001

Procyclic formRibosome density (median rpkm) 2001 3650 Plt 00001mRNA levels (median rpkm) 2404 3038 Plt 00001Translational efficiency (ribosome densitymRNA levels) 087 126 Plt 00001

Reads mapping within the first 40 nt of a CDS were not considered for the measurements of translational efficiency Number of genes with annotated50 UTR N=4909

Nucleic Acids Research 2014 Vol 42 No 6 3631

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 10: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

productive translation Because we observed an abruptdrop in footprint density downstream of the stop codonof annotated genes in T brucei (Figure 1A and B) wedecided to apply the same parameters to evaluate the like-lihood that ORFs with high footprint density are indeedbeing translated into functional proteins Again candidateCDSs were defined as ORFs with a minimum length of10 aa and an average ribosome footprint coverage of 2over at least 70 of the ORF To define the correspond-ing putative 30 UTR we applied the same parameters usedin mice (53) ie the 30 UTR was defined as the regionbeginning immediately downstream of the ORF andending at the first subsequent start codon (in anyreading frame) A limitation of the RRS is that it canonly be calculated for an ORF if at least one entire foot-print read and one RNA read map to the putative CDSand the corresponding 30 UTR Because many 30 UTRswere too short to fulfill this criterion or simply did notcontain any footprint reads a sign for efficient translationtermination we were able to determine the RRS for only

1445 of genes annotated in T brucei (SupplementaryTable S7) For these 1445 genes the median RRS was839 compared with an RRS of 019 for the subset ofgenes annotated as hypothetical unlikely (N=90) Thussimilar to the observation in mice the RRS appears to bea good predictor of productive translation Next wedetermined the RRS for the previously un-annotated can-didate CDSs that fulfilled the criteria for RRS calculation(N=265) The median RRS of these candidate CDSs waswith 399 lower than the median RRS of the annotatedgenes (839) Thirteen candidate CDSs had an RRS largerthan 839 and 98 candidate CDSs had an RRS higher thanthe well-described intron-containing poly(A) polymerase(RRS=963) The RRS could only be calculated forone of the candidate CDSs for which we had identifiedprotein products (RRS=5744)

Finally in an attempt to learn about the function of theputative CDSs we searched for known protein motifsin those candidate CDSs for which we had identifiedprotein products and for those with an RRS above 10However except for one candidate CDS located withinthe histone H2B gene array and which contained anH2B signature motif (Supplementary Table S5) nomotifs were identified

Taken together proteomics data and a well-definedribosome release (high RRS) suggest extensive translationbeyond the annotated CDSs in T brucei

Newly identified CDSs are important for parasite fitness

The combination of ribosome profiling data with previ-ously published mass spectrometric data allowed us toidentify a large number of new CDSs Nonetheless the bio-logical significance of theseCDSs remains to be determinedIn addition while ORFs with a high ribosome footprintdensity a low RRS and without mass spectrometricevidence may not be translated into functional proteinsthey may nevertheless play important regulatory roles

Therefore we decided to evaluate the importance of ourcandidate CDSs independent of RRS and mass spectro-metric evidence in parasite survival A previously pub-lished high-throughput phenotyping approach termedRIT-seq measured the fitnessndashcost associations usingRNA interference (RNAi) The RIT-seq data revealed asignificant loss in fitness on RNAi induction of 2724 CDSsin the BF 1972 CDSs in the PF and 2677 CDSs in inducedand differentiated parasites (54) The approach is based onthe integration of RNAi libraries into trypanosomes and acomparison of the recovery of RNAi targets from tryp-anosome populations before and after RNAi inductionThe RNAi library was generated using genomic DNAbut the effect on parasite fitness was only determined forannotated genes Re-analysing the same datasets we finda significant loss of fitness on RNAi induction for 214putative CDSs in the BF (6 days after RNAi induction)16 CDSs in the PF and 227 CDSs on differentiation(Supplementary Table S8) For examples of candidateCDSs potentially essential for viability in the BF seeFigure 5C It is important to note that the RNAi libraryused for the RIT-seq experiments contains a median insertsize of 600 bp (average 1 kb) thus somewhat limiting the

01234

0

4

8

0005101520

02468

05

101520

02468

Chr III

622400

622000

622800

0

4

8

Chr X

2577800

2578200

01234

mR

NA

fo

otpr

int

RN

Ai r

eads

R

NA

i rea

ds

A

C

B

putative CDS putative CDS

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

BF mRNA density

BF footprint density

BF no RNAi induction

BF 6 days of RNAi induction

-24 -18 -12 -6 0 6 12 -24 -18 -12 -6 0 6

UAAAUG

CDS position (nt from start) CDS position (nt from stop)

putative CDS

rela

tive

enric

hmen

tof

rea

d 5acute

end

s

ribosome footprintmRNA

perc

enta

ge o

f rea

ds

20

0

40

60

80

frame0 1 2

(RPM)

Figure 5 Ribosome footprints reveal previously un-annotated CDSs(A) Alignment of the 50 nucleotides from ribosome footprint reads thatmap close to translation start or translation termination sites of candi-date CDSs (B) Percentage of position of sequence reads relative toreading frame (C) Ribosome mRNA and RIT-seq (RNAi targetsequencing) profiles of two previously un-annotated putative CDSs

3632 Nucleic Acids Research 2014 Vol 42 No 6

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 11: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

resolution of the RIT-seq data (8081) Nevertheless thesefindings suggest an important biological role for a subsetof genes that have been missed in previous genome anno-tations Intriguingly the RNAi data reveal large develop-mental differences in the fitness costs associated with thedown-regulation of putative CDSs suggesting that the setof candidate CDSs may be more important during themammalian stage

DISCUSSION

In this study we report the first genome-wide analysis ofprotein synthesis and a strand-specific analysis of RNAtranscript levels for T brucei It is the first such analysisfor a eukaryotic pathogen and an organism without tran-scriptional control Our data reveal large differences intranslational efficiency among transcripts in the same lifecycle stage and between the same transcript in differentstages In addition the sequencing of ribosome-protectedRNA enabled us to identify transcripts that are likely tobe translated Hundreds of putative previously unidenti-fied CDSs appear to be essential for parasite fitness andthe analysis of available proteomics data confirmed theexistence of at least 20 of these previously unknown CDSs

Eukaryotic gene expression is regulated at multiplelevels with translation of mRNA into proteins beingone of the most important (17) The ability to determineboth translatome and transcriptome enabled us toevaluate the efficiency with which individual transcriptsare translated into proteins and suggests translationalcontrol to be an important regulatory mechanism inT brucei Under a single condition we observed transla-tional efficiency to vary over two orders of magnitudethus its importance equals that of RNA stability forwhich a similar range has been measured (82) Howeverwe observed no correlation between RNA abundance andtranslational efficiency suggesting that translational effi-ciency is regulated independently of RNA stability

To use ribosome density to estimate the rate of proteinsynthesis the speed of translation must be constant Thisassumption is supported by measurements in mouse em-bryonic stem cells that demonstrate the rate of translationto be consistent between different classes of mRNA thekinetics of elongation to be independent of length andprotein abundance and the speed of translation to be in-dependent of codon usage (4664) Nevertheless the speedof translation may very well be different between the twolife cycle stages of the parasite that live at 37C and 27CFurthermore our approach does not permit measurementof the absolute mRNA and footprint abundance for theindividual life cycle stages making direct comparisons oftranslation efficiencies for individual transcripts unreli-able Therefore we did not compare translational effi-ciency directly but rather determined the lsquorankrsquo intranslational efficiency for all genes in each life cyclestage assuming that a lsquochange in rankrsquo indicated lifecycle-dependent translational control While we see ageneral positive correlation in translational efficiencybetween the two life cycle stages (Pearson r=07428)for a large number of genes we observed developmental

regulation of translation Importantly the regulation weobserve agrees well with that reported in previous studiesfor a small subset of genes but also includes many proteinspreviously unknown to be developmentally regulated Inaddition the efficient translation of proteins annotated ascytochrome oxidases in the PF and of enzymes importantfor glycolysis in the BF is consistent with differences inenergy metabolism with the BF entirely dependent on gly-colysis to generate energyHow translational control is achieved in T brucei

remains to be seen but our analysis suggests that uORFmay be an important contributor Work in several organ-isms has shown that generally the presence of uORFs cor-relates with reduced gene expression (71) However forsome genes like the transcription factor GCN4 in yeastand the activating transcription factor ATF4 in mammalsthis correlation is reversed on stress (8384) Thus uORFcan exert positive and negative effects on translation ofdownstream CDSs Based on previous 50 UTR annota-tions we identified uORFs in 22 of T brucei genesbut it will be interesting to learn if the regulatory effectsof uORFs may be supplemented by translation from non-AUG uORFs Our ribosome profiling data indicate anincrease in translation from 50 UTRs and a more ubiqui-tous translation initiation from non-AUG sites in thePF compared with the BF Similarly an increase in thetranslation from 50 UTRs has been observed in yeast onstarvation (45) One of the most highly conserved stress-induced mechanisms to regulate translation involves theinactivation of the translation initiation factor eIF2aby phosphorylation at a conserved serine (8586)Phosphorylation of the T brucei and L major orthologsoccurs at the corresponding Thr residues (687) but thegeneration of a T brucei cell line exclusively expressing amutant form of eIF2a demonstrated that phosphorylationof eIF2a is not required for heat-induced translationalarrest or for the formation of heat shock stress granulesin T brucei (88) Given the finding that eIF2a may also beimportant in determining the stringency in initiator codonselection (89) it will be interesting to see whether eIF2aphosphorylation affects the selection of initiator codonsand whether this plays a role in the fine-tuning of geneexpressionBesides being a valuable tool to study translational

regulation ribosome profiling will also be invaluable forthe discovery of small peptides While only 2 of thehuman genome contains annotated CDSs (90) transcrip-tome analyses have revealed almost genome-wide tran-scription of RNA mostly considered to be non-protein-coding In lower organisms the percentage of thegenome encoding for proteins is higher (629192) butpervasive transcription of non-protein coding regionscan be found in essentially all organisms (93) The vastamount of ncRNA has triggered great interest inelucidating the biological significance of these transcriptsand has led to the identification of numerous regulatorymechanisms controlled by short and long ncRNAInterestingly the majority of transcription occurs aslong ncRNA in humans (94) many of which containORFs making it difficult to unambiguously classify a tran-script as non-protein-coding To distinguish long ncRNA

Nucleic Acids Research 2014 Vol 42 No 6 3633

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 12: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

from mRNA ORF length has served as the mostcommonly applied criterion because even without select-ive pressure short ORFs will occur frequently by chancewhile the probability that long ORFs occur by chance arelow Nevertheless recent findings reveal that eukaryoticgenomes contain a large number of small CDSs (95ndash97)and that small peptides play important biological roles(77789899) In addition numerous long ncRNAs con-taining ORFs longer than 100 aa have been found to as-sociate with ribosomes (4653) While these findingscomplicate the annotation of CDSs and demand for newapproaches to annotate genes they raise the exciting pos-sibility that a large number of unknown small CDSs existsand awaits discoveryFor this study RNA was obtained from a 427 strain

the most common laboratory strain and sequence readswere aligned to the more complete genome of the 927strain nevertheless RNA-sequencing analysis indicateswidespread transcription Thus as seen in other organ-isms active transcription is not necessarily a good pre-dictor of CDSs In contrast we found ribosomefootprint density to be highly enriched across annotatedCDSs excluding introns It should thus serve as a reliabletool to identify novel genes To search for new CDSs andto evaluate their biological function we combinedribosome profiling proteomics and genome-wide RNAidata While proteomics data allowed us to identifyprotein products for only 20 relatively large candidateCDSs the experimental set-up of the proteomic analysisthat involved the removal of small peptides made itimpossible to verify the existence of many small CDSsThus more important than the verification of putativeCDSs with proteomics data may be our finding thatgt200 of the putative CDSs appear to be essential forparasite fitness Furthermore RIT-seq data suggest thatcandidate CDSs are 13-fold less likely to be essentialfor viability in the PF than in the BF or during differen-tiation of cells from BF to PF These findings point to apossible role of small peptides in the adaptation of tryp-anosomes to life in the mammalian host Generally ourdata suggest the existence of a large yet to be exploredsmall proteome but further characterizations of thesmall proteome will have to include a more targetedproteomics analysisIn this analysis we focused on measuring translational

control and the identification of new CDSs Howeverribosome profiling data should also aid in the identifica-tion of bona fide ncRNAs For many annotated CDSs weobserved RNA transcripts but not ribosome footprintssuggesting the presence of ncRNA Though given thatwe have only analysed two life cycle stages theseputative ncRNAs may be translated during other stagesof the parasitersquos life cycle Thus a comprehensive searchfor ncRNA should include ribosome profiling analyses ofall life cycle stages

ACCESSION NUMBERS

All sequencing data have been deposited in the EuropeanNucleotide Archive PRJEB4801

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Stan Gorski Susanne Kramer Christian Janzenand Jose-Juan Lopez-Rubio for valuable discussions andcritical reading of the manuscript and Yanjie Chao andJan Medenbach for much appreciated advice ongenerating ribosome profiling libraries

FUNDING

Young Investigator Program of the Research Center ofInfectious Diseases (ZINF) of the University ofWuerzburg Germany German Research FoundationDFG [SI 16102-1] Human Frontier Science Program(to TNS) French National Research Agency [ANR-2010-GENM-011-01 GENAMIBE to CCH] Fundingfor open access charge German Research Foundation(DFG) and the University of Wuerzburg in the fundingprogramme Open Access Publishing

Conflict of interest statement None declared

REFERENCES

1 MooreMJ (2005) From birth to death the complex lives ofeukaryotic mRNAs Science 309 1514ndash1518

2 NagarajN WisniewskiJR GeigerT CoxJ KircherMKelsoJ PaaboS and MannM (2011) Deep proteome andtranscriptome mapping of a human cancer cell line Mol SystBiol 7 548

3 LuP VogelC WangR YaoX and MarcotteEM (2007)Absolute protein expression profiling estimates the relativecontributions of transcriptional and translational regulationNat Biotechnol 25 117ndash124

4 de GodoyLM OlsenJV CoxJ NielsenML HubnerNCFrohlichF WaltherTC and MannM (2008) Comprehensivemass-spectrometry-based proteome quantification of haploidversus diploid yeast Nature 455 1251ndash1254

5 McNicollF DrummelsmithJ MullerM MadoreE BoilardNOuelletteM and PapadopoulouB (2006) A combinedproteomic and transcriptomic approach to the study ofstage differentiation in Leishmania infantum Proteomics 63567ndash3581

6 LahavT SivamD VolpinH RonenM TsigankovPGreenA HollandN KuzykM BorchersC ZilbersteinDet al (2011) Multiple levels of gene regulation mediatedifferentiation of the intracellular pathogen LeishmaniaFASEB J 25 515ndash525

7 SchwanhausserB BusseD LiN DittmarG SchuchhardtJWolfJ ChenW and SelbachM (2011) Global quantificationof mammalian gene expression control Nature 473 337ndash342

8 FevreEM WissmannBV WelburnSC and LutumbaP (2008)The burden of human African trypanosomiasis PLoS Negl TropDis 2 e333

9 LozanoR NaghaviM ForemanK LimS ShibuyaKAboyansV AbrahamJ AdairT AggarwalR AhnSY et al(2012) Global and regional mortality from 235 causes ofdeath for 20 age groups in 1990 and 2010 a systematic analysisfor the Global Burden of Disease Study 2010 Lancet 3802095ndash2128

10 Martinez-CalvilloS YanS NguyenD FoxM StuartK andMylerPJ (2003) Transcription of Leishmania major Friedlinchromosome 1 initiates in both directions within a single regionMol Cell 11 1291ndash1299

3634 Nucleic Acids Research 2014 Vol 42 No 6

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 13: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

11 Martinez-CalvilloS NguyenD StuartK and MylerPJ (2004)Transcription initiation and termination on Leishmania majorchromosome 3 Eukaryot Cell 3 506ndash517

12 SiegelTN HekstraDR KempLE FigueiredoLMLowellJE FenyoD WangX DewellS and CrossGAM(2009) Four histone variants mark the boundaries of polycistronictranscription units in Trypanosoma brucei Genes Dev 231063ndash1076

13 KolevNG FranklinJB CarmiS ShiH MichaeliS andTschudiC (2010) The transcriptome of the human pathogenTrypanosoma brucei at single-nucleotide resolutionPLoS Pathog 6 e1001090

14 LeBowitzJH SmithHQ RuscheL and BeverleySM (1993)Coupling of poly(A) site selection and trans-splicing inLeishmania Genes Dev 7 996ndash1007

15 UlluE MatthewsKR and TschudiC (1993) Temporal order ofRNA-processing reactions in trypanosomes rapid trans splicingprecedes polyadenylation of newly synthesized tubulin transcriptsMol Cell Biol 13 720ndash725

16 MatthewsKR TschudiC and UlluE (1994) A commonpyrimidine-rich motif governs trans-splicing and polyadenylationof tubulin polycistronic pre-mRNA in trypanosomes Genes Dev8 491ndash501

17 WrightJR SiegelTN and CrossGAM (2010) Histone H3trimethylated at lysine 4 is enriched at probable transcriptionstart sites in Trypanosoma brucei Mol Biochem Parasitol 136434ndash450

18 ClaytonC and ShapiraM (2007) Post-transcriptional regulationof gene expression in trypanosomes and leishmanias MolBiochem Parasitol 156 93ndash101

19 MatthewsKR (2011) Controlling and coordinating developmentin vector-transmitted parasites Science 331 1149ndash1153

20 CrossGA KleinRA and LinsteadDJ (1975) Utilization ofamino acids by Trypanosoma brucei in culture L-threonine as aprecursor for acetate Parasitology 71 311ndash326

21 BrunR and SchonenbergerM (1979) Cultivation and in vitrocloning or procyclic culture forms of Trypanosoma brucei in asemi-defined medium Acta Trop 36 289ndash292

22 FurgerA SchurchN KurathU and RoditiI (1997) Elementsin the 30 untranslated region of procyclin mRNA regulateexpression in insect forms of Trypanosoma brucei bymodulating RNA stability and translation Mol Cell Biol 174372ndash4380

23 HehlA VassellaE BraunR and RoditiI (1994) A conservedstem-loop structure in the 30 untranslated region of procyclinmRNAs regulates expression in Trypanosoma brucei Proc NatlAcad Sci USA 91 370ndash374

24 HotzHR HartmannC HuoberK HugM and ClaytonC(1997) Mechanisms of developmental regulation in Trypanosomabrucei a polypyrimidine tract in the 30-untranslated region of asurface protein mRNA affects RNA abundance and translationNucleic Acids Res 25 3017ndash3026

25 MayhoM FennK CraddyP CrosthwaiteS and MatthewsK(2006) Post-transcriptional control of nuclear-encoded cytochromeoxidase subunits in Trypanosoma brucei evidence for genome-wide conservation of life-cycle stage-specific regulatory elementsNucleic Acids Res 34 5312ndash5324

26 WalradP PaterouA Acosta-SerranoA and MatthewsKR(2009) Differential trypanosome surface coat regulation by aCCCH protein that co-associates with procyclin mRNA cis-elements PLoS Pathog 5 e1000317

27 HelmJR WilsonME and DonelsonJE (2009) Differentialexpression of a protease gene family in African trypanosomesMol Biochem Parasitol 163 8ndash18

28 HornD (2008) Codon usage suggests that translational selectionhas a major impact on protein expression in trypanosomatidsBMC Genomics 9 2

29 ManfulT FaddaA and ClaytonC (2011) The role of the 50-30

exoribonuclease XRNA in transcriptome-wide mRNAdegradation RNA 17 2039ndash2047

30 YoffeY ZuberekJ LewdorowiczM ZeiraZ KeasarCOrr-DahanI Jankowska-AnyszkaM StepinskiJDarzynkiewiczE and ShapiraM (2004) Cap-binding activity ofan eIF4E homolog from Leishmania RNA 10 1764ndash1775

31 YoffeY ZuberekJ LererA LewdorowiczM StepinskiJAltmannM DarzynkiewiczE and ShapiraM (2006) Bindingspecificities and potential roles of isoforms of eukaryotic initiationfactor 4E in Leishmania Eukaryot Cell 5 1969ndash1979

32 De GaudenziJ FraschAC and ClaytonC (2005) RNA-bindingdomain proteins in Kinetoplastids a comparative analysisEukaryot Cell 4 2106ndash2114

33 DhaliaR ReisCR FreireER RochaPO KatzRMunizJR StandartN and de Melo NetoOP (2005)Translation initiation in Leishmania major characterisation ofmultiple eIF4F subunit homologues Mol Biochem Parasitol140 23ndash41

34 ZinovievA and ShapiraM (2012) Evolutionary conservation anddiversification of the translation initiation apparatus intrypanosomatids Comp Funct Genomics 2012 813718

35 SiegelTN HekstraDR WangX DewellS and CrossGAM(2010) Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicingand polyadenylation sites Nucleic Acids Res 38 4946ndash4957

36 NilssonD GunasekeraK ManiJ OsterasM FarinelliLBaerlocherL RoditiI and OchsenreiterT (2010) Spliced leadertrapping reveals widespread alternative splicing patterns in thehighly dynamic transcriptome of Trypanosoma brucei PLoSPathog 6 e1001037

37 SiegelTN GunasekeraK CrossGA and OchsenreiterT(2011) Gene expression in Trypanosoma brucei lessons fromhigh-throughput RNA sequencing Trends Parasitol 27 434ndash441

38 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principles ofits regulation Nat Rev Mol Cell Biol 11 113ndash127

39 KozakM (1984) Selection of initiation sites by eucaryoticribosomes effect of inserting AUG triplets upstream from thecoding sequence for preproinsulin Nucleic Acids Res 123873ndash3893

40 SomersJ PoyryT and WillisAE (2013) A perspective onmammalian upstream open reading frame function Int JBiochem Cell Biol 45 1690ndash1700

41 SiegelTN TanKS and CrossGAM (2005) Systematic studyof sequence motifs for RNA trans splicing in Trypanosomabrucei Mol Cell Biol 25 9586ndash9594

42 AravaY WangY StoreyJD LiuCL BrownPO andHerschlagD (2003) Genome-wide analysis of mRNA translationprofiles in Saccharomyces cerevisiae Proc Natl Acad Sci USA100 3889ndash3894

43 CapewellP MonkS IvensA MacgregorP FennKWalradP BringaudF SmithTK and MatthewsKR (2013)Regulation of trypanosoma brucei total and polysomal mRNAduring development within its mammalian host PLoS One 8e67069

44 BrechtM and ParsonsM (1998) Changes in polysome profilesaccompany trypanosome development Mol Biochem Parasitol97 189ndash198

45 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

46 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

47 OhE BeckerAH SandikciA HuberD ChabaR GlogeFNicholsRJ TypasA GrossCA KramerG et al (2011)Selective ribosome profiling reveals the cotranslational chaperoneaction of trigger factor in vivo Cell 147 1295ndash1308

48 MichelAM and BaranovPV (2013) Ribosome profiling aHi-Def monitor for protein synthesis at the genome-wide scaleWiley Interdiscip Rev RNA 349 4184ndash4188

49 IngoliaNT (2010) Genome-wide translational profiling byribosome footprinting Methods Enzymol 470 119ndash142

50 IngoliaNT BrarGA RouskinS McGeachyAM andWeissmanJS (2012) The ribosome profiling strategy formonitoring translation in vivo by deep sequencing ofribosome-protected mRNA fragments Nat Protoc 7 1534ndash1550

51 LangmeadB and SalzbergSL (2012) Fast gapped-readalignment with Bowtie 2 Nat Methods 9 357ndash359

Nucleic Acids Research 2014 Vol 42 No 6 3635

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 14: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

52 AurrecoecheaC BarretoA BrestelliJ BrunkBP CadeSDohertyR FischerS GajriaB GaoX GingleA et al (2013)EuPathDB the eukaryotic pathogen database Nucleic Acids Res41 D684ndashD691

53 GuttmanM RussellP IngoliaNT WeissmanJS andLanderES (2013) Ribosome Profiling Provides Evidence thatLarge Noncoding RNAs Do Not Encode Proteins Cell 154240ndash251

54 AlsfordS and HornD (2008) Single-locus targeting constructsfor reliable regulated RNAi and transgene expression inTrypanosoma brucei Mol Biochem Parasitol 161 76ndash79

55 AndersS and HuberW (2010) Differential expression analysisfor sequence count data Genome Biol 11 R106

56 RobinsonMD McCarthyDJ and SmythGK (2010) edgeR aBioconductor package for differential expression analysis ofdigital gene expression data Bioinformatics 26 139ndash140

57 ButterF BuceriusF MichelM CicovaZ MannM andJanzenCJ (2013) Comparative proteomics of two life cyclestages of stable isotope-labeled Trypanosoma brucei reveals novelcomponents of the parasitersquos host adaptation machinery MolCell Proteomics 12 172ndash179

58 CoxJ NeuhauserN MichalskiA ScheltemaRA OlsenJVand MannM (2011) Andromeda a peptide search engineintegrated into the MaxQuant environment J Proteome Res 101794ndash1805

59 CoxJ and MannM (2008) MaxQuant enables high peptideidentification rates individualized ppb-range mass accuraciesand proteome-wide protein quantification Nat Biotechnol 261367ndash1372

60 WebbH BurnsR EllisL KimblinN and CarringtonM(2005) Developmentally regulated instability of the GPI-PLCmRNA is dependent on a short-lived protein factor Nucleic AcidsRes 33 1503ndash1512

61 MairG ShiH LiH DjikengA AvilesHO BishopJRFalconeFH GavrilescuC MontgomeryJL SantoriMI et al(2000) A new twist in trypanosome RNA metabolism cis-splicingof pre-mRNA RNA 6 163ndash169

62 BerrimanM GhedinE Hertz-FowlerC BlandinGRenauldH BartholomeuDC LennardNJ CalerEHamlinNE HaasB et al (2005) The genome of the Africantrypanosome Trypanosoma brucei Science 309 416ndash422

63 BakkerBM MichelsPA OpperdoesFR and WesterhoffHV(1997) Glycolysis in bloodstream form Trypanosoma brucei canbe understood in terms of the kinetics of the glycolytic enzymesJ Biol Chem 272 3207ndash3215

64 DanaA and TullerT (2012) Determinants of translationelongation speed and ribosomal profiling biases in mouseembryonic stem cells PLoS Comput Biol 8 e1002755

65 SaasJ ZiegelbauerK von HaeselerA FastB and BoshartM(2000) A developmentally regulated aconitase related to iron-regulatory protein-1 is localized in the cytoplasm and in themitochondrion of Trypanosoma brucei J Biol Chem 2752745ndash2755

66 MilleriouxY EbikemeC BiranM MorandP BouyssouGVincentIM MazetM RiviereL FranconiJMBurchmoreRJ et al (2013) The threonine degradation pathwayof the Trypanosoma brucei procyclic form the main carbonsource for lipid biosynthesis is under metabolic controlMol Microbiol 90 114ndash129

67 ArcherSK LuuVD de QueirozRA BremsS and ClaytonC(2009) Trypanosoma brucei PUF9 regulates mRNAs for proteinsinvolved in replicative processes over the cell cycle PLoS Pathog5 e1000565

68 MiaoJ LiJ FanQ LiX LiX and CuiL (2010) ThePuf-family RNA-binding protein PfPuf2 regulates sexualdevelopment and sex differentiation in the malaria parasitePlasmodium falciparum J Cell Sci 123 1039ndash1049

69 WhartonRP and AggarwalAK (2006) mRNA regulation byPuf domain proteins Sci STKE 2006 pe37

70 KramerS (2011) Developmental regulation of gene expression inthe absence of transcriptional control The case of kinetoplastidsMol Biochem Parasitol 181 61ndash72

71 WethmarK Barbosa-SilvaA Andrade-NavarroMA andLeutzA (2013) uORFdbmdasha comprehensive literature

database on eukaryotic uORF biology Nucleic Acids Res 42D60ndashD67

72 CalvoSE PagliariniDJ and MoothaVK (2009) Upstreamopen reading frames cause widespread reduction of proteinexpression and are polymorphic among humans Proc Natl AcadSci USA 106 7507ndash7512

73 HoodHM NeafseyDE GalaganJ and SachsMS (2009)Evolutionary roles of upstream open reading frames in mediatinggene regulation in fungi Annu Rev Microbiol 63 385ndash409

74 MatsuiM YachieN OkadaY SaitoR and TomitaM (2007)Bioinformatic analysis of post-transcriptional regulation by uORFin human and mouse FEBS Lett 581 4184ndash4188

75 IaconoM MignoneF and PesoleG (2005) uAUG and uORFsin human and rodent 50 untranslated mRNAs Gene 349 97ndash105

76 HelmJR WilsonME and DonelsonJE (2008) Different transRNA splicing events in bloodstream and procyclic Trypanosomabrucei Mol Biochem Parasitol 159 134ndash137

77 CambyI Le MercierM LefrancF and KissR (2006)Galectin-1 a small protein with major functions Glycobiology16 137Rndash157R

78 FletcherJC BrandU RunningMP SimonR andMeyerowitzEM (1999) Signaling of cell fate decisions byCLAVATA3 in Arabidopsis shoot meristems Science 2831911ndash1914

79 DingerME PangKC MercerTR and MattickJS (2008)Differentiating protein-coding and noncoding RNA challengesand ambiguities PLoS Comput Biol 4 e1000176

80 MorrisJC WangZ DrewME and EnglundPT (2002)Glycolysis modulates trypanosome glycoprotein expression asrevealed by an RNAi library EMBO J 21 4429ndash4438

81 AlsfordS TurnerDJ ObadoSO Sanchez-FloresAGloverL BerrimanM Hertz-FowlerC and HornD (2011)High-throughput phenotyping using parallel sequencing of RNAinterference targets in the African trypanosome Genome Res 21915ndash924

82 ManfulT CristoderoM and ClaytonC (2009) DRBD1 is theTrypanosoma brucei homologue of the spliceosome-associatedprotein 49 Mol Biochem Parasitol 166 186ndash189

83 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

84 HinnebuschAG (1993) Gene-specific translational control of theyeast GCN4 gene by phosphorylation of eukaryotic initiationfactor 2 Mol Microbiol 10 215ndash223

85 DeverTE FengL WekRC CiganAM DonahueTF andHinnebuschAG (1992) Phosphorylation of initiation factor2 alpha by protein kinase GCN2 mediates gene-specifictranslational control of GCN4 in yeast Cell 68 585ndash596

86 KongJ and LaskoP (2012) Translational control in cellularand developmental processes Nat Rev Genet 13 383ndash394

87 MoraesMC JesusTC HashimotoNN DeyMSchwartzKJ AlvesVS AvilaCC BangsJD DeverTESchenkmanS et al (2007) Novel membrane-bound eIF2alphakinase in the flagellar pocket of Trypanosoma brucei EukaryotCell 6 1979ndash1991

88 KramerS QueirozR EllisL WebbH HoheiselJDClaytonC and CarringtonM (2008) Heat shock causes adecrease in polysomes and the appearance of stress granules intrypanosomes independently of eIF2(alpha) phosphorylation atThr169 J Cell Sci 121 3002ndash3014

89 HuangHK YoonH HannigEM and DonahueTF (1997)GTP hydrolysis controls stringent selection of the AUG startcodon during translation initiation in Saccharomyces cerevisiaeGenes Dev 11 2396ndash2413

90 International Human Genome Sequencing Consortium (2004)Finishing the euchromatic sequence of the human genomeNature 431 931ndash945

91 GardnerMJ HallN FungE WhiteO BerrimanMHymanRW CarltonJM PainA NelsonKE BowmanSet al (2002) Genome sequence of the human malaria parasitePlasmodium falciparum Nature 419 498ndash511

92 WoodV RutherfordKM IvensA RajandreamMA andBarrellB (2001) A re-annotation of the Saccharomyces cerevisiaegenome Comp Funct Genomics 2 143ndash154

3636 Nucleic Acids Research 2014 Vol 42 No 6

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637

Page 15: Comparative ribosome profiling reveals extensive ... · Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages

93 BerrettaJ and MorillonA (2009) Pervasive transcriptionconstitutes a new level of eukaryotic genome regulationEMBO Rep 10 973ndash982

94 KapranovP ChengJ DikeS NixDA DuttaguptaRWillinghamAT StadlerPF HertelJ HackermullerJHofackerIL et al (2007) RNA maps reveal new RNA classesand a possible function for pervasive transcription Science 3161484ndash1488

95 YangX TschaplinskiTJ HurstGB JawdyS AbrahamPELankfordPK AdamsRM ShahMB HettichRLLindquistE et al (2011) Discovery and annotation of smallproteins using genomics proteomics and computationalapproaches Genome Res 21 634ndash641

96 SlavoffSA MitchellAJ SchwaidAG CabiliMN MaJLevinJZ KargerAD BudnikBA RinnJL and

SaghatelianA (2013) Peptidomic discovery of short open readingframe-encoded peptides in human cells Nat Chem Biol 959ndash64

97 KastenmayerJP NiL ChuA KitchenLE AuWCYangH CarterCD WheelerD DavisRW BoekeJD et al(2006) Functional genomics of genes with small open readingframes (sORFs) in S cerevisiae Genome Res 16 365ndash373

98 GalindoMI PueyoJI FouixS BishopSA and CousoJP(2007) Peptides encoded by short ORFs control developmentand define a new eukaryotic gene family PLoS Biol 5 e106

99 KondoT PlazaS ZanetJ BenrabahE ValentiPHashimotoY KobayashiS PayreF and KageyamaY (2010)Small peptides switch the transcriptional activity ofShavenbaby during Drosophila embryogenesis Science 329336ndash339

Nucleic Acids Research 2014 Vol 42 No 6 3637