Yeast Sequencing Report
Gene order in a 10 275 bp fragment of Yarrowia lipolytica, including adjacent YlURA5 and YlSEC65 genes conserved in four yeast species
Manuel Sanchez and Angel Domínguez*
Departamento de Microbiología y Genética, Instituto de Microbiología Bioquímica/CSIC, Universidad de Salamanca, 37071 Salamanca, Spain
*Correspondence to: A. Domínguez, Departamento de Microbiología y Genética, Instituto de Microbiología Bioquímica/CSIC, Universidad de Salamanca, 37071 Salamanca, Spain. E-mail: [email protected]
Received: 23 December 2000
Accepted: 10 February 2001
Abstract
We have determined the sequence of a 10 275 bp DNA segment of Yarrowia lipolytica located on chromosome VI. The sequence contains six complete open reading frames (ORFs) longer than 100 amino acids and two partial ORFs, one at each end. Two of the ORFs correspond to the well-characterized genes YlURA5 (encoding orotate phosphoribosyltransferase) and YlSEC65 (encoding a subunit of the signal recognition particle). These two genes show the same organization, adjacent on opposite strands and in opposite orientations, in four yeast species: Saccharomyces cerevisiae, Kluyveromyces lactis, Candida albicans and Y. lipolytica. One complete ORF and the two partial ORFs encode putative proteins with significant homology to proteins from other organisms. YlVI-108w (partial) and YlVI-103w show 39% and 54% identity, respectively, with YDR430c and YHR088w from S. cerevisiae. YlVI-102c (partial) shows significant homology with lustrin A, a matrix protein from Haliotis rufescens, and with the PGRS subfamily (Gly-rich proteins) of Mycobacterium tuberculosis. The three remaining ORFs show weak or no significant homology with previously sequenced genes. The nucleotide sequence has been submitted to the EMBL database under Accession No. AI006754. Copyright © 2001 John Wiley & Sons, Ltd.
Keywords: Yarrowia lipolytica; Saccharomyces cerevisiae; Kluyveromyces lactis;
Candida albicans; gene order; genome organization
Introduction
Completion of the Saccharomyces cerevisiae genome sequence (Goffeau et al., 1996), together with the extensive work on the systematic sequencing of the genomes of two other yeasts (Schizosaccharomyces pombe and Candida albicans), has made it possible to analyse to what extent gene order is conserved in yeast genomes.
Two further yeasts, Yarrowia lipolytica and Kluyveromyces lactis, which are evolutionarily quite divergent (Barns et al., 1991) but amenable to classical and molecular genetic studies, are currently being studied extensively by several European groups within the framework of the Biotech Programme (Cell Factory Area; Griengl et al., 1999). The total haploid genome size is 21–22 Mb for Y. lipolytica (Casaregola et al., 1997) and 12 Mb for K. lactis (Wesolowski-Louvel and Fukuhara, 1995). The electrophoretic patterns of chromosomal DNA suggest that both species contain six chromosomal DNA bands (numbered I–VI, from smallest to largest), and about 150 genes have been cloned and physically mapped in each yeast (Wesolowski-Louvel et al., 1996; Casaregola et al., 1997; Domínguez, A., unpublished).
Several authors have reported conservation of gene order in ascomycete fungi, e.g. between S. cerevisiae and Saccharomyces douglasii (Adjiri et al., 1994), S. cerevisiae and K. lactis (Wesolowski-Louvel and Fukuhara, 1995; Bai et al., 1999), S. cerevisiae and Ashbya gossypii (Altmann-Johl and Philippsen, 1996), and S. cerevisiae and C. albicans (Hartung et al., 1998). Keogh et al. (1998) carried out extensive analyses of gene order along the chromosomes of S. cerevisiae and other yeast species by comparing DNA sequences available in the databases. The global degree of synteny, i.e. the conservation or non-conservation of neighbouring gene pairs, between S. cerevisiae and 13 hemiascomycetous yeast species has also been described recently (Llorente et al., 2000). Both studies showed that the extent of gene order conservation decreases with increasing evolutionary distance. However, these comparative analyses were based on the sequences of small DNA fragments (500 bp at both ends). To extend the data on gene order, sequence similarity, genome compactness and the terminator–promoter environment among yeast species, we have chosen a different approach: sequencing larger DNA fragments (8–16 kb) from several yeast species. This study reports the sequence of a 10.2 kb Y. lipolytica region whose central part contains two genes we characterized previously, YlURA5 and YlSEC65 (Sanchez et al., 1995, 1997). The two genes are adjacent and in inverted orientation, and this arrangement is conserved in four yeast species (S. cerevisiae, K. lactis, C. albicans and Y. lipolytica).

Yeast 2001; 18: 807–813. DOI: 10.1002/yea.735
Materials and methods
Plasmids and strains
The YlURA5 and YlSEC65 genes were isolated on plasmid pMP47 (Sanchez et al., 1995). This plasmid contains a 10.2 kb insert from a DNA library constructed from Y. lipolytica strain W29 DNA partially digested with Sau3A and cloned into the BamHI site of the pINA62 vector (Xuan et al., 1988). The Escherichia coli strain used as host for transformation and plasmid amplification was DH5α (supE44 ΔlacU169 (φ80 lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1; Hanahan, 1983). E. coli transformants were selected on LB medium supplemented with 100 mg/l ampicillin.
Manipulation of nucleic acids
Routine DNA manipulations, Southern blotting, restriction enzyme digestions, agarose gel electrophoresis and E. coli transformation were performed according to standard techniques (Sambrook et al., 1989). Plasmids were prepared using Wizard miniprep columns (Promega).
Sequencing strategy
The sequence was determined with universal and reverse primers on an ABI377 automated sequencer (Applied Biosystems Inc.), using the Taq DyeDeoxy Terminator Cycle Sequencing Kit as supplied by the manufacturer. Junctions were sequenced with walking primers, using the entire plasmid as template. Sequence quality was ensured by visual inspection of the sequencing profiles at every position on each DNA strand, and the sequence was considered final only when every nucleotide had been read unambiguously on both strands.
Software used
Walking primers were designed with the DNAsis program (Pharmacia Biotech). Sequences were assembled with the SeqMan program of the DNASTAR package (DNASTAR Ltd.). ORFs were predicted with DNA Strider software (Marck, 1988), and for each ORF the first ATG was assumed to be the initiation codon. ORFs were named with the prefix Yl (Yarrowia lipolytica), followed by the chromosome number (VI) and a designation following the MIPS working nomenclature for S. cerevisiae. Homology searches were done with BLAST (Altschul et al., 1997) or FASTA (Pearson and Lipman, 1988). Multiple sequence alignments were obtained with CLUSTAL (Thompson et al., 1994) or PILEUP (GCG package).
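ORF prediction here was done with DNA Strider, whose internal rules are not described in this report. The following is a minimal Python sketch of the stated convention only, not a reimplementation of DNA Strider. It assumes the standard genetic code (stops TAA, TAG, TGA), scans all six reading frames, takes the first ATG after the previous in-frame stop as the initiation codon, and keeps ORFs of at least 100 amino acids. The names find_orfs and reverse_complement are hypothetical; coordinates follow the style of Table 1 (1-based, start < end, stop codon included, strand W or C).

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int, str, int]]:
    """Six-frame ORF scan (hypothetical helper, not DNA Strider's algorithm).

    Returns (start, end, strand, length_aa) in 1-based, inclusive Watson-strand
    coordinates with start < end and the stop codon included. Only ORFs that
    have both an ATG and an in-frame stop codon are reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("W", seq), ("C", reverse_complement(seq))):
        for frame in range(3):
            first_atg = None  # first ATG since the previous in-frame stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if first_atg is None and codon == "ATG":
                    first_atg = i
                elif first_atg is not None and codon in STOP_CODONS:
                    length_aa = (i - first_atg) // 3  # codons before the stop
                    if length_aa >= min_aa:
                        a, b = first_atg + 1, i + 3  # 1-based positions on s
                        if strand == "C":
                            # map reverse-complement positions back to the Watson strand
                            a, b = n - b + 1, n - a + 1
                        orfs.append((a, b, strand, length_aa))
                    first_atg = None
    return sorted(orfs)
```

Because the sketch requires both an ATG and an in-frame stop, ORFs that run off either end of an insert (such as the partial YlVI-108w and YlVI-102c) are not reported by it.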
Results and discussion
Plasmid pMP47, containing a 10 275 bp insert from chromosome VI of Y. lipolytica, was sequenced. A search for coding regions revealed six clear ORFs longer than 100 amino acids (two of them partial) and two further, overlapping ORFs (Figure 1). The main characteristics of the ORFs are listed in Table 1. The most upstream ATG codon was arbitrarily taken as the initiation codon. The ORFs, including YlVI-104c, occupy 69.5% of the complete sequence, a value slightly lower than the 72% described for S. cerevisiae (Dujon, 1996) but higher than those obtained for Sz. pombe (54–59%; Sanchez et al., 1999; Xiang et al., 2000). The region has an overall G+C content of 48.5%, close to previously reported values (49.6–51.7%; Nakase and Komagata, 1971; Kurtzman and Fell, 1998; Kuck et al., 1980). The coding region alone (including YlVI-104c) has a slightly higher G+C content (51.7%).
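The G+C percentages quoted above are simple base-composition ratios. A minimal Python sketch of how they could be computed, assuming that ambiguous symbols (such as N) are excluded from the denominator and that overlapping ORFs are counted only once when measuring the coding region; the report does not say how either point was handled, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


def merge_intervals(regions: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge 1-based inclusive intervals so that overlapping ORFs are counted once."""
    merged: List[List[int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current interval
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def gc_in_regions(seq: str, regions: Iterable[Tuple[int, int]]) -> float:
    """G+C content of the union of the given 1-based inclusive regions of seq."""
    return gc_content("".join(seq[s - 1:e] for s, e in merge_intervals(regions)))
```

For example, the overlapping ORFs YlVI-104w (6640–7011) and YlVI-104c (6803–7186) from Table 1 collapse into one interval: merge_intervals([(6803, 7186), (6640, 7011)]) returns [(6640, 7186)].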
ORFs and ORF products
The YlVI-108w ORF is truncated at its 5′ end and lies at one end of the genomic fragment cloned in pMP47 (Figure 1). The deduced amino acid product is therefore only 780 residues long. This portion is nevertheless sufficient to detect 39.4% identity with YDR430cp of S. cerevisiae (aminoacyl-transfer RNA synthetase class-I signature).

The deduced protein product of YlVI-107c is 382 residues long and shows no significant homology with known proteins or ESTs. YlVI-104c and YlVI-104w are partially overlapping ORFs on opposite strands. Both have a lower G+C content than average (46.98% and 47.55%, respectively), and no homologues of either predicted product were found in the databases, which raises the question of their biological significance.

YlVI-103w encodes a putative protein of 333 amino acids. FastA analysis (EMBL database) revealed 63.44% identity with S. cerevisiae YHR088wp, a protein of unknown function (Figure 2). The Y. lipolytica protein is 38 amino acids longer than its S. cerevisiae counterpart; other Y. lipolytica proteins are also longer than their S. cerevisiae counterparts, e.g. those encoded by YlSEC14 (Lopez et al., 1994) and YlSEC65 (Sanchez et al., 1997). YlVI-103w has two methionines in its N-terminal region (Figure 2, bold and underlined), the second at residue 40. If translation started at this codon, the product would be a protein of 294 amino acids (333 − 39), closer in size to the S. cerevisiae protein.

YlVI-102c is also truncated at its 5′ end (Figure 1). The deduced amino acid product is 257 residues long. It shows a moderate degree of homology with lustrin A (a 1482-amino-acid matrix protein from the shell and pearl nacre of Haliotis rufescens; Shen et al., 1997) and with Rv3507 (a protein of 1381 amino acids), a member of the glycine-rich PGRS subfamily of the Mycobacterium tuberculosis PE protein family (Cole et al., 1998). In both cases, however, the similarity is restricted to a glycine/serine-rich (GS-rich) region at the C-terminus, so the missing N-terminal sequence would be needed before firmer conclusions can be drawn.

Figure 1. DNA Strider plot showing the ORFs in the six possible frames. ATG codons are represented by half-height vertical bars and stop codons by full-height bars. In the lower part, arrows indicate the positions and directions of the ORFs on the two strands. WSc and CSc are the Watson and Crick strands in S. cerevisiae.

Table 1. Characteristics and homologies of ORFs and deduced amino acid sequences of the 10.2 kb fragment

ORF name | Coordinates | Strand orientation* | Length (aa) | Molecular mass (Da) | Homologies | FastA initn | FastA init1 | FastA opt
YlVI-108W | ?–2343 | W | – | – | Similar to S. cerevisiae YDR430C | 1541 | 520 | 1941
YlVI-107C | 2910–4058 | C | 382 | 43107 | No similarity found | – | – | –
YlVI-106W | 4549–5208 | W | 219 | 23667 | S. cerevisiae URA5 | 782 | 656 | 807
YlVI-105C | 5471–6403 | C | 310 | 35464 | S. cerevisiae SEC65 | 368 | 253 | 372
YlVI-104C | 6803–7186 | C | 127 | 14521 | No similarity found | – | – | –
YlVI-104W | 6640–7011 | W | 123 | 13556 | No similarity found | – | – | –
YlVI-103W | 7755–8756 | W | 333 | 39329 | Similar to S. cerevisiae YHR088W | 1232 | 946 | 1263
YlVI-102C | 9519–? | C | – | – | Similar to lustrin A | 471 | 471 | 555

*W = Watson strand; C = Crick strand.
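Table 1 gives both coordinates and protein lengths for the complete ORFs, so the two columns can be cross-checked. A small Python sketch, assuming (as the numbers suggest) that the coordinates are 1-based, inclusive and include the stop codon; the helper protein_length is hypothetical. For every complete ORF the computed length matches the tabulated one, which supports that reading of the coordinates. The partial ORFs YlVI-108W and YlVI-102C are omitted because one of their coordinates is unknown.

```python
def protein_length(start: int, end: int) -> int:
    """Residues encoded by a complete ORF with 1-based inclusive coordinates
    that include the stop codon (the convention assumed for Table 1)."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"{span} bp is not a whole number of codons")
    return span // 3 - 1  # drop the stop codon


# Complete ORFs from Table 1: (start, end, length reported in the table)
TABLE_1 = {
    "YlVI-107C": (2910, 4058, 382),
    "YlVI-106W": (4549, 5208, 219),
    "YlVI-105C": (5471, 6403, 310),
    "YlVI-104C": (6803, 7186, 127),
    "YlVI-104W": (6640, 7011, 123),
    "YlVI-103W": (7755, 8756, 333),
}

for name, (start, end, reported) in TABLE_1.items():
    print(f"{name}: {protein_length(start, end)} aa computed, {reported} aa in Table 1")
```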
Genomic URA5–SEC65 regions of Y. lipolytica, S. cerevisiae, K. lactis and C. albicans
Analysis of the DNA fragments harbouring the URA5–SEC65 genes revealed that both ORFs are arranged in the same order and orientation in the four ascomycetes (Figure 3). The genes are of similar size in all four species, whereas pairwise protein identity varies (56.6–81.9% for URA5 and 23.3–59.7% for SEC65). In Y. lipolytica and C. albicans both genes are located on the largest chromosome (VI and R, respectively); in S. cerevisiae they are located on chromosome XIII, and their chromosomal location in K. lactis is unknown. Our results provide one of the clearest examples of conservation of synteny and gene orientation between different yeast species. The lack of homology between three of the eight Y. lipolytica ORFs and sequences in the databases is in good agreement with the identification of only 1187 genes among 4940 Y. lipolytica random sequence tags (Casaregola et al., 2000).

Figure 2. Predicted amino acid sequences of the proteins YHR088w of S. cerevisiae and YlVI-103w of Y. lipolytica. Gaps have been introduced to give the best alignment. Identical residues (asterisks) and conservative amino acid substitutions (dots) are indicated.

In all cases the highest identities were obtained between the two most closely related yeasts (S. cerevisiae and K. lactis), consistent with their phylogenetic position (Figure 4; Barns et al., 1991; Bolotin-Fukuhara et al., 2000; Llorente et al., 2000). The intergenic regions show few significant sequence similarities.
Moreover, the sizes of the intergenic regions of
Y. lipolytica are similar to those of the other three
yeasts. These observations raise an interesting question about genome organization in this yeast: if its genes and intergenic regions are of similar size to those of the other three species, what accounts for its larger genome?

Figure 3. Map of the Y. lipolytica URA5–SEC65 locus and comparison with the corresponding regions of S. cerevisiae, C. albicans and K. lactis. The ORFs and their orientations are represented by arrows. Thin lines represent intergenic regions.

Figure 4. Phylogenetic trees of the URA5 (A) and SEC65 (B) proteins from Y. lipolytica, S. cerevisiae, C. albicans and K. lactis, generated with the MegAlign program.

The size of the Y. lipolytica genome has been estimated at 21–22 Mb (Casaregola et al., 1997). Subtracting an rDNA cluster of 3 Mb (a slight overestimate; Casaregola et al., 1997) leaves about 18 Mb. Assuming that coding regions occupy 63% of this total (range 60–69.5%; this work; Domínguez et al., in preparation), the coding portion amounts to 11.3 Mb, which is 2.3 Mb or 3.7 Mb more than that of S. cerevisiae (9.0 Mb; 12.6 Mb genome, 72% coding; Dujon, 1996) or Sz. pombe (7.8 Mb; 13.6 Mb genome, 55% coding; Sanchez et al., 1999; Xiang et al., 2000). Only a few Y. lipolytica genes have introns (Barth and Gaillardin, 1996) and no chromosomal duplications have been described to date, so the extra coding DNA is unlikely to be accounted for by introns or duplicated genes. Assuming a theoretical standard size of 500 amino acids (about 1.5 kb) per gene, the extra 2.3 Mb implies that Y. lipolytica has 1500–2000 more genes than S. cerevisiae. Whether this reflects a greater metabolic capacity of Y. lipolytica or is a consequence of its divergent phylogenetic position remains to be elucidated.
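The genome-size reasoning in the preceding paragraph is a short chain of multiplications. The Python sketch below reproduces it for the S. cerevisiae comparison only, using the figures quoted in the text (the 21 Mb lower estimate minus a 3 Mb rDNA cluster, with 63% coding, for Y. lipolytica; 12.6 Mb with 72% coding for S. cerevisiae; 500 amino acids, i.e. 1500 bp, per gene). As in the text's estimate, stop codons and introns are ignored; the helper coding_mb is hypothetical. Note that 12.6 × 0.72 gives about 9.07 Mb, quoted in the text as 9.0 Mb, and that the resulting difference of about 2.3 Mb corresponds to roughly 1500 genes, the lower end of the 1500–2000 range.

```python
def coding_mb(genome_mb: float, coding_fraction: float) -> float:
    """Coding DNA (Mb) under a simple 'genome size x coding fraction' model."""
    return genome_mb * coding_fraction


# Figures quoted in the text
Y_GENOME_MB = 21.0        # lower estimate of the Y. lipolytica genome (21–22 Mb)
Y_RDNA_MB = 3.0           # rDNA cluster, described as slightly overestimated
Y_CODING_FRACTION = 0.63
SC_GENOME_MB = 12.6
SC_CODING_FRACTION = 0.72
BP_PER_GENE = 500 * 3     # 500 amino acids per gene; stop codons and introns ignored

y_coding = coding_mb(Y_GENOME_MB - Y_RDNA_MB, Y_CODING_FRACTION)
sc_coding = coding_mb(SC_GENOME_MB, SC_CODING_FRACTION)
extra_mb = y_coding - sc_coding
extra_genes = extra_mb * 1_000_000 / BP_PER_GENE

print(f"Y. lipolytica coding DNA: {y_coding:.2f} Mb")
print(f"S. cerevisiae coding DNA: {sc_coding:.2f} Mb")
print(f"Difference: {extra_mb:.2f} Mb, roughly {extra_genes:.0f} additional genes")
```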
We have also analysed gene order conservation in the regions flanking the URA5–SEC65 genes, but here linkage conservation breaks down. In Y. lipolytica, the two nearest putative ORFs do not show similarity to any known sequence. The next two genes, YlVI-103w and YlVI-108w, show clear similarity to the S. cerevisiae genes YHR088w and YDR430c (Table 1), which are located on chromosomes VIII and IV, respectively, rather than on chromosome XIII (where the URA5–SEC65 gene pair is located). In C. albicans and K. lactis, no ORFs have been described in the 5′ region upstream of the URA5 gene (1600 and 617 bp, respectively). In C. albicans, the CaCDC4 gene lies 425 bp from the CaSEC65 gene in its flanking region, whereas the S. cerevisiae CDC4 gene is located on chromosome IV. Finally, in Sz. pombe the URA5 and SEC65 homologues are located on different chromosomes (II and III, respectively).
Several analyses have been carried out on the degree of synteny and gene order conservation between related yeast species (Keogh et al., 1998; Ozier-Kalogeropoulos et al., 1998; Sychrova et al., 2000; Llorente et al., 2000), but all of them have relied on the comparison of small DNA fragments. Our results suggest that, in order to understand genome evolution at the chromosomal level, more closely related organisms, or at least an entire chromosome from some of them, must be fully sequenced.
Acknowledgement
We wish to thank N. Skinner for revising the English version
of this manuscript. This work was partially supported by a
grant from the EU (BIO4-CT96-0003).
References
Adjiri A, Chanet R, Mezard C, Fabre F. 1994. Sequence
comparison of the ARG4 chromosomal regions from the two
related yeasts, Saccharomyces cerevisiae and Saccharomyces
douglasii. Yeast 10: 309–317.
Altmann-Johl R, Philippsen P. 1996. AgTHR4, a new selection
marker for transformation of the filamentous fungus Ashbya
gossypii, maps in a four-gene cluster that is conserved between
A. gossypii and Saccharomyces cerevisiae. Mol Gen Genet 250:
69–80.
Altschul SF, Madden TL, Schaffer AA, et al. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 25: 3389–3402.
Bai X, Larsen M, Meinhardt F. 1999. The URA5 gene encoding
orotate-phosphoribosyl transferase of the yeast Kluyveromyces
lactis: cloning, sequencing and use as a selectable marker. Yeast
15: 1393–1398.
Barns SM, Lane DJ, Sogin ML, Bibeau C, Weisburg WG. 1991.
Evolutionary relationships among pathogenic Candida species
and relatives. J Bacteriol 173: 2250–2255.
Barth G, Gaillardin C. 1996. Yarrowia lipolytica. In Non-
conventional Yeasts in Biotechnology, Wolf K (ed.). Springer:
Berlin; 313–388.
Bolotin-Fukuhara M, Toffano-Nioche C, Artiguenave F, et al.
2000. Genomic exploration of the hemiascomycetous yeasts:
11. Kluyveromyces lactis. FEBS Lett 487: 66–70.
Casaregola S, Feynerol C, Diez M, Fournier P, Gaillardin C.
1997. Genomic organization of the yeast Yarrowia lipolytica.
Chromosoma 106: 380–390.
Casaregola S, Neuveglise C, Lepingle A, et al. 2000. Genomic
exploration of the hemiascomycetous yeasts: 17. Yarrowia
lipolytica. FEBS Lett 487: 95–100.
Cole ST, et al. 1998. Deciphering the biology of Mycobacterium
tuberculosis from the complete genome sequence. Nature 393:
537–544.
Dujon B. 1996. The yeast genome project: what did we learn?
Trends Genet 12: 263–270.
Goffeau A, Barrell BG, Bussey H, et al. 1996. Life with 6000
genes. Science 274: 563–567.
Griengl H, Steiner W, Preisz A, Keil C (eds). 1999. Yeast as
protein factories: control of host physiology and exploration of
novel resources. Cell Factory Area within the Biotechnology
Programme: European Commission Research. Technical
University Graz: Graz; 246–252.
Hanahan D. 1983. Studies on transformation of Escherichia coli
with plasmids. J Mol Biol 166: 557–580.
Hartung K, Frishman D, Hinnen A, Wölfl S. 1998. Single-read
sequence tags of a limited number of genomic DNA fragments
provide an inexpensive tool for comparative genome analysis.
Yeast 14: 1327–1332.
Keogh RS, Seoighe C, Wolfe KH. 1998. Evolution of gene order
and chromosome number in Saccharomyces, Kluyveromyces
and related fungi. Yeast 14: 443–457.
Kuck V, Stahl U, Lhermitte A, Esser K. 1980. Isolation and
characterization of mitochondrial DNA from the alkane yeast
Yarrowia lipolytica. Curr Genet 2: 97–101.
Kurtzman CP, Fell JW (eds). 1998. The Yeasts: A Taxonomic
Study. Elsevier Science: Amsterdam.
Llorente B, Malpertuy A, Neuveglise C, et al. 2000. Genomic
exploration of the hemiascomycetous yeasts: 18. Comparative
analysis of chromosome maps and synteny with Saccharo-
myces cerevisiae. FEBS Lett 487: 101–112.
Lopez MC, Nicaud JM, Skinner HB, et al. 1994. A phospha-
tidylinositol/phosphatidylcholine transfer protein is required
for the differentiation of the dimorphic yeast Yarrowia
lipolytica from the yeast to the mycelial form. J Cell Biol 125:
113–127.
Marck C. 1988. ‘DNA Strider’: a ‘C’ program for the fast
analysis of DNA and protein sequences on the Apple
Macintosh family of computers. Nucleic Acids Res 16:
1829–1836.
Nakase T, Komagata K. 1971. Significance of DNA base
composition in the classification of the yeast genus Candida.
J Gen Appl Microbiol Tokyo 17: 259–279.
Ozier-Kalogeropoulos O, Malpertuy A, Boyer J, Tekaia F,
Dujon B. 1998. Random exploration of the Kluyveromyces
lactis genome and comparison with that of Saccharomyces
cerevisiae. Nucleic Acids Res 26: 5511–5524.
Pearson WR, Lipman DJ. 1988. Improved tools for biological
sequence comparison. Proc Natl Acad Sci U S A 85:
2444–2448.
Sambrook J, Fritsch E, Maniatis T (eds). 1989. Molecular
Cloning: A Laboratory Manual. Cold Spring Harbor Labora-
tory Press: New York.
Sanchez M, Beckerich J-M, Gaillardin C, Domınguez A. 1997.
Isolation and cloning of the Yarrowia lipolytica SEC65 gene, a
component of the yeast signal recognition particle displaying
homology with the human SRP19 gene. Gene 203: 75–84.
Sanchez M, Prado M, Iglesias FJ, Domınguez A. 1995. Cloning
and sequencing of the URA5 gene from the yeast Yarrowia
lipolytica. Yeast 11: 425–433.
Sanchez M, del Rey F, Domınguez A, Moreno S, Revuelta JL.
1999. DNA sequencing and analysis of a 40 kb region from the
right arm of chromosome II from Schizosaccharomyces pombe.
Yeast 15: 419–426.
Shen X, Belcher AM, Hansma PK, Stucky GD, Morse DE. 1997.
Molecular cloning and characterization of lustrin A, a matrix
protein from shell and pearl nacre of Haliotis rufescens. J Biol
Chem 272: 32472–32481.
Sychrova H, Braun V, Potier S, Souciet J-L. 2000. Organization
of specific genomic regions of Zygosaccharomyces rouxii and
Pichia sorbitophila: comparison with Saccharomyces cerevisiae.
Yeast 16: 1377–1385.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W:
improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res 22:
4673–4680.
Wesolowski-Louvel M, Breunig KD, Fukuhara H. 1996.
Kluyveromyces lactis. In Non-conventional Yeasts in Biotechnol-
ogy, Wolf K (ed.). Springer: Berlin; 139–201.
Wesolowski-Louvel M, Fukuhara H. 1995. A map of the
Kluyveromyces lactis genome. Yeast 11: 211–218.
Xiang Z, Moore K, Wood V, et al. 2000. Analysis of 114 kb of
DNA sequence from fission yeast chromosome 2 immediately
centromere-distal to his5. Yeast 16: 1405–1411.
Xuan JW, Fournier P, Gaillardin C. 1988. Cloning of the LYS5
gene encoding saccharopine dehydrogenase from the yeast
Yarrowia lipolytica by target integration. Curr Genet 14: 15–21.