© 1999 Macmillan Magazines Ltd
NATURE | VOL 402 | 16 DECEMBER 1999 | www.nature.com 761
articles
Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana
Xiaoying Lin*, Samir Kaul*, Steve Rounsley†, Terrance P. Shea†, Maria-Ines Benito, Christopher D. Town, Claire Y. Fujii, Tanya Mason, Cheryl L. Bowman, Mary Barnstead, Tamara V. Feldblyum, C. Robin Buell, Karen A. Ketchum, John Lee, Catherine M. Ronning, Hean L. Koo, Kelly S. Moffat, Lisa A. Cronin, Mian Shen, Grace Pai, Susan Van Aken, Lowell Umayam, Luke J. Tallon, John E. Gill, Mark D. Adams†, Ana J. Carrera, Todd H. Creasy, Howard M. Goodman†, Chris R. Somerville†, Greg P. Copenhaver†, Daphne Preuss†, William C. Nierman, Owen White, Jonathan A. Eisen, Steven L. Salzberg, Claire M. Fraser & J. Craig Venter†
The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
* These authors contributed equally to this work.
Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130–140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
Arabidopsis thaliana (thale cress), a member of the mustard family, is a dicotyledonous flowering plant with a diploid number of 10 that has become a widely used model for the study of plant biology because of its small size, short generation time, facile genetics and ease of transformation. Whole-genome analysis began ten years ago with the production of cosmid and yeast artificial chromosome (YAC)-based physical maps¹, while early large-scale sequencing efforts focused on ordered cosmids². On August 20–21, 1996, representatives of six research groups from Japan, Europe and the USA met in Washington DC to discuss strategies for facilitating international cooperation in completing the sequencing of the Arabidopsis genome (the Arabidopsis Genome Initiative (AGI)) by the year 2004. The group agreed on a workplan to ensure that the sequencing of all five chromosomes was completed except for the difficult-to-sequence repetitive regions: the nucleolar organizer regions (NORs) and centromeres (http://genome-www.stanford.edu/Arabidopsis/AGI/AGI_memo.html)³.
The main molecular and cytological features of the A. thaliana chromosomes had been defined by the time high-throughput sequencing began (Fig. 1). The NORs in A. thaliana are found on the upper ends of chromosomes 2 and 4. Each region, 3.5–4 Mb in length, consists of an array of tandemly repeated ribosomal RNA genes⁴. Using a combination of restriction analysis, polymerase chain reaction (PCR) and sequencing, it has been shown that the rDNA of the NOR on chromosome 4 immediately abuts the telomeric repeats with less than 500 base pairs (bp) of intervening sequence⁵. Restriction analysis is consistent with a similar arrangement on chromosome 2, with the rDNA terminating at a different point within the repeat unit. By contrast, the genomic organization of A. thaliana centromeres is not well characterized. These regions are heterochromatic⁶ and contain large tandem arrays of a family of 180-bp elements⁷⁻⁹ that are reminiscent of the 170-bp alphoid repeats found in primate centromeres¹⁰. Large blocks of DNA consisting primarily of 180-bp repeats have been characterized by 2D gel electrophoresis and assigned to specific centromeres on the basis of restriction-fragment-length polymorphisms¹¹. The 180-bp repeat block on chromosome 2 is 820–830 kb in size¹¹, which should be considered as a minimum estimate for the size of the unsequenced centromeric region on this chromosome.
Features of chromosome 2
Chromosome 2 is acrocentric and was originally estimated to be 13.5 Mb in length from the YAC-based physical map¹². Sequencing was initiated using bacterial artificial chromosomes (BACs) that had been placed onto the physical map of chromosome 2 by hybridization to YACs and genetic markers. Subsequently, BAC end sequences and BAC fingerprint data¹³ allowed extension from these initial seed points and completion of the entire chromosome. A total of 257 BAC and P1 clones (including 5 BACs completed by other groups) were sequenced to produce over 24 Mb of finished sequence, which has been assembled as two contigs terminating in blocks of 180-bp repeats. These repeats represent the inner boundaries of our finished sequence.
The upper (short) arm, measured from the lower end of the NOR to the centromeric 180-bp repeats, is 3.6 Mb. The northern-most BAC (F23H14) in this contig contains a single rDNA repeat unit that is oriented in the same direction as the telomere-proximal repeat, suggesting that all the rDNA units are arranged in a head-to-tail fashion running 5′ to 3′ from the telomere towards the centromere¹⁴. Immediately adjoining the rDNA is about 60 kb of highly repetitive sequence containing many transposons. The lower (long) arm from centromere to telomere is 16 Mb. The sequence of the southern-most BAC in this contig (F11L15) is joined by a small PCR fragment to the A. thaliana sequence in pAtT51, a telomere-containing clone previously mapped to TEL2S¹⁵ (http://genome-www.stanford.edu/Arabidopsis/ww/Nov98RImaps/index.html). Within this region, a putative transcriptional co-activator gene was identified. Because pAtT51 was derived from the Landsberg ecotype, the last 2.5 kb of this contig may not be identical to the sequence in the Columbia ecotype.
† Present addresses: Cereon Genomics, 270 Albany Street, Cambridge, Massachusetts 02139, USA (S.R., T.P.S.); Celera Genomics Corporation, 45 West Gude Drive, Rockville, Maryland 20850, USA (M.D.A., J.C.V.); Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA (H.M.G.); Department of Plant Biology, Carnegie Institution of Washington, 260 Panama St., Stanford, California 94305, USA (C.R.S.); Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, Illinois 60637, USA (G.P.C., D.P.).
At 19.6 Mb, the overall length of chromosome 2 (excluding the NOR and centromere) is 45% larger than the original estimate. This difference is due in part to gaps in the YAC-based physical map and to a deletion in one of the YACs (see Fig. 1). Similar increases in the actual physical lengths of the other A. thaliana chromosomes indicate that the genome size is actually 130–140 Mb, rather than the earlier estimates of 70–100 Mb.
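The 45% figure follows directly from the two lengths quoted in the text (13.5 Mb estimated, 19.6 Mb sequenced). A minimal Python check is sketched below; the variable and function names are illustrative and not part of the original analysis.

```python
# Lengths quoted in the text; the names are illustrative only.
original_estimate_mb = 13.5   # YAC-based physical map estimate
short_arm_mb = 3.6            # NOR to centromeric 180-bp repeats
long_arm_mb = 16.0            # centromere to telomere


def percent_increase(old: float, new: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0


sequenced_length_mb = short_arm_mb + long_arm_mb
increase = percent_increase(original_estimate_mb, sequenced_length_mb)
print(f"Sequenced length: {sequenced_length_mb:.1f} Mb ({increase:.0f}% larger)")
```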
Using sequence information for 37 markers from the Arabidopsis Recombinant Inbred genetic map (http://genome-www.stanford.edu/Arabidopsis/ww/Nov98RImaps/index.html), the relationship between physical and genetic distances along the chromosome was examined. As the centromere represents a discontinuity, regression analysis for the two arms was performed separately. The ratios of physical to genetic distance on the short arm and long arm were not significantly different, with values of 244 kb cM⁻¹ and 223 kb cM⁻¹, respectively (Fig. 2a). Although these two values indicate a similar overall rate of recombination along the two arms of the chromosome, local distortions are evident which may reflect chromosomal rearrangements between the two ecotypes used to construct the mapping population (Fig. 2b).
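The per-arm regression described above can be sketched as an ordinary least-squares fit of physical position on genetic position, run separately for markers on each side of the centromere. The marker values below are hypothetical placeholders, not the 37 markers used in the paper, and the text does not specify the exact regression procedure, so this is only one plausible reading.

```python
from statistics import mean


def slope_kb_per_cm(genetic_cm, physical_kb):
    """Least-squares slope of physical position (kb) on genetic position (cM)."""
    mx, my = mean(genetic_cm), mean(physical_kb)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genetic_cm, physical_kb))
    sxx = sum((x - mx) ** 2 for x in genetic_cm)
    return sxy / sxx


def fit_by_arm(markers):
    """Fit one slope per chromosome arm; markers are (cM, kb, arm) tuples."""
    slopes = {}
    for arm in sorted({m[2] for m in markers}):
        cm = [m[0] for m in markers if m[2] == arm]
        kb = [m[1] for m in markers if m[2] == arm]
        slopes[arm] = slope_kb_per_cm(cm, kb)
    return slopes


# Hypothetical markers (NOT the paper's data): (genetic cM, physical kb, arm)
markers = [
    (2.0, 500.0, "short"), (6.0, 1500.0, "short"), (10.0, 2500.0, "short"),
    (20.0, 4000.0, "long"), (40.0, 8400.0, "long"), (60.0, 12800.0, "long"),
]
slopes = fit_by_arm(markers)
for arm, slope in slopes.items():
    print(f"{arm} arm: {slope:.0f} kb per cM")
```

Fitting the arms separately keeps the centromere, where physical distance accumulates with little recombination, from distorting either slope.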
Gene content
Using a combination of gene prediction programs and database searches, the chromosome was annotated (Fig. 3 (facing page 768); and Table 1). Statistics of the chromosomal composition are listed in Table 2, along with corresponding data from other eukaryotic genome projects. A graphical distribution of various features is shown in Fig. 4. Of the 4,037 genes identified, 2,078 (51.5%) were assigned either a definite or putative function, 865 (21.4%) were labelled as unknown genes and 1,094 genes (27.1%) were designated as hypothetical. In addition, 400 pseudogenes, most of which are related to proteins found in retrotransposons and are located near the centromere, and 79 genes encoding structural RNAs (73 tRNA, 4 tRNA and 2 small nucleolar (snRNA)) were found.
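The three annotation categories partition the 4,037 predicted genes, and the quoted percentages can be reproduced from the counts in the text. The short Python sketch below does this; the dictionary keys are informal labels chosen for illustration.

```python
# Counts quoted in the text; the category labels are informal.
total_genes = 4037
categories = {
    "definite or putative function": 2078,
    "unknown": 865,
    "hypothetical": 1094,
}

# Share of each category, rounded to one decimal place as in the text.
shares = {name: round(100 * n / total_genes, 1) for name, n in categories.items()}

for name, share in shares.items():
    print(f"{name}: {categories[name]} genes ({share}%)")
print("categories cover all genes:", sum(categories.values()) == total_genes)
```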
On average, a gene occurs every 4.4 kb, is about 2 kb in length (from start to stop codon) and contains 4.6 exons. More than 50% of the chromosome is therefore intergenic sequence. Twenty-three per cent of the genes consist of a single exon, whereas the largest gene (T20F21.17) contains 52 predicted exons and encodes a protein that is 50% identical to the human ch-TOG protein¹⁶. At least 1,352 (33.5%) genes are expressed, as reflected by the presence of a matching complementary DNA or expressed sequence tag (EST) entry in GenBank. Genes that are most highly represented by ESTs include a glycine-rich RNA-binding protein, aquaporins, ribulose bisphosphate carboxylase activase, a chlorophyll a/b binding protein and lipid transfer proteins. The distribution of genes and ESTs along the chromosome is shown in Fig. 4b, c.
Classification by biological role or biochemical function of the known and putative genes shows that most major cellular processes appear to be represented, with genes involved in regulatory functions and signal transduction (DNA-binding proteins/transcription factors and protein kinases) comprising the largest functional groups (Table 1). These homology-based assignments are consistent with results from searching the hidden Markov models (HMMs) of protein domains in the pFam database¹⁷. Of the 4,037 genes predicted to encode a protein, 1,436 match at least one of the 434 distinct pFam HMMs (22% of the total HMMs). The most frequently seen matches are to the LRR (leucine-rich repeat, 239 matches), the pkinase (eukaryotic protein kinase, 161 matches) and the zf-C3HC4 (zinc finger, C3HC4 type, 46 matches) HMMs. The cellular destinations of the proteins were also predicted. Over 17% (698) contained signal peptides, whereas over 10% (416) were predicted to be organelle-targeted (135 for the chloroplast and 281 for the mitochondrion). In addition, more than 16% of the proteins are probably located in cellular membranes, with 665 sequences predicted to have 4 or more transmembrane segments (TMSs). Furthermore, about 25% of the proteins with 1, 2 or 3 predicted TMSs (1,438) are predicted to be localized to the membrane.
Gene and chromosomal duplications
Chromosome 2 contains many duplicated genes. More than 60% of
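[Figure 1 diagram: chromosome 2 from telomere to telomere, showing the NOR, the centromere, the YAC deletion (Δ) and genetic markers with map positions in cM: kk1 (3.5), mi421 (19), er (49), ve018 (69), mi79a (87).]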
Figure 1 Features of Arabidopsis thaliana chromosome 2. The chromosome is roughly
23 Mb long (including the NOR), spans a genetic map distance of 90 cM and contains
15% of the unique sequence of the A. thaliana genome. The locations of a number of
genetic markers are shown. The original YAC contigs are represented by the orange line,
and the location of the deletion noted in the text is shown (Δ).
[Figure 2 plots: a, physical distance (kb) against genetic map distance (cM); b, physical/genetic distance (kb cM⁻¹) against genetic map distance (cM).]
Figure 2 Representation of physical/genetic distance along chromosome 2. a, Physical
distance (in kb) is plotted against position of each marker on the Recombinant Inbred (RI)
map. Data above and below the centromere are treated as separate sets for regression
analysis. Note that genetic distance is measured from the telomere (TEL2N), whereas
physical distance is measured from the bottom of the NOR. b, The ratio of physical to
genetic distance between successive pairs of markers is plotted against map position
for each marker pair on the RI map.
the predicted proteins (2,542 out of 4,037) have a significant match (FASTA, P < 10⁻¹⁰) to another A. thaliana protein, with 2,138 matching at least one other protein encoded on chromosome 2. Of these, 593 are found in 239 tandem duplications that range in size from 2 to 9 genes (average 2.5). A particularly striking example is found in BAC F16P2, which contains tandem repeats of three gene families (Fig. 5). Of the 12 intact copies of a putative short-chain dehydrogenase/reductase gene in this region, 9 are found in tandem. This gene family is most similar to tropinone reductase, a branch point in the biosynthesis of tropane alkaloids including such medicinally important compounds as atropine, scopolamine and cocaine¹⁸,¹⁹. Although these alkaloids are not known to occur in the Brassicaceae, at least one of these genes is expressed, suggesting a role in secondary metabolism which has yet to be elucidated. This same BAC also contains a tandem array of seven genes encoding glutathione S-transferase (GST), which has been implicated in numerous plant processes including stress and pathogen responses, xenobiotic tolerance and vacuolar sequestration²⁰. In total, 13 GST genes have been found on chromosome 2, with 4 of the other genes occurring as tandem pairs. The GSTs within the long tandem repeat are more closely related to one another than to any other of the 25 GSTs known in Arabidopsis, indicating that this expansion is of relatively recent origin. There is also a smaller duplication consisting of three pumilio-like genes, two of which are in tandem. The proximity of these gene duplications suggests either that this is a repeat-prone region of the chromosome or that they arose as part of the same expansion event.
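One simple way to call tandem duplications from the kind of data described above is to walk along the genes in chromosomal order and group runs in which each gene has a significant match to its immediate neighbour. The Python sketch below illustrates that idea; the strict-adjacency rule, the gene names and the similarity pairs are all assumptions made for illustration, since the text does not state the exact criteria used.

```python
def tandem_arrays(gene_order, similar_pairs):
    """Group consecutive genes into arrays when each neighbouring pair is similar.

    gene_order: gene identifiers in chromosomal order.
    similar_pairs: iterable of (gene, gene) pairs with a significant match.
    Returns a list of arrays, each holding at least two genes.
    """
    similar = {frozenset(pair) for pair in similar_pairs}
    arrays = []
    current = list(gene_order[:1])
    for prev, gene in zip(gene_order, gene_order[1:]):
        if frozenset((prev, gene)) in similar:
            current.append(gene)       # extend the running array
        else:
            if len(current) >= 2:
                arrays.append(current)  # close a finished array
            current = [gene]
    if len(current) >= 2:
        arrays.append(current)
    return arrays


# Hypothetical gene order and matches (not taken from the annotation).
order = ["g1", "g2", "g3", "g4", "g5", "g6"]
pairs = [("g2", "g3"), ("g3", "g4"), ("g5", "g6")]
arrays = tandem_arrays(order, pairs)
average_size = sum(len(a) for a in arrays) / len(arrays)
print(arrays, average_size)
```

For comparison, the totals quoted in the text (593 genes in 239 tandem duplications) give an average of about 2.5 genes per array.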
Some of the duplicated genes are found within large chromosomal duplications (Fig. 6). A segment of the bottom arm of chromosome 2 (13.4–14.1 Mb, containing 170 genes) from clone F20M17 to F4P9 was found to be duplicated on chromosome 1 (from clone F19P19 to F3F20) with an inversion in the middle (13.8–13.9 Mb on chromosome 2). Within this region, 57 gene pairs (33%) are duplicated between the two chromosomes. Several instances of tandem duplications on just one of the two chromosomes were also observed. For example, F4P9.8 encodes an auxin-inducible protein that matched two adjacent copies of a similar protein-coding sequence (F19P19.31 and F19P19.32) on chromosome 1; and a single copy of a gene encoding an ion-channel protein on chromosome 1 (YUP8H12.9) matched two adjacent copies on chromosome 2 (T32F6.9 and T32F6.8). Sequence comparison of the proteins in these asymmetric duplications suggests that the tandem duplication events occurred after the large-scale duplication. An additional smaller section near the top of chromosome 2 (200–500 kb) shares 16 genes with a region on chromosome 1 (T10B6). An even larger duplication (4.6 Mb) between chromosome 2 (6.7–11.3 Mb, from clone F9O13 to F18A8) and chromosome 4 (from clone F20O9 to the bottom end of the chromosome) was found. In this region, 39% of the genes (430 out of 1,100) are duplicated between the two chromosomes, including some singleton-tandem duplication pairs, as described above. In addition, although this region contains megabase-scale rearrangements (translocations or inversions), within each rearranged segment gene order is still preserved between the two chromosomes. A small part of this duplication (45 kb) has been described recently²¹.
Comparative and evolutionary genomics
Comparison of the predicted proteins on chromosome 2 to other A. thaliana proteins, to proteins from other plants, and to all available complete genome sequences provides insight into the evolution of plant genomes. More specifically, the timing of gene duplications can be better understood. For example, as most of the 2,542 chromosome-2 proteins (83%) that have paralogues within the A. thaliana genome are more similar to their paralogue than to any protein in the available completed genomes, it is likely that these gene duplications occurred after the separation of plants from the animal and fungal lineages. This analysis also suggests that all plants have a common set of genes for many functions. Even though sequencing in other plants has focused primarily on particular types of genes, more than 40% (1,656) of the proteins encoded on chromosome 2 have a significant match to a protein previously
Table 1 Classification of chromosome-2 proteins according to role or function

Role or function                                                    Percentage of chromosome-2 proteins
Regulatory functions                                                7.8
Signal transduction                                                 4.0
Cellular structure, organization and biogenesis                     3.6
Transport and binding proteins                                      3.6
Protein fate                                                        3.0
Energy metabolism                                                   2.7
Growth and development                                              2.6
Secondary metabolism                                                2.5
Cellular processes                                                  2.4
Pathogen responses                                                  2.0
Protein synthesis                                                   1.9
Fatty acid and phospholipid metabolism                              1.5
General transcription                                               1.4
Environmental response                                              1.0
DNA metabolism                                                      0.7
Central intermediary metabolism                                     0.7
Amino-acid biosynthesis                                             0.6
Purine and pyrimidine base, nucleoside and nucleotide metabolism    0.6
Biosynthesis of cofactors, prosthetic groups and carriers           0.4
Other categories                                                    6.9
Unclassified                                                        1.4
Unknown protein                                                     21.4
Hypothetical protein                                                27.1

4,037 genes are included in this analysis.
Table 2 Compositional analysis of Arabidopsis chromosome 2 and comparison with other eukaryotic chromosomes and genomes

Feature                       A. thaliana*   P. falciparum*   S. cerevisiae†   C. elegans†
Overall length                19.6 Mb        0.95 Mb          13.4 Mb          97 Mb
  Short arm                   3.6 Mb
  Long arm                    16.0 Mb
Base composition (%GC)
  Overall                     35.8%          19.7%            38%              36%
  Coding                      43.6%          24.3%            40%              §
  Non-coding                  32.1%          13.3%            35%              §
Number of genes               4,037          209              6,339            19,099
Gene density (kb per gene)    4.4            4.5              2.1              5.0
Exon statistics‡
  Number of exons             20,352         355              6,561            §
  Exons per gene              4.6            1.7              1.04             5
  bp per exon                 295            728              §                §
Intron statistics
  Number of introns           15,926         144              §                §
  Introns per gene            3.6            1.6              §                §
  bp per intron               178            207              §                §

* Chromosome 2 only. † Complete genome. ‡ Includes intronless genes. § Not published.
identified in other plant species. Of these, 48% (803) do not have a match in any complete genome sequence. In addition, comparison with other species indicates that many of the genes on chromosome 2 probably originated within the plant lineage. More than 65% (2,745) of the predicted proteins do not show significant similarity to any other completed genome, suggesting that they are unique to plants. Of the remaining proteins (1,293), the majority are most similar to proteins from Saccharomyces cerevisiae (368 or 28%) or Caenorhabditis elegans (633 or 49%). The number of proteins whose best match is to one of these two proteomes is proportional to the total number of proteins encoded in each genome, suggesting that A. thaliana is roughly equally evolutionarily distant from animals and fungi. The remaining 292 proteins (23%) are most similar to bacterial and archaeal proteins, which may be due in part to transfers of genes from organellar genomes to the nucleus (see below).
Lateral transfers of genes from organelles to the nucleus have been well established²². In most cases, such genes were identified because their products were known to function in the organelle. The sequence of chromosome 2 allows a reverse type of analysis, identification of nuclear genes that originated in the chloroplast by their evolutionary ancestry. Genes most similar to cyanobacterial genes relative to genes from other complete genomes were considered to have been acquired from the chloroplast genome. Using this methodology, 135 putative chloroplast-derived genes were identified. Of these, 71 (52.5%) are predicted to have chloroplast-targeting peptides. The putative functions of these genes include DNA repair, protein synthesis, regulatory functions, cell division, photosynthesis, transport and oxidation–reduction. As these genes are not found in any of the completed plant chloroplast genomes, most of these gene transfers probably occurred soon after the symbiosis with the chloroplast evolved. Many other genes are of unknown function and will be worth pursuing as potential novel chloroplast-targeted functions.
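The selection rule described above (a gene is a chloroplast-derived candidate when its best match among complete genomes is cyanobacterial) can be sketched as a best-hit classification. In the Python sketch below, the group labels, scores and gene names are hypothetical, and the use of a single similarity score per hit is an assumption; the text does not give the scoring details.

```python
def best_hit_group(hits):
    """Return the taxon group of the highest-scoring hit, or None without hits.

    hits: list of (taxon_group, similarity_score) tuples for one protein.
    """
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def chloroplast_candidates(hits_by_protein, cyanobacterial_group="cyanobacteria"):
    """List proteins whose best complete-genome match is cyanobacterial."""
    return sorted(
        protein
        for protein, hits in hits_by_protein.items()
        if best_hit_group(hits) == cyanobacterial_group
    )


# Hypothetical search results (not the paper's data).
hits_by_protein = {
    "geneA": [("cyanobacteria", 310.0), ("yeast", 120.0)],
    "geneB": [("nematode", 250.0), ("cyanobacteria", 90.0)],
    "geneC": [],
}
candidates = chloroplast_candidates(hits_by_protein)
print(candidates)
```

A best-hit rule of this kind is only a screen; a phylogenetic analysis would be needed to confirm the ancestry of any individual candidate.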
A mitochondrial genome in the nucleus
Completion of the chromosome has revealed a large and
[Figure 4 panels, plotted against chromosome position (Mb, 0–20): trinucleotide composition variation and GC content (%); gene density; EST density; tandem gene duplications; tRNA gene density; transposon density; intermediate repeats.]
Figure 4 Distribution of various features along chromosome 2. The two arms of the
chromosome are represented as a continuous sequence of DNA, beginning immediately
after the NOR (position 1) and continuing to the centromere (position 3,606,929), where it
is joined to the long arm (which terminates with the most telomeric base (position
19,647,005)) by a 120-bp spacer. The various features are plotted relative to the
nucleotide position on the chromosome. a, Chi-squared statistics of trinucleotide
frequency⁵⁰ and GC composition in a 10-kb sliding window. b, Gene density per 100 kb;
c, EST density per 100 kb. d, Tandem repeats, showing repeat number as well as
location. e, tRNAs per 100 kb. f, Density of transposon-related sequences per 100 kb.
g, Position of pericentromeric repeat classes along chromosome 2. From top to bottom
they are 180 bp, Athila, 106B, 11B7RE, 163A, 164A, 278A, mi167, pAtT27, new repeats
1–4. The GC peak (45%) and the gaps in gene and transposon density from 3.2 to 3.5 Mb
reveal the site of the mitochondrial DNA insertion.
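The GC profile in Fig. 4a is computed in a 10-kb sliding window. A minimal Python sketch of such a profile follows; the step size, the handling of the final partial window (dropped here) and the toy sequence are assumptions, as the caption does not specify them.

```python
def gc_percent_windows(seq, window, step):
    """Return (start, %GC) for each full window; start positions are 0-based."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


# Toy sequence with a GC-rich block in the middle (illustrative only).
toy = "ATATATAT" + "GCGCGCGC" + "ATAT"
profile = gc_percent_windows(toy, window=8, step=4)
print(profile)

# For a chromosome-scale sequence, a 10-kb window would be requested as
# gc_percent_windows(chromosome_sequence, window=10_000, step=10_000),
# where chromosome_sequence is a placeholder for the real data.
```

A local peak in such a profile is how the GC-rich mitochondrial insertion stands out against the surrounding pericentromeric sequence.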
unexpected organellar-to-nuclear gene-transfer event. Within the genetically defined centromere, there is a stretch of 270 kb of sequence that is nearly identical to that of the Arabidopsis mitochondrial genome. The authenticity of this insertion in the Columbia ecotype was confirmed by PCR amplification across the junctions of mitochondrial and unique nuclear DNA, followed by the sequencing of the corresponding fragments. This insertion is much larger than any of the previously reported organellar-nuclear transfers²³, and is 99% identical to the mitochondrial genome, suggesting that the transfer event was very recent. As the published sequence is derived from the C24 ecotype²⁴, it is not possible to determine whether the observed sequence polymorphisms are due to divergence of the insertion from the mitochondrial genome or to differences among ecotypes. The organization of the mitochondrial
[Figure 6 dot plots: a, chromosome 1 relative position (Mb) against chromosome 2 coordinate (Mb, 13.4–14.0); b, chromosome 4 relative position (Mb) against chromosome 2 coordinate (Mb, 7.0–11.0).]
Figure 6 Large chromosomal duplications. Regions of similarity were initially detected
using the MUMmer program after which sequences were aligned using the dds program
with criteria of at least 75% identity over 100 bp. Points in the dot plot represent the
coordinates of each such match. Values show positions (in bp) within each contig
assembled from all available sequence. a, Duplication between chromosomes 1 and 2.
b, Duplication between chromosomes 4 and 2. Arrows indicate the small, previously
described duplication²¹.
[Figure 5 diagram: gene models on BAC F16P2 plotted along nucleotide coordinates, with genes labelled G, T and P.]
Figure 5 Organization of genes on BAC F16P2 showing the three tandem gene
duplications. The display from TIGR Annotator shows the exon–intron structure of the
annotated genes. The glutathione S-transferase and tropinone reductase genes are
labelled G and T, respectively. A smaller duplication of pumilio-like protein (P) is also
present.
DNA in the nucleus differs from that of the published mitochondrial genome and its predicted alternate forms²⁵; it corresponds to another possible isoform with an internal deletion (Fig. 7). This deletion may have occurred during or after transfer or may represent an alternate form of the Columbia mitochondrial genome.
The pericentromeric region
Until the completion of chromosome 2, it was unclear what features would be discovered in the regions around the centromere (the pericentromeric region). In addition to the detection of the mitochondrial genome insertion, an analysis of this region has revealed the presence, distribution and diversity of specific repetitive elements on a single chromosome. Transposons were identified by searching for predicted amino-acid sequences that are similar to characterized transposases, reverse-transcriptases, polyproteins or other transposon-related coding sequences. Of the 563 transposon proteins annotated, 50% are pseudogenes found either as fragments of previously intact elements or as genes interrupted by stop codons, frameshifts or other mutations. A breakdown of the classes of transposons identified is shown in Table 3. Representative members of most of the large transposon families seen in other plants (Mutator, En-Spm and Ac-Ds (hAT/mariner)) are present. Retroelements including members of the LINE-like Ta11 elements, long terminal repeat (LTR) elements in both the Ty3-gypsy and the Ty1-copia family, and the Arabidopsis Athila family represent the largest class of transposons. This accumulation of retroelements and other transposons may be due to either preferential insertion into the regions flanking the centromere or their elimination from the rest of the genome at a higher rate²⁶. In addition, representatives of previously described intermediate repeats as well as some new repeats were identified (Fig. 4f), a distribution consistent with previous hybridization data²⁷⁻³⁰.
As shown in Fig. 4, the number of recognizable genes decreases close to the centromere, whereas the number of transposable elements and various classes of intermediate repeats increases. Nevertheless, within this region there are some identifiable genes: a protein kinase (T25N22.14); a plasma membrane proton ATPase (F9A16.7); an ABC transporter (T5E7.1); a DNA-replication licensing factor (T12J2.1); an RNA helicase (T12J2.7); and a 40S ribosomal protein S16 (F7B19.13). The RNA helicase, protein kinase and 40S ribosomal protein appear to be actively expressed genes, as each is represented in the EST collection.
Conclusions
Arabidopsis has been considered to be not only a tractable species for plant biology studies, but also a suitable model from which a fundamental understanding of basic plant processes could be applied to agriculturally relevant species. The analysis of the complete sequence of chromosome 2 validates this concept because many genes that have previously been characterized in other plants have been found. As observed with other eukaryotic genome projects, however, roughly half of the predicted genes have no known function. This represents a large, untapped resource of information that will require a combination of knockout approaches, antisense technologies and global gene-expression analyses to understand their functions.
Both gene duplication and large-scale chromosomal duplications are observed, events that apparently occurred after the divergence of plants from other lineages. These duplications are larger than any reported in the C. elegans genome³¹, but less extensive than the complete genome duplications proposed for maize³² and S. cerevisiae³³. As gene duplication is often accompanied by functional divergence³⁴, the identification of such duplications in the plant lineage may help identify new plant functions. Many of the duplications occurred in tandem, generating closely linked gene families which may provide some of the allelic diversity that breeders and agricultural researchers are seeking. Some new families, such as the family of genes that may encode one step in alkaloid biosynthesis, could provide a reservoir of molecular tools to modify plant metabolism and productivity in a beneficial manner.
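[Figure 7 diagram: a, circular A. thaliana mitochondrial genome (366,923 bp) with segments A, B, C and D and repeats 1 and 3; b, alternative form with inverted segments A′ and D′ and a possible insertion point; c, linearized form (D′, 1, B, 3, C, 1, A′, 3); d, dot plot of mitochondrial genome position (kb, 100–300) against chromosome 2 position (kb, 3,230–3,430).]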
Figure 7 Proposed model for insertion of mitochondrial DNA into chromosome 2.
a, Diagram of the major form of the published A. thaliana C24 mitochondrial genome²⁵.
The different colours correspond to the major segments of the genome: A (nucleotide
297,579–44,698); B (48,895–112,146); C (118,737–178,862); D (183,060–
290,989); and two major repeats, 1 (44,698–48,894, 178,863–183,059) and 3
(112,147–118,736, 297,579–290,990). A small repeat (repeat number 2) is not shown.
The orientation of one of the three repeats is inverted relative to the other. The arrows are
used to indicate orientation for comparison with the alternative form. b, Hypothetical
alternative form, generated by in silico rearrangements around some of the repeats.
Arrows and primes indicate inversions. c, Linearized form of genome shown in b if
inserted at the point between repeat 3 and segment D′. d, Dot plot of the linearized
alternate form against the region of chromosome 2 containing the mitochondrial DNA. The
gap in the diagonal indicates that there was a deletion of a large segment (see text).
Table 3 Representation of transposable elements on chromosome 2
.............................................................................................................................
Transposon class   Subclass                              Family                    Number of open reading frames
.............................................................................................................................
DNA elements                                             Mutator                    73
                                                         En-Spm/Tam1/Psl            71
                                                         Ac-Ds                      12
                                                         mariner                     2
                                                         hAT                         1
Retroelements      Non-LTR LINE-like retrotransposons    Ta11-like                 152
                                                         TSCL                        1
                   LTR retrotransposons                  Athila                     75
                                                         Other                     141
                                                         Ty1-copia-like (Ta1)
                                                         Ty3-gypsy-like (Tat1)
                   Retroviral-like                       Replication protein        28
                                                         Helicase                    7
.............................................................................................................................
Total                                                                              563
.............................................................................................................................
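As a quick consistency check on Table 3, the per-family counts can be summed and compared with the stated total of 563. The short Python sketch below only transcribes the counts printed in the table; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Counts of open reading frames per family, transcribed from Table 3.
# The grouping keys are the subclass labels used in the table.
table3 = {
    "DNA elements": {
        "Mutator": 73,
        "En-Spm/Tam1/Psl": 71,
        "Ac-Ds": 12,
        "mariner": 2,
        "hAT": 1,
    },
    "Non-LTR LINE-like retrotransposons": {"Ta11-like": 152, "TSCL": 1},
    "LTR retrotransposons": {"Athila": 75, "Other": 141},
    "Retroviral-like": {"Replication protein": 28, "Helicase": 7},
}

# Subtotal per group, then the grand total across all groups.
subtotals = {group: sum(families.values()) for group, families in table3.items()}
total = sum(subtotals.values())

for group, count in subtotals.items():
    print(f"{group}: {count}")
print(f"Total: {total}")
```

The grand total computed this way agrees with the total row of the table.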
The completion of chromosomes 2 and 4 (ref. 35), which together account for 30% of the entire Arabidopsis genome, represents a landmark achievement in eukaryotic genomics. The 16-Mb contig on the lower arm of chromosome 2 is the largest sequence assembly published to date. Sequencing within the genetically defined centromere has shown not only an abundance of degenerate transposable elements, but also a number of recognizable intact genes, some of which appear to be expressed. As researchers move on to sequence larger and more complex genomes, it remains to be seen whether the level of genome closure achieved in this model plant can be duplicated.
Note added in proof: Since this paper was accepted for publication, a 23-Mb contig from human chromosome 22 has been published (see Nature 402, 489–495 (1999)).
Methods
Sequencing
Three libraries made from the Columbia ecotype of A. thaliana were used in sequencing chromosome 2: the TAMU BAC library (clones prefixed with T; ref. 36); the IGF BAC library (clones prefixed with F; ref. 37); and the Mitsui P1 library (clones prefixed with M; ref. 38). Sheared BAC DNA (1.6–2 kb) was ligated to a modified pUC19 vector and transformed into Escherichia coli. Sequencing reactions were performed using either BigDye primers, BigDye Terminators or Dichlororhodamine Terminators (P.E. Biosystems), and were run on ABI 377 and ABI 3700 sequencers (P.E. Biosystems). Shotgun clones were sequenced to generate 7–8-fold coverage of each BAC. In total, 590,165 reactions were carried out to generate the sequence of the chromosome. In collaboration with Celera Genomics, 99,072 reactions were run on ABI 3700 sequencers, which accounted for nearly 20% of the overall sequence for the chromosome. The discrepancy rate on these machines, as determined by comparison of overlapping BACs, was comparable to that of ABI 377 machines (1 unresolved discrepancy per 32,000 bases). Such base differences are possibly due to residual heterozygosity or to different DNA sources used to make the various libraries. Individual BACs were assembled from the shotgun sequences using the TIGR Assembler program (ref. 39). Following assembly into contigs, the Grouper program (TIGR, unpublished software) was used to link individual assemblies within the BACs. The BACs were then closed using a combination of BAC walking, directed PCR and resequencing of individual shotgun clones. To confirm the assembly of the entire BAC sequence, predicted and actual restriction digest patterns were compared. The collinearity of the BACs with the Arabidopsis chromosome 2 sequence was confirmed by alignment of the sequence of chromosome 2 with markers obtained from the Arabidopsis Recombinant Inbred map (http://genome-www.stanford.edu/Arabidopsis/ww/Nov98RImaps/index.html) and with YAC end sequences. Several markers that clearly mapped to single genetic loci were excluded from this analysis because they contained repetitive sequence and could not be assigned a unique location on the sequence map.
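The restriction-digest check mentioned in the paragraph above compares fragment sizes predicted from an assembled sequence with those observed experimentally. The following Python sketch illustrates only the prediction and comparison steps under simplifying assumptions: a linear sequence, a single enzyme, and a fixed size tolerance. The function names, the tolerance value and the toy sequence are hypothetical and are not taken from the paper; EcoRI (G^AATTC) is used purely as a familiar example enzyme.

```python
def predicted_fragments(seq, site, cut_offset):
    """Fragment lengths from a complete digest of a linear sequence.

    site       -- recognition sequence, e.g. "GAATTC"
    cut_offset -- position of the cut within the site (1 for G^AATTC)
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the sequence ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def patterns_match(predicted, observed, tolerance=0.05):
    """True if both patterns have the same number of bands and each band
    agrees within a fractional size tolerance (an assumed value)."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= tolerance * p for p, o in pairs)


# Toy sequence with two EcoRI sites, built piecewise so the positions are clear.
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
fragments = predicted_fragments(toy, "GAATTC", 1)
print(fragments)
```

In practice the observed sizes would come from gel measurements, so some tolerance is needed; the 5% default here is only a placeholder.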
Two BACs (T5M2 and T5E7) within the genetically defined centromere (ref. 26) were found to contain long stretches of sequence with very high similarity to the Arabidopsis C24 mitochondrial genome (69 kb on T5M2, and 74 kb on T5E7) joined directly to nuclear sequence. Two overlapping BACs provided an additional 122 kb of purely mitochondrial sequence to close the gap, suggesting that the total size of the organellar DNA insertion is about 270 kb. Verification of the mitochondrial insertion involved PCR amplification of the two flanking junction sequences from Arabidopsis Columbia genomic DNA. PCR products with the expected size were amplified that had sequence identical to the junction BACs. Similar sized junction PCR products were obtained using additional BACs from both the TAMU and IGF libraries. Determination of the exact size and sequence of the insertion will be facilitated once the sequence of the Columbia mitochondrial genome is known, and it becomes possible to distinguish between nuclear- and organellar-derived mitochondrial sequences.
The sequence of chromosome 2 identified by the BAC clone F11L15 did not contain the telomeric repeat consensus sequence 5′-TTTAGGG-3′ (ref. 15). However, the sequence of the lower telomere of chromosome 2 had been previously identified in the clone pAtT51 (ref. 15) (http://genome-www.stanford.edu/Arabidopsis/ww/Nov98RImaps/index.html). pAtT51 was sequenced to identify primer sites that could be used to walk inwards from the telomere. Southern hybridization using probes derived from F11L15 and pAtT51 suggested that the gap between the BAC contig and pAtT51 was less than 1 kb. A 702-bp PCR product that spanned this physical gap was amplified from Arabidopsis Columbia genomic DNA and sequenced.
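Checking whether a clone sequence contains the telomeric repeat amounts to scanning for tandem copies of TTTAGGG on either strand. The Python sketch below shows one simple way to do this with regular expressions; it is an illustration only, and the function names and the example sequences are hypothetical rather than anything used in the paper.

```python
import re

TELOMERE_UNIT = "TTTAGGG"  # consensus repeat unit quoted in the text


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_tandem_run(seq, unit=TELOMERE_UNIT):
    """Largest number of back-to-back copies of `unit` found on either strand."""
    seq = seq.upper()
    best = 0
    # Searching for the reverse complement covers the opposite strand.
    for motif in (unit, reverse_complement(unit)):
        for match in re.finditer(f"(?:{motif})+", seq):
            best = max(best, len(match.group(0)) // len(motif))
    return best


# Hypothetical clone end carrying three tandem copies of the repeat.
example = "ACGT" + TELOMERE_UNIT * 3 + "AC"
print(longest_tandem_run(example))
```

A result of zero for a clone end, as in a sequence lacking the motif, would correspond to the situation described for F11L15.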
Analysis
Annotation of chromosome 2 involved both DNA and protein database searches and gene prediction programs. Gene predictions were made by Genscan (ref. 40), Genefinder (P. Green and L. Hillier, unpublished data) and GRAIL (ref. 41). Splice sites were identified by these programs, by NetPlantGene (ref. 42) and by alignment with EST and protein sequences using the dds and dps programs (ref. 43). Predicted protein sequences were searched against a non-redundant amino-acid database and against HMMs of the protein domains from pFAM3 (1,407) and TIGR-built HMMs (502) with the HMMer2 program (ref. 17). tRNAs were identified using tRNAscan-SE (ref. 44). Repeats were identified using RepeatMasker (A. F. A. Smit and P. Green, http://ftp.genome.washington.edu/RM/RepeatMasker.html) and a modified version of the suffix tree algorithm used in the MUMmer system (ref. 45). Output from the gene finding and signal detection programs was displayed using the Annotator genome viewer (L. Zhou, unpublished data) and manually edited. The accuracy of computational gene finders for Arabidopsis is quite variable and all gene predictions that are not supported by independent evidence, such as protein or EST homology, should be regarded as tentative, pending further research.
Putative membrane-spanning proteins were identified using TopPred (ref. 46). SignalP (ref. 47) was used to detect signal peptide sequences, while Predotar (N. Peeters et al., unpublished) searched for organellar targets. In addition, ChloroP (ref. 48) and Mitoprot (ref. 49) were used to find proteins targeted for the chloroplast and mitochondria, respectively. The results presented are the concordance of the predictions.
We analysed the proteome of chromosome 2 by comparing each predicted protein with a database of all proteins from chromosome 2, all available completed genomes and all proteins from A. thaliana and C. elegans using described methods (ref. 50). To identify genes on other chromosomes that were related to chromosome-2 genes, nucleotide searches were also used, as much of the nucleotide sequence in GenBank is unannotated. Each predicted transcript was searched against all available genomic sequence using a cut-off of 75% identity over 50% of the transcript length in order to take intron sequences into account.
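The cut-off described above (75% identity over 50% of the transcript length) can be expressed as a simple filter on search hits. The Python sketch below is a minimal illustration; the `Hit` record, its field names and the example values are assumptions made for this example and do not reflect the output format of any particular search program.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One transcript-versus-genome match (hypothetical record layout)."""
    transcript_id: str
    transcript_len: int   # length of the query transcript, in bases
    aligned_len: int      # number of transcript bases covered by the match
    identity: float       # percent identity over the aligned region


def passes_cutoff(hit, min_identity=75.0, min_fraction=0.5):
    """Apply the 75% identity / 50% of transcript length criterion."""
    coverage = hit.aligned_len / hit.transcript_len
    return hit.identity >= min_identity and coverage >= min_fraction


# Illustrative hits with made-up identifiers and values.
hits = [
    Hit("t1", 1200, 900, 82.0),   # long and similar enough
    Hit("t2", 1200, 400, 95.0),   # high identity but covers under half
    Hit("t3", 1000, 600, 70.0),   # covers enough but identity too low
]
related = [h.transcript_id for h in hits if passes_cutoff(h)]
print(related)
```

Requiring only half of the transcript to align leaves room for introns in the genomic sequence, which is the rationale given in the text.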
To search for large-scale duplications within the genome, we assembled pseudomolecules representing all available sequence of chromosomes 1 and 4. The sequence of chromosome 2 was then compared with chromosomes 1 and 4 using the MUMmer program (ref. 45) with a word size of 20. Long-range duplications were found by plotting the results, and a local alignment program, dds (ref. 43), was used to identify matches between the exact repeats within the duplicated regions (stringency >75%). Because of insufficient sequence or overlap information, it is difficult at the present time to construct chromosome-size pseudomolecules of chromosomes 3 and 5 and assess the possibility of other large duplications.
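The idea of finding shared exact words between two sequences and plotting their positions can be illustrated with a small hash-based sketch. Note that this is a naive stand-in, not the suffix-tree method implemented in MUMmer, and it reports every shared word rather than maximal unique matches. The function name and toy sequences are hypothetical; the default word size of 20 follows the value quoted in the text.

```python
from collections import defaultdict


def exact_word_matches(a, b, k=20):
    """Return (i, j) pairs where the k-mer at a[i:] equals the k-mer at b[j:]."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    matches = []
    for j in range(len(b) - k + 1):
        # .get avoids adding new keys to the index while scanning b
        for i in index.get(b[j:j + k], ()):
            matches.append((i, j))
    return matches


# Toy example with a short word size so the match is easy to see.
seq_a = "ACGTACGGTT"
seq_b = "TTACGGAA"
points = exact_word_matches(seq_a, seq_b, k=5)
print(points)
```

Plotting the resulting (i, j) pairs gives a dot plot in which a duplicated region appears as a run of points along a diagonal, since i minus j stays constant across a collinear match.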
Received 5 October; accepted 29 October 1999.
1. Schmidt, R. & Dean, C. Towards construction of an overlapping YAC library of the Arabidopsis thaliana genome. Bioessays 15, 63–69 (1993).
2. Bevan, M. et al. Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391, 485–488 (1998).
3. Bevan, M. et al. Objective: the complete sequence of a plant genome. Plant Cell 9, 476–478 (1997).
4. Copenhaver, G. P. & Pikaard, C. S. Two-dimensional RFLP analyses reveal megabase-sized clusters of rRNA gene variants in Arabidopsis thaliana, suggesting local spreading of variants as the mode for gene homogenization during concerted evolution. Plant J. 9, 273–282 (1996).
5. Copenhaver, G. P. & Pikaard, C. S. RFLP and physical mapping with an rDNA-specific endonuclease reveals that nucleolus organizer regions of Arabidopsis thaliana adjoin the telomeres on chromosomes 2 and 4. Plant J. 9, 259–272 (1996).
6. Schweizer, D., Loidl, J. & Hamilton, B. Heterochromatin and the phenomenon of chromosome banding. Results Probl. Cell Differ. 14, 235–254 (1987).
7. Martinez-Zapater, J. M., Estelle, M. A. & Somerville, C. R. A highly repeated DNA sequence in Arabidopsis thaliana. Mol. Gen. Genet. 204, 417–423 (1986).
8. Maluszynska, J. & Heslop-Harrison, J. S. Localization of tandemly repeated DNA sequences in Arabidopsis thaliana. Plant J. 1, 159–166 (1991).
9. Simoens, C. R., Gielen, J., Van Montagu, M. & Inze, D. Characterization of highly repetitive sequences of Arabidopsis thaliana. Nucleic Acids Res. 16, 6753–6766 (1988).
10. Pluta, A. F., Mackay, A. M., Ainsztein, A. M., Goldberg, I. G. & Earnshaw, W. C. The centromere: hub of chromosomal activities. Science 270, 1591–1594 (1995).
11. Round, E. K., Flowers, S. K. & Richards, E. J. Arabidopsis thaliana centromere regions: genetic map positions and repetitive DNA structure. Genome Res. 7, 1045–1053 (1997).
12. Zachgo, E. A. et al. A physical map of chromosome 2 of Arabidopsis thaliana. Genome Res. 6, 19–25 (1996).
13. Marra, M. A. et al. High throughput fingerprint analysis of large-insert clones. Genome Res. 7, 1072–1084 (1997).
14. Copenhaver, G. P., Doelling, J. H., Gens, J. S. & Pikaard, C. S. Use of RFLPs larger than 100 kbp to map the position and internal organization of the nucleolus organizer region on chromosome 2 in Arabidopsis thaliana. Plant J. 7, 273–286 (1995).
15. Richards, E. J., Chao, S., Vongs, A. & Yang, J. Characterization of Arabidopsis thaliana telomeres isolated in yeast. Nucleic Acids Res. 20, 4039–4046 (1992).
16. Charrasse, S. et al. Characterization of the cDNA and pattern of expression of a new gene over-expressed in human hepatomas and colonic tumors. Eur. J. Biochem. 234, 406–413 (1995).
17. Sonnhammer, E. L., Eddy, S. R., Birney, E., Bateman, A. & Durbin, R. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 26, 320–322 (1998).
18. Leete, E. Recent developments in the biosynthesis of the tropane alkaloids. Planta Med. 56, 339–352 (1990).
19. Yamada, Y. et al. in Secondary Products from Plant Tissue Culture (eds Charlwood, B. V. & Rhodes, M. J. C.) 227–242 (Clarendon, Oxford, 1990).
20. Marrs, K. A. The functions and regulation of glutathione S-transferases in plants. Annu. Rev. Plant Physiol. 47, 127–158 (1996).
21. Terryn, N. et al. Evidence for an ancient chromosomal duplication in Arabidopsis thaliana by sequencing and analyzing a 400-kb contig at the APETALA2 locus on chromosome 4. FEBS Lett. 445, 237–245 (1999).
22. Martin, W. & Herrmann, R. G. Gene transfer from organelles to the nucleus: how much, what happens, and why? Plant Physiol. 118, 9–17 (1998).
23. Blanchard, J. L. & Schmidt, G. W. Pervasive migration of organellar DNA to the nucleus in plants. J. Mol. Evol. 41, 397–406 (1995).
24. Unseld, M., Marienfeld, J. R., Brandt, P. & Brennicke, A. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nature Genet. 15, 57–61 (1997).
25. Klein, M. et al. Physical mapping of the mitochondrial genome of Arabidopsis thaliana by cosmid and YAC clones. Plant J. 6, 447–455 (1994).
26. Copenhaver, G. P. et al. Genetic definition and sequence analysis of Arabidopsis centromeres. Science (in the press).
27. Thompson, H. L., Schmidt, R. & Dean, C. Identification and distribution of seven classes of middle-repetitive DNA in the Arabidopsis thaliana genome. Nucleic Acids Res. 24, 3017–3022 (1996).
28. Thompson, H., Schmidt, R., Brandes, A., Heslop-Harrison, J. S. & Dean, C. A novel repetitive sequence associated with the centromeric regions of Arabidopsis thaliana chromosomes. Mol. Gen. Genet. 253, 247–252 (1996).
29. Thompson, H. L., Schmidt, R. & Dean, C. Analysis of the occurrence and nature of repeated DNA in an 850 kb region of Arabidopsis thaliana chromosome 4. Plant Mol. Biol. 32, 553–557 (1996).
30. Brandes, A., Thompson, H., Dean, C. & Heslop-Harrison, J. S. Multiple repetitive DNA sequences in the paracentromeric regions of Arabidopsis thaliana L. Chromosome Res. 5, 238–246 (1997).
31. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
32. Gale, M. D. & Devos, K. M. Comparative genetics in the grasses. Proc. Natl Acad. Sci. USA 95, 1971–1974 (1997).
33. Wolfe, K. H. & Shields, D. C. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).
34. Hughes, A. L. The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. B Biol. Sci. 256, 119–124 (1994).
35. The European Union Arabidopsis Genome Sequencing Consortium & The Cold Spring Harbor, Washington University in St Louis and PE Biosystems Arabidopsis Sequencing Consortium. Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402, 769–777 (1999).
36. Choi, S., Creelman, R. A., Mullet, J. E. & Wing, R. A. Construction and characterization of bacterial artificial chromosome library of Arabidopsis thaliana. Plant Mol. Biol. Rep. 13, 124–128 (1995).
37. Mozo, T., Fischer, S., Meier-Ewert, S., Lehrach, H. & Altmann, T. Use of the IGF BAC library for physical mapping of the Arabidopsis thaliana genome. Plant J. 16, 377–384 (1998).
38. Liu, Y.-G., Mitsukawa, N., Vasquez-Tell, A. & Whittier, R. F. Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking. Plant J. 7, 351–358 (1995).
39. Sutton, G. G., White, O., Adams, M. D. & Kerlavage, A. R. TIGR Assembler: a new tool for assembling large shotgun sequencing projects. Genome 1, 9–19 (1995).
40. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
41. Uberbacher, E. C. & Mural, R. J. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA 88, 11261–11265 (1991).
42. Hebsgaard, S. M. et al. Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information. Nucleic Acids Res. 24, 3439–3452 (1996).
43. Huang, X., Adams, M. D., Zhou, H. & Kerlavage, A. R. A tool for analyzing and annotating genomic sequences. Genomics 46, 37–45 (1997).
44. Lowe, T. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
45. Delcher, A. L. et al. Alignment of whole genomes. Nucleic Acids Res. 27, 2369–2376 (1999).
46. Claros, M. G. & von Heijne, G. TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci. 10, 685–686 (1994).
47. Nielsen, H., Brunak, S. & von Heijne, G. Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng. 12, 3–9 (1999).
48. Emanuelsson, O., Nielsen, H. & von Heijne, G. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984 (1999).
49. Claros, M. G. & Vincens, P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur. J. Biochem. 241, 779–786 (1996).
50. Nelson, K. E. et al. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399, 323–329 (1999).
Acknowledgements
We thank S. Jackson, J. Jiang, K. Meyer, E. Richards and S. Tabata for helpful contributions in completing the chromosome. In addition, we would like to thank the TIGR faculty, system administrators, sequencing facility, bioinformatics department and administrative staff. This work was supported by the National Science Foundation, the Department of Energy and the US Department of Agriculture.
Correspondence and requests for materials should be addressed to J. C. Venter (e-mail: [email protected]).
Figure 3 Gene map of Arabidopsis thaliana chromosome 2. Predicted gene models are
shown with arrowheads indicating the direction of transcription. Genes are colour coded
according to broad role categories as shown in the key. The region highlighted in pink
indicates the genetically defined centromere within which the cross-hatched gene-free
region represents the mitochondrial insertion. The gap indicates the highly repetitive
centromeric region. Note that the NOR is not drawn to scale. Gene identification numbers
follow the recently adopted nomenclature (At2gNNNNN), where At is Arabidopsis thaliana,
2 is chromosome number, g indicates that the feature is a gene and numbering is from top
to bottom of each chromosome in increments of 10.
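The nomenclature described in the caption (At, chromosome number, g, then a five-digit number assigned in increments of 10) is regular enough to parse mechanically. The Python sketch below is one possible illustration; the function names and the example identifiers are hypothetical, and restricting the chromosome digit to 1–5 is an assumption based on the five Arabidopsis chromosomes mentioned in the text.

```python
import re

# Pattern for the At<chromosome>g<5-digit number> form described in the caption.
# Limiting the chromosome digit to 1-5 is an assumption of this sketch.
GENE_ID = re.compile(r"At([1-5])g(\d{5})")


def parse_gene_id(gene_id):
    """Split an identifier such as 'At2g01010' into (chromosome, serial number)."""
    match = GENE_ID.fullmatch(gene_id)
    if match is None:
        raise ValueError(f"not an identifier of the form AtNgNNNNN: {gene_id!r}")
    return int(match.group(1)), int(match.group(2))


def numbering_steps(id_a, id_b):
    """Number of 10-unit numbering steps between two genes on one chromosome."""
    chrom_a, serial_a = parse_gene_id(id_a)
    chrom_b, serial_b = parse_gene_id(id_b)
    if chrom_a != chrom_b:
        raise ValueError("identifiers are on different chromosomes")
    return abs(serial_b - serial_a) // 10


print(parse_gene_id("At2g01010"))
print(numbering_steps("At2g01010", "At2g01030"))
```

Because numbers are assigned in increments of 10 from top to bottom, the step count gives a rough sense of how many annotated genes separate two identifiers at the time of numbering.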
[Figure 3 gene map not reproduced. Key: nucleolar organizer region; telomere; tRNA; 18S RNA; sequence gap; mitochondrial region; genetic centromere; 5.8S RNA; 26S RNA; snRNA. Gene identification numbers printed on the map are omitted here.]
0
1750
0
3756
0
1751
0
3757
0
1752
0
3758
0
1753
0
3759
0
3089
0
5200
3090
0
5210
3091
0
5220
3092
030
930
5230
3094
0
5240
3095
0
5250
5260
3096
030
970
5270
5280
3098
0
5290
1293
012
920
1291
012
900
1289
012
880
1287
0
2469
0
3099
0
5300
3100
0
2468
024
670
3101
0
5310
5320
3102
0
2466
0
3103
0
5330
2465
0
5340
2464
024
630
2462
024
610
2256
022
550
2254
022
530
2252
022
510
1652
0
3900
1651
0
3910
1650
0
3920
1649
0
3930
3940
3950
3960
3970
3980
4493
044
920
4491
044
900
2819
028
180
2817
028
160
2815
028
140
2813
028
120
4557
045
580
4559
045
600
4561
045
620
4563
045
640
4565
045
660
4625
046
240
4623
046
220
4621
046
200
4619
046
180
4617
046
160
4567
045
680
4569
0
4615
046
140
4613
046
110
4610
0
4609
0
4609
0
4608
046
070
4606
0
3316
033
150
3313
033
120
3311
033
100
3722
0
3309
0
3723
0
3308
0
3724
037
250
3307
0
3726
0
4327
0
3727
037
280
3728
0
4328
0432
90
3729
037
300
4331
043
320
4333
043
340
4335
042
650
4264
0
2336
0
2336
0
4605
0
2337
0
4604
0
2338
0
4603
0
2339
0
3477
0
4602
0
2340
0
4601
0
2341
0
3476
0
4600
0
2342
0
3475
034
740
4599
0
2343
0
4598
0
2344
0
3473
034
720
3306
0
3471
0
3305
0
3470
0
3304
033
030
3302
033
010
1399
014
000
1401
014
020
1403
014
040
1405
014
060
1407
014
080
2722
027
210
2720
027
190
2718
0
4420
044
210
4422
0
1494
0
4423
0
1495
0
4424
044
250
1496
0
4426
044
270
4428
0
4160
4150
4140
2893
0
4130
2894
0
4120
2895
0
4110
2896
028
970
4100
4090
2898
028
990
4080
2900
029
010
7780
4024
040
250
4026
0
3926
0
4027
0
3925
0
4028
0
3924
0
4029
0
1175
0
3923
0
4030
0
1174
0
3922
0
1173
0
4031
0
3921
0
1170
0
4032
0
3920
0
1282
0
3919
0
1169
0
1281
0
1168
0
3918
0
1280
0
1167
0
3917
0
1279
0
1172
0
1278
0
1171
0
1277
0
1130
1276
0
1140
1275
0
1150
1274
0
1170
1180
1190
1200
1210
1480
1490
1500
1510
1520
1530
1540
1550
1560
1570
3916
039
150
3914
0
1516
015
150
1514
015
130
1580
1590
1600
4055
040
540
1788
0
4053
0
1787
0
4052
0
1786
0
4051
0
1785
0
4050
0
1784
017
830
4049
040
480
1782
0
4047
0
2514
0
4046
0
2266
022
670
2268
022
690
2270
022
710
2272
022
730
2274
0
4045
040
440
4043
0
2515
0
1846
018
450
1844
018
420
1812
018
410
1811
018
400
2516
0
2869
028
680
2987
0
2867
028
660
2986
029
850
2865
028
640
2984
0
2863
0
2983
0
2862
0
2982
029
810
2093
020
920
2091
020
900
2089
020
880
2087
020
860
2085
020
840
2517
0
2124
021
230
2122
0
2083
020
820
2081
0
4641
0
2080
0
4642
0
2079
0
4643
0
2078
020
770
4644
046
450
2076
020
750
4646
0
2074
0
4647
0
2518
0
2073
020
720
2071
0
2845
028
440
2070
020
690
2843
0
2068
0
2842
028
410
2840
028
390
4379
043
780
4377
043
760
4375
043
740
4373
043
720
4371
043
700
4072
040
710
4070
0
2509
0
4069
040
680
4067
040
660
4065
040
640
3426
034
270
3428
034
290
3430
034
310
3432
034
330
4369
043
680
3434
034
350
4367
043
660
4365
043
640
4363
043
620
4361
043
600
1302
013
010
1300
012
990
1298
012
970
1296
0
3436
0
1295
012
940
3437
0
4359
043
580
4357
0
2275
022
760
2277
022
780
2279
0
1547
015
480
1549
015
500
1551
015
520
1553
015
540
4158
041
590
4160
041
610
4162
041
630
4164
041
650
4166
041
670
2634
026
350
2636
026
370
2638
0
1466
0
2639
0
1465
0
2640
0
1464
0
2641
0
1463
0
2642
0
1462
014
610
1460
0
4168
0
1459
0
4169
0
1458
0
2802
028
010
2800
027
990
1660
016
610
4570
1662
016
630
4560
4550
1664
0
4540
1665
0
1754
0
3760
0
4530
1666
0
1755
0
3761
037
620
4520
1667
0
1756
017
570
4510
1668
0
3763
0
1758
0
4500
3764
037
650
1759
017
600
3766
037
670
3768
037
690
3770
037
710
3772
0
5550
3773
0
5540
3774
0
5530
3775
0
5520
3776
0
5510
3777
0
5500
5490
5480
5470
5460
5450
5440
5790
5780
5770
3230
5760
5750
3220
3210
3200
3190
3180
3170
3160
3150
3140
3130
3120
3110
3100
4270
4260
4250
4240
3116
0
4290
3115
0
4230
4220
3114
031
130
3112
031
110
3110
0
2470
024
710
2472
024
730
2474
024
750
2476
0
2477
0
2310
2320
2330
2340
2350
2360
1810
2370
2380
1800
2390
1790
1780
2400
1770
1760
1750
1740
2035
020
340
2033
020
320
2031
020
300
2029
0
2029
0
2028
020
270
2026
0
3296
0
2410
3295
0
2224
022
230
2420
3294
0
2430
3293
0
2221
0
2440
3292
0
2450
3291
0
2220
0
2460
3290
0
2219
022
180
2470
3289
0
2217
0
3288
0
2216
0
3287
0
2215
0
2025
020
240
2023
020
220
1069
0
2021
0
1068
0
2020
0
1067
010
660
1065
010
640
1063
0
3286
0
2214
0
3285
0
2067
0
3284
0
2066
0
3283
0
2065
020
640
2063
020
620
2061
020
600
2059
0
1375
013
760
1377
013
780
1379
0
1380
0
1184
011
830
1182
011
810
1180
011
790
1178
0
4633
0
1177
0
4634
0
1176
0
4635
046
360
4637
046
380
4639
046
400
2042
020
410
2040
020
390
2038
020
370
2036
0
1244
012
450
1246
012
470
1248
012
490
1250
012
510
1252
0
2881
028
820
6420
2883
028
840
6430
3608
036
090
6440
2885
0
6450
2886
0
3610
036
110
2887
0
3612
0
2888
0
3613
0
2889
0
4648
0
5740
2890
0
4649
0
5730
4650
0
3616
0
5720
4651
0
5710
5700
4652
046
530
5690
4654
0
5680
4655
0
5670
4656
0
4017
0
1046
010
450
1044
010
430
1042
0
2891
028
920
1041
010
400
6820
6810
6800
6790
6780
6770
6760
6750
1555
015
560
1557
015
580
1559
015
600
1561
015
620
1563
015
640
1347
013
480
1349
013
500
1351
013
520
1353
013
540
1355
0
2823
028
220
2821
028
200
1565
015
660
1567
015
680
1569
015
700
1571
015
720
1573
015
740
1575
0
1940
1950
1960
1970
1980
1990
2000
2010
2020
1031
010
300
1029
0
1028
0
1028
0
2664
026
650
2666
026
670
2668
026
690
2670
026
710
2672
0
3080
030
810
3082
030
830
3084
030
860
3087
030
880
1366
013
670
1368
0
9960
1369
0
9970
1370
0
9980
1371
013
720
9990
1000
0
1373
013
740
1001
010
020
1003
010
040
1005
0
2178
021
790
2180
021
810
2182
021
830
2184
021
850
2186
021
870
1006
010
070
7470
1008
0
7460
1009
010
100
1011
010
120
1013
0
1242
0
1014
0
1243
0
1015
0
2188
021
890
2190
0
4760
0
2191
0
4759
0
2192
0
4758
0
2193
0
4757
0
2194
0
2194
0
4756
0
2195
021
960
4755
047
540
2197
0
4753
047
520
2019
0
4751
0
2018
020
170
2016
020
150
2014
020
130
2012
020
110
2198
0219
9022
000
4750
0
2201
0
4749
0
2202
0
4748
047
470
4746
047
450
4744
047
430
4742
047
410
1708
0
3647
036
480
3649
036
500
3653
036
540
3655
036
560
1385
013
840
6460
1383
0
6470
1382
013
810
6480
6490
4673
0
6500
4674
0
6510
4675
0
6520
4676
0
3657
0
6530
4677
0
3658
0
6540
4678
0
3659
0
4679
046
800
4681
0
1093
010
920
1091
010
900
1089
010
880
1087
010
860
1085
0
1862
018
610
1860
018
590
1858
018
570
1856
0
4135
041
340
4133
041
320
4131
041
300
4018
0
4129
041
280
4127
041
260
4125
0
2608
0
4124
0
2607
0
4123
0
2606
0
4122
0
2605
026
040
2603
026
020
2601
026
000
4715
047
140
4713
047
120
4711
047
090
4708
0
2599
025
980
2597
025
960
2595
025
940
2593
0
3571
035
700
3569
035
680
3567
035
660
3565
035
630
3562
035
610
3186
031
850
3184
031
830
3182
0
3564
0
3181
031
800
3179
031
780
3177
031
760
3175
0
7330
7340
3174
0
7350
3173
0
7360
7370
7380
7390
7400
7410
7420
1763
017
620
1761
0
7430
7440
7450
4080
040
790
4078
040
770
4076
040
750
4074
040
730
4201
042
000
4199
041
980
4197
041
960
3731
0
4194
0
3732
0
4193
0
3733
0
4192
0
3734
0
4191
0
3735
037
360
3737
037
380
3739
037
400
1233
012
340
1235
012
360
1237
012
380
1239
012
400
1241
0
4190
041
890
4188
041
870
4186
0
3741
0
4195
0
3742
0
1253
0
3743
0
1254
0
3744
0
1255
0
3745
037
460
3747
037
4803
7490
3750
0
3987
039
880
3989
039
900
3991
039
920
3993
0
3751
0
5100
1803
018
020
1801
018
000
1799
017
980
1797
017
960
1795
017
940
1497
014
980
1499
015
000
1501
015
020
1503
015
040
1505
0
3016
030
150
3014
030
130
3012
030
110
3010
030
090
3008
030
070
1839
018
380
1837
018
360
1835
018
340
1833
018
320
1830
0
2531
025
300
2529
025
280
2527
025
260
2525
0
3006
0
2524
0
3005
0
2523
025
220
2592
025
910
2590
0
1829
0
2589
0
1828
018
270
2588
0
1826
018
250
1824
0
7670
1823
018
220
7660
4336
0
7650
1821
0
4337
0
7640
1820
0
4338
043
390
4340
043
410
4342
043
430
2521
0
4185
0
4344
0
2520
0
4345
0
4184
0
2519
0
4183
0
4183
0
4182
041
810
7140
7150
7160
7170
7180
7190
7200
7210
7220
7220
7230
4346
043
470
1716
0
4348
0
1717
0
4349
0
1718
0
4350
0
1719
0
4351
0
1720
0
4352
0
1721
0
4353
0
1722
0
4354
0
1723
0
4355
0
1950
0
1951
019
520
1953
019
540
4023
0
1955
019
560
1957
019
580
1959
0
1431
0
4356
0
1430
014
290
1428
014
270
1426
014
250
1424
014
230
1960
019
610
1962
019
630
1964
019
650
1966
0
4016
0
4263
042
620
4261
042
600
4259
042
580
4257
042
560
4255
042
540
1146
011
450
1144
011
430
1142
0
1141
0
4253
0
1140
0
4252
0
1139
0
4251
0
3141
031
400
3139
031
380
3137
031
360
3135
031
340
3133
0
2205
022
040
2203
0
3807
038
060
3805
038
040
3802
038
010
3800
0
4142
041
430
4144
041
450
1326
0
4146
0
1325
0
4147
0
1324
0
4148
0
1323
013
220
1321
013
200
1319
013
180
1317
0
3550
3540
3530
3520
3510
3500
3480
1457
014
560
1455
0
3552
0
1454
014
530
3551
035
500
5110
3549
035
480
5120
5130
3547
035
460
5140
5150
3545
0
5160
3544
0
5170
3543
0
5180
5190
4219
042
200
4221
042
220
4223
042
240
4225
042
260
4227
042
280
3542
035
410
3540
035
390
3538
035
370
3536
035
350
3534
035
330
3994
039
950
3996
0
2626
0
3997
0
2625
0
3998
0
2624
0
3999
0
2623
0
4000
0
2622
0
4001
0
2621
0
4002
0
2620
0
4003
0
2619
026
180
2617
0
1103
011
040
1105
011
060
1107
0
3617
0
1108
011
380
1109
011
100
1137
011
360
1135
011
340
1133
011
320
1131
0
4004
0
1130
0
4005
0
1129
0
4006
040
070
4008
040
090
2410
024
110
2412
0
2412
024
130
2414
024
150
2416
024
170
2418
0
2345
023
460
1128
0
2347
023
480
1127
0
2349
023
500
2351
023
520
2353
023
540
2355
0
1484
0
2356
0
1483
0
2357
0
1482
0
2358
0
1481
0
2359
0
1480
0
2360
0
1479
0
2361
0
1478
0
2362
0
1477
0
1855
018
540
1853
018
520
1851
018
500
1849
018
480
1847
0
3149
0
2705
0
3150
0
2706
027
070
3151
0
2708
0
3152
031
530
2709
027
100
3154
0
3254
0
2711
0
3155
0
3255
0
3156
0
2712
0
3256
0
2713
0
3157
0
3257
032
580
3469
0
3259
032
600
3468
0
3261
0
3467
034
660
3262
032
630
2714
027
150
2716
027
170
4316
0
2695
0
4317
043
180
4319
043
200
4321
043
220
4323
043
240
4325
0
1648
016
470
5660
1646
0
5650
1645
0
5640
1644
016
430
1642
016
410
3238
032
370
3236
032
350
3234
032
330
3232
0
2643
0
3231
0
2644
0
3230
0
2645
026
460
3229
0
2647
0
4326
0
2648
026
490
2650
026
510
2652
0
2257
022
590
4657
046
580
2260
022
610
4659
046
600
2262
0
4661
0
2263
0
4662
0
2264
022
650
3228
032
270
3226
032
250
2653
026
540
2655
026
560
2657
026
580
1266
0
2659
026
600
1265
0
2861
0
2661
0
2860
0
2662
0
2859
028
580
3670
0
2169
0
3671
036
720
2171
0
3673
0
2172
0
3674
0
2173
0
3675
0
2174
0
3676
0
2175
0
3677
036
780
2177
0
3679
0
2663
0
7060
7070
7080
7090
7100
7110
7120
7130
2980
029
790
2419
0
2978
0
2420
0
2977
029
760
2421
0
3680
0
2975
0
2422
0
3681
0
2974
0
2423
0
3682
0
2973
0
2424
024
250
2972
029
710
2426
024
270
2428
0
2970
029
690
2968
029
670
2966
029
650
2964
029
630
2962
029
610
3523
035
210
3520
035
190
3518
035
170
3516
035
150
3514
035
130
3512
035
110
3510
035
090
3508
035
070
1525
0
4663
046
640
4665
046
660
4667
046
680
4669
046
700
4671
0
1546
0
3456
0
1545
0
3455
0
1544
0
3454
0
1543
015
420
3453
0
1541
0
3452
0
1540
0
3451
0
1539
0
3450
034
490
3448
034
470
2121
021
200
2119
021
180
2117
021
160
2115
0
3446
034
450
3444
034
430
3442
034
410
3440
034
390
3438
0
5630
5620
5610
5600
5590
5580
5570
5560
3599
035
980
3597
035
960
3595
035
940
3593
035
920
3591
035
900
3195
031
940
3193
031
920
3191
031
900
3189
031
880
3187
0
3187
0
1910
3589
035
880
3587
035
860
3585
035
840
3583
0
3582
0
3582
0
3581
035
800
3245
032
440
3243
032
420
3241
032
400
3239
0
2870
028
710
3579
0
2872
0
3578
0
3478
0
3605
0
3479
034
800
3481
034
820
3483
034
840
3485
034
860
2508
0
3637
036
360
3635
036
340
3633
036
320
3631
036
300
1194
011
950
1196
011
970
1198
011
990
6210
1200
0
6200
1201
0
6190
1202
0
6180
6170
3650
6160
6150
3670
3680
3690
3971
039
720
3710
3973
0
3720
3974
0
3730
3975
0
3740
3976
039
770
3978
039
790
2510
0
3750
3760
3126
031
270
3770
3128
0
3780
3129
0
3800
3130
031
310
1890
0
3132
0
1891
018
9201
8930
1894
018
950
1896
018
970
4452
044
530
1898
0
4454
0
1899
0
4455
044
560
4457
0
2511
0
4458
044
590
4460
0
2437
024
390
2440
024
410
2442
024
430
2444
024
450
2446
0
2512
0
2447
024
480
2449
024
500
2451
0
1391
013
920
1393
013
940
1395
0
4063
0
1396
0
4062
0
1397
0
4061
0
1398
0
4060
040
590
4058
040
570
4056
0
2513
0
9870
9880
9890
9900
9910
9920
9930
9940
9950
2616
026
150
2614
026
130
2612
026
110
2610
0
2213
022
120
2010
020
090
2210
0
2008
0
2209
0
2007
0
2208
0
2006
0
2207
022
060
2005
020
040
2003
020
020
2001
0
3532
035
310
3530
035
290
3528
035
270
3526
0
1820
1830
3525
0
1840
3524
0
1850
18601
870
1880
1890
1900
Key to functional categories:
Unknown
Unclassified
Other categories
Conserved hypothetical
Transport and binding proteins
Protein fate/protein synthesis
Secondary metabolism
Regulatory functions/signal transduction
Nucleosides and nucleotides
Transcription
Environmental response/pathogen response
Fatty acid and phospholipid metabolism
Energy metabolism
DNA metabolism
Growth and development
Cellular structure, organization and biogenesis
Cellular processes
Cofactors, prosthetic groups, and carriers
Amino acid biosynthesis
Central intermediary metabolism