whole-genome characteristics and polymorphic analysis of...

12
Research Article Whole-Genome Characteristics and Polymorphic Analysis of Vietnamese Rice Landraces as a Comprehensive Information Resource for Marker-Assisted Selection Hien Trinh, 1 Khoa Truong Nguyen, 2 Lam Van Nguyen, 1 Huy Quang Pham, 1 Can Thu Huong, 3 Tran Dang Xuan, 3 La Hoang Anh, 3 Mario Caccamo, 4 Sarah Ayling, 4 Nguyen Thuy Diep, 2 Cuong Nguyen, 1 Khuat Huu Trung, 2 and Tran Dang Khanh 2 1 Laboratory of Bioinformatics, Institute of Biotechnology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam 2 Department of Genetic Engineering, Agricultural Genetics Institute, Vietnam Academy of Agricultural Sciences, Km2 Pham Van Dong, Tuliem, Hanoi, Vietnam 3 Graduate School for International Development and Cooperation, Hiroshima University, Hiroshima 739-8529, Japan 4 Genetics and Breeding, National Institute of Agricultural Botany, Cambridge, UK Correspondence should be addressed to Cuong Nguyen; [email protected] and Tran Dang Khanh; [email protected] Received 24 July 2016; Revised 21 November 2016; Accepted 20 December 2016; Published 7 February 2017 Academic Editor: Hieu Xuan Cao Copyright © 2017 Hien Trinh et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Next generation sequencing technologies have provided numerous opportunities for application in the study of whole plant genomes. In this study, we present the sequencing and bioinformatic analyses of five typical rice landraces including three indica and two japonica with potential blast resistance. A total of 688.4 million 100bp paired-end reads have yielded approximately 30- fold coverage to compare with the Nipponbare reference genome. Among them, a small number of reads were mapped to both chromosomes and organellar genomes. Over two million and eight hundred thousand single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) in indica and japonica lines have been determined, which potentially have significant impacts on multiple transcripts of genes. SNP deserts, contiguous SNP-low regions, were found on chromosomes 1, 4, and 5 of all genomes of rice examined. Based on the distribution of SNPs per 100 kilobase pairs, the phylogenetic relationships among the landraces have been constructed. is is the first step towards revealing several salient features of rice genomes in Vietnam and providing significant information resources to further marker-assisted selection (MAS) in rice breeding programs. 1. Introduction Whole-genome sequencing (WGS) has revealed genetic information and genome structure and facilitated the iden- tification of gene function of different plant species including some major crops such as rice, wheat, tomato, and soybean [1–5]. Next generation sequencing (NGS) methods are rapid and cost-effective, providing many promising applications for plant genomics and affording further insight into massive genomic variations [6, 7]. Combining NGS with bioinformat- ics is a powerful approach to detect DNA polymorphisms for quantitative trait loci (QTLs) analyses, marker-assisted selection, genome-wide association studies (GWAS), and linkage disequilibrium analysis in plants [8–11]. Moreover, DNA polymorphisms have been widely applied as DNA markers in genetic crop research [12]. Rice (Oryza sativa L.) is a staple crop and provides daily food for over half of the world’s population. It contains two major groups, indica and japonica, which diverged more than one million years ago [13]. e recent sequencing of one representative from each group facilitated the study of the genomic structure of rice. In 2002, the first draſt sequence of the indica genome was established by shotgun sequenc- ing and homologous genes were predicted by comparison Hindawi International Journal of Genomics Volume 2017, Article ID 9272363, 11 pages https://doi.org/10.1155/2017/9272363

Upload: others

Post on 14-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

Research ArticleWhole-Genome Characteristics and Polymorphic Analysis ofVietnamese Rice Landraces as a Comprehensive InformationResource for Marker-Assisted Selection

Hien Trinh1 Khoa Truong Nguyen2 Lam Van Nguyen1 Huy Quang Pham1

Can Thu Huong3 Tran Dang Xuan3 La Hoang Anh3 Mario Caccamo4 Sarah Ayling4

Nguyen Thuy Diep2 Cuong Nguyen1 Khuat Huu Trung2 and Tran Dang Khanh2

1Laboratory of Bioinformatics Institute of Biotechnology Vietnam Academy of Science and Technology18 Hoang Quoc Viet Cau Giay Hanoi Vietnam2Department of Genetic Engineering Agricultural Genetics Institute Vietnam Academy of Agricultural SciencesKm2 Pham Van Dong Tuliem Hanoi Vietnam3Graduate School for International Development and Cooperation Hiroshima University Hiroshima 739-8529 Japan4Genetics and Breeding National Institute of Agricultural Botany Cambridge UK

Correspondence should be addressed to Cuong Nguyen cuongnguyenibtacvn andTran Dang Khanh khanhkonkukgmailcom

Received 24 July 2016 Revised 21 November 2016 Accepted 20 December 2016 Published 7 February 2017

Academic Editor Hieu Xuan Cao

Copyright copy 2017 Hien Trinh et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Next generation sequencing technologies have provided numerous opportunities for application in the study of whole plantgenomes In this study we present the sequencing and bioinformatic analyses of five typical rice landraces including three indicaand two japonica with potential blast resistance A total of 6884 million 100 bp paired-end reads have yielded approximately 30-fold coverage to compare with the Nipponbare reference genome Among them a small number of reads were mapped to bothchromosomes and organellar genomes Over two million and eight hundred thousand single nucleotide polymorphisms (SNPs)and insertions and deletions (InDels) in indica and japonica lines have been determined which potentially have significant impactson multiple transcripts of genes SNP deserts contiguous SNP-low regions were found on chromosomes 1 4 and 5 of all genomesof rice examined Based on the distribution of SNPs per 100 kilobase pairs the phylogenetic relationships among the landraceshave been constructed This is the first step towards revealing several salient features of rice genomes in Vietnam and providingsignificant information resources to further marker-assisted selection (MAS) in rice breeding programs

1 Introduction

Whole-genome sequencing (WGS) has revealed geneticinformation and genome structure and facilitated the iden-tification of gene function of different plant species includingsome major crops such as rice wheat tomato and soybean[1ndash5] Next generation sequencing (NGS) methods are rapidand cost-effective providingmany promising applications forplant genomics and affording further insight into massivegenomic variations [6 7] CombiningNGSwith bioinformat-ics is a powerful approach to detect DNA polymorphismsfor quantitative trait loci (QTLs) analyses marker-assisted

selection genome-wide association studies (GWAS) andlinkage disequilibrium analysis in plants [8ndash11] MoreoverDNA polymorphisms have been widely applied as DNAmarkers in genetic crop research [12]

Rice (Oryza sativa L) is a staple crop and provides dailyfood for over half of the worldrsquos population It contains twomajor groups indica and japonica which divergedmore thanone million years ago [13] The recent sequencing of onerepresentative from each group facilitated the study of thegenomic structure of rice In 2002 the first draft sequenceof the indica genome was established by shotgun sequenc-ing and homologous genes were predicted by comparison

HindawiInternational Journal of GenomicsVolume 2017 Article ID 9272363 11 pageshttpsdoiorg10115520179272363

2 International Journal of Genomics

Table 1 Abbreviation list of Vietnamese rice genomes used in the study

Abbreviation Name of rice landraces(name in Vietnamese) Subspecies Origin

indica 12 Chiem nho Bac Ninh 2 indica Bac Ninh provinceindica 13 Nep lun indica Ha Giang provinceindica 15 OM6377 indica CanTho provincejaponica 11 Bletelo japonica Lang Son provincejaponica 14 Khau mac buoc japonica Nghe An province

with the Arabidopsis thaliana genome by Yu et al [14]Subsequently the complete genome sequence of japonicawas generated in 2005 [15] and then updated using NGSand optical mapping in 2013 [1] This genome version hasbeen widely utilized as the reference rice genome for DNApolymorphism findings reported in some previous studies[16 17]

Asian rice is one of the major worldwide cereal cropsAmongAsian countries Vietnam is one of theworldrsquos leadingrice exporters accounting for 16 of the world trade volumeof rice [18] To the best of our knowledge the lack of whole-genome sequencing has raised concerns in relation to thecharacteristics of Vietnamese rice landraces therefore theobjective of the current workwas to perform genome analysisof five rice landraces three indica and two japonica collectedin different ecological areas of Vietnam by applying Illu-minarsquos paired-end sequencingThe generated reads were thenmapped to the Nipponbare reference sequence to analyzethe genomic features and discover and annotate candidatesingle nucleotide polymorphisms (SNPs) and insertions anddeletions (InDels) The results presented may help to unravelthe genetic basis of our rice genomes and polymorphicresources for molecular marker identification in the future

2 Materials and Methods

21 Plant Materials and Whole-Genome Sequencing Thefive Vietnamese rice landraces were collected from differ-ent ecological areas in Vietnam Ha Giang (104∘5810158405110158401015840E221199004910158400010158401015840N) and Lang Son (106∘4510158404010158401015840E 21∘5110158401410158401015840N) inthe northeast Bac Ninh (106∘0410158402410158401015840E 21∘1110158401510158401015840N) in theRed River Delta Nghe An (104∘5810158403810158401015840E 19∘1010158403510158401015840N) in theNorthCentral Coast andCanTho (105∘4710158400310158401015840E 10∘0110158405710158401015840N)in the Mekong River Delta For convenience the genomeswere labeled (Table 1 Table S1 in the Supplementary Materialavailable online at httpsdoiorg10115520179272363) asindicated in the list of Vietnamese native rice landracesaccording to the report of Trung and Ham [19] Total DNAof each landrace was extracted from young leaf tissue usingQiagen DNeasy kit (Qiagen Germany) The library prepa-ration and sequencing of the rice genomes were carried outusing Illumina HiSeq 2000 by applying Illumina pipeline 19at the Genome Analysis Centre (TGAC) UK The obtainedFASTQ files were further assessed by FastQC softwareand deposited in the NCBI sequence read archive (SRA)with accession numbers SRP064171 (indica 12 SRR2529343

indica 13 SRR2543299 indica 15 SRR2543338 japonica 11SRR2543336 and japonica 14 SRR2543337)

22 Mapping and Identification of SNPs and InDels Thepaired-end reads were first aligned with the nuclear referencegenome (O sativa L cv Nipponbare MSU release 70GenBank accession PRJDB1747) and the organellar genomes(chloroplast genome GenBank accessionNC 0013201 mito-chondria genome GenBank accession BA000293) usingthe alignment software BWA (version 062) with defaultparameters The mapping quality was assessed by Qualimap(version 21) and BEDtools The duplicated sequences weremarked and removed by Picard tools (version 179) SNPsand InDels were then called and qualified by SAMtools(version 11) and VarScan (version 237) with the followingparameters mapping quality of 20 depth of coverage of10 average of base quality of 30 and variant frequencyof 01 with SNPs and 03 with InDels The distribution ofSNPs and InDels per 100 kb along the chromosome wasused to determine SNP-poor regions (lt1 SNP1 kb) (FigureS1) Pearson correlation coefficients of SNP density amongthe landraces were calculated using R The common readsmapped to both chromosomes and organelles were removedfrom BAM files using SAMtools (version 11) and BWA(version 062)

23 Annotation of SNPs and InDels The SNPs and InDelswere annotated using SnpEff (version 36) with the GFF fileof the reference genome containing positional information ofrice genomic regions exons 51015840UTR 31015840UTR and CDS Toidentify outlier genes the cutoff values of nonsynonymousSNPs were identified by using the five-number summary ofbox-and-whisker plot

3 Results

31 Mapping ofWhole-Genome Sequencing Reads Thewholegenomes of three Vietnamese indica and two japonica ricelandraces were resequenced and produced 6884 million100 bp paired-end reads in total Among them 6218 millionreads (9032) were successfully aligned to the Nipponbarenuclear reference genome The alignment rates of eachlandrace were relatively high ranging from 8633 to 9387yielding 30xndash40x coverage in depth (Table 2) (Table S2andashe forchromosome coverage in detail)

The genome coverage ranged from 8918 to 9649 ofthe reference (Table 2 and Tables S2andashS2e) The organelle

International Journal of Genomics 3

Table 2 Mapping and coverage of the reads from the landraces to Nipponbare reference genome

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Total readsa 129251696 112867645 151576754 133602832 161141384 mapped readsb 112667155 101814936 130851417 125253671 151261159Mapped reads () 8717 9021 8633 9375 9387 unmapped readsc 16584541 11052709 20725337 8349161 9880225Unmapped reads () 1283 979 1367 625 613Mean mapping qualityd 4221 4459 4203 4781 4814Coveragee 9130 8918 9256 9550 9649Depthf 2994 2709 3474 3346 4041aThe raw reads in the FASTQ files bThe number of reads mapped to 12 chromosomes in the nucleus cThe number of unmapped reads to both nuclear andorganellar genomes dThe error probability of read mapping scaled by Phred quality eThe breadth of coverage across the nuclear genome fThe sequencingdepth of reads

Table 3 Number of SNP effects annotated by SnpEff SNPs were classified into various genomic regions (intergenic regions intron and exon)and effect terms by typesThe total of intergenic intronic and exonic SNPs was more than the total number of SNPs due to overlapping genemodels in the GFF file

indica 12 indica 13 indica 15 japonica 11 japonica 14Intergenic 1374419 6845 1329647 6946 1527754 6816 488888 6610 454811 6583Genic 633408 3155 584505 3054 713664 3184 250766 3390 236104 3417Intron amp regulatory sequences 288375 4553 279350 4779 320956 4497 109191 4354 101187 4286UTRs 85749 1354 82611 1413 96948 1358 29587 1180 28138 1192CDS 259284 4093 222544 3807 295760 4144 111988 4466 106779 4523Nonsynonymous 151998 5862 131754 5920 172767 5841 66645 5951 63962 5990Synonymous 107286 4138 90790 4080 122993 4159 45343 4049 42817 4010

genomes (mitochondrial and chloroplast) had coverage of94minus100 (Tables S3a and S3b) There was a portion ofreads (017sim026) that could be aligned to both nuclearDNA genome and organelle genomes mitochondrial andchloroplast (Tables S4andashS4e)

32 Detection and Distribution of SNPs and InDels By usingSAMtools and VarScan software the qualified SNPs andInDels were called The distributions of SNPs and InDels bychromosomes of each landrace are shown in Figures 1 and 2

For the indica lines the numbers of SNPs and InDelswere approximately twomillion and three hundred thousandrespectively (Tables S5andashS5c) Accordingly for the japonicalines the numbers of SNPs and InDels were approximatelyseven hundred thousand and one hundred thousand respec-tively (Tables S5d and S5e) For the indica lines chromosomes1 and 9 had the highest and lowest variation rates in termsof both SNP and InDel respectively However there was anexception in indica 12 and the lowest InDel rate was on chro-mosome 10 instead of chromosome 9 (Tables S5andashS5c) Forthe japonica lines the highest SNP rate was on chromosome8 while the lowest SNP rate was on chromosomes 2 (japonica11) and 3 (japonica 14) The highest and lowest InDel rateswere on chromosomes 1 and 5 respectively for both (TablesS5d and S5e)

The distribution of DNA polymorphisms has been exam-ined on each 100 kb nonoverlapping window to obtainaverage densities of SNP and InDels of chromosomes Theaverage densities of SNPs and InDels of indica landraces

were about 25 times that of japonica landraces (Tables S5andashS5e) The average densities of deletions tended to be higherthan those of insertions on all chromosomes (Tables S5andashS5e) Moreover SNP deserts where the SNP densities werebelow 1 SNPkb have been identified with differing sizes(100 kb to 67Mb average of 300 kb) and chromosomallocations Of all landraces there were three SNP deserts oflarger sizes (26MB 08MB and 07Mb) on chromosomes5 4 and 1 respectively (Figure 3) Moreover within all fivelines there is a small SNP desert of 01Mb located fromposition 124 to 125Mb on chromosome 11 in which thereare no genes found We have used the SNP densities per100 kb interval for reconstructing the relationships amongthe rice landraces By calculating the Pearson correlationcoefficient the relationship between the rice lines wasobserved because the patterns grouped the rice landracesinto two major subspecies confirming the classification oflandraces based on the chloroplast DNA presenceabsenceof a deletion in the Pst-12 fragment [19] In detail thelandraces in the same subspecies had a positive correlationclose to one whereas therewas no linear relationship betweenindica and japonica lines with the coefficients around zero(Figure 4)

33 Annotation of SNPs and InDels SNPs and InDels wereannotated against the GFF file of the Nipponbare referencegenome using SnpEff SNPsmostly occurred in the intergenicregions (approximately 680 for indica 660 for japon-ica) respectively (Figure 5 Table 3) For SNPs within genic

4 International Journal of Genomics

300000

200000

100000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0 0

SNPs InDels

Chromosomes Chromosomes(a)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(b)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(c)

80000

60000

40000

20000

8000

6000

4000

2000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 1200

SNPs InDels

Chromosomes Chromosomes(d)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

8000

6000

4000

2000

00

80000

60000

40000

20000

0

SNPs InDels

Chromosomes Chromosomes(e)

Figure 1 The number of SNPs and InDels on each chromosome in five landraces in comparison with the Nipponbare reference The 119909-axis represents the chromosomes The 119910-axis represents the number of SNPs (horizontal-line bars) insertions (white bars) and deletions(downward-diagonal bars) The five rice lines were (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica 11 and (e) japonica 14

regions (approximately 320 for indica 340 for japonica)SNPs occurred in introns and regulatory sequences (449 to478 for indica about 430 for japonica) and UTR regions(approximately 1375 for indica and 12 for japonica)

Among CDS regions the split between nonsynonymousand synonymous SNPs was 5875 4125 for indica and597 403 for japonica (Figure 5 Table 3) Similarly of theInDels more than 730 were detected in intergenic regions

Most of the InDels within genic regions were within InDelsor regulatory sequences (more than 620) with 933 to1195 within coding sequences (Figure S2) The length ofinsertions ranged from one to 27 bp while the length ofdeletions detected was up to 41 bp The majority of InDelswere mononucleotide (asymp555) and dinucleotide (asymp1665)In order to provide more insights into the effects of nonsyn-onymous SNPs on the genes the distribution and skewness

International Journal of Genomics 5

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

Chr01

Chr02

Chr03

Chr04

Chr05

Chr06

Chr07

Chr08

Chr09

Chr10

Chr11

Chr12(142749)

(432Mb)

(359Mb)

(364Mb)

(355Mb)

(300Mb)

(312Mb)

(297Mb)

(284Mb)

(230Mb)

(232Mb)

(290Mb)

(275Mb)

(233111)

(204691)

(200651)

(149505)

(147438)

(177397)

(156409)

(151679)

(133830)

(142862)

(167507)

10 20 30 40

Figure 2 Distribution of SNPs between indica 12 and Nipponbare on the 12 chromosomes The 119909-axis shows the physical distance ofchromosome into 100 kb windows The chromosome size is indicated in brackets The 119910-axis represents the number of SNPs per 100 kbThe total of SNPs in each chromosome is shown in parentheses

were calculated to identify outlier genes which possess veryhigh numbers of nonsynonymous SNPs as shown in Figure 6According to the Nipponbare reference genome nearly halfof the 149 outlier genes were retrotransposon proteins (TableS6)

We have also observed that the number of transitionalSNPs (AG and CT) was much higher than that of transver-sions (AC AT CG and GT) for all of the five landraceswith the ratios (TsTv) ranging from 219 (japonica 14) to237 (indica 12) Within transitions the frequency of CTwas slightly higher than those of AT and much larger thantransversion SNPs (Table 4)

4 Discussion

In this study we have sequenced five Vietnamese ricegenomes consisting of three indica and two japonica lan-dracesThe sequence datasets were aligned to theNipponbarereference genome to identify genetic variations The geneticvariation annotation and analysis providednovel insights intothe specific Vietnamese rice landraces which should be agood resource for further molecular breeding in rice Wehave analyzed the sequence datasets of two major groupsof rice landraces indica and japonica through five elitelandraces in Vietnam This provides significant information

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

2 International Journal of Genomics

Table 1 Abbreviation list of Vietnamese rice genomes used in the study

Abbreviation Name of rice landraces(name in Vietnamese) Subspecies Origin

indica 12 Chiem nho Bac Ninh 2 indica Bac Ninh provinceindica 13 Nep lun indica Ha Giang provinceindica 15 OM6377 indica CanTho provincejaponica 11 Bletelo japonica Lang Son provincejaponica 14 Khau mac buoc japonica Nghe An province

with the Arabidopsis thaliana genome by Yu et al [14]Subsequently the complete genome sequence of japonicawas generated in 2005 [15] and then updated using NGSand optical mapping in 2013 [1] This genome version hasbeen widely utilized as the reference rice genome for DNApolymorphism findings reported in some previous studies[16 17]

Asian rice is one of the major worldwide cereal cropsAmongAsian countries Vietnam is one of theworldrsquos leadingrice exporters accounting for 16 of the world trade volumeof rice [18] To the best of our knowledge the lack of whole-genome sequencing has raised concerns in relation to thecharacteristics of Vietnamese rice landraces therefore theobjective of the current workwas to perform genome analysisof five rice landraces three indica and two japonica collectedin different ecological areas of Vietnam by applying Illu-minarsquos paired-end sequencingThe generated reads were thenmapped to the Nipponbare reference sequence to analyzethe genomic features and discover and annotate candidatesingle nucleotide polymorphisms (SNPs) and insertions anddeletions (InDels) The results presented may help to unravelthe genetic basis of our rice genomes and polymorphicresources for molecular marker identification in the future

2 Materials and Methods

21 Plant Materials and Whole-Genome Sequencing Thefive Vietnamese rice landraces were collected from differ-ent ecological areas in Vietnam Ha Giang (104∘5810158405110158401015840E221199004910158400010158401015840N) and Lang Son (106∘4510158404010158401015840E 21∘5110158401410158401015840N) inthe northeast Bac Ninh (106∘0410158402410158401015840E 21∘1110158401510158401015840N) in theRed River Delta Nghe An (104∘5810158403810158401015840E 19∘1010158403510158401015840N) in theNorthCentral Coast andCanTho (105∘4710158400310158401015840E 10∘0110158405710158401015840N)in the Mekong River Delta For convenience the genomeswere labeled (Table 1 Table S1 in the Supplementary Materialavailable online at httpsdoiorg10115520179272363) asindicated in the list of Vietnamese native rice landracesaccording to the report of Trung and Ham [19] Total DNAof each landrace was extracted from young leaf tissue usingQiagen DNeasy kit (Qiagen Germany) The library prepa-ration and sequencing of the rice genomes were carried outusing Illumina HiSeq 2000 by applying Illumina pipeline 19at the Genome Analysis Centre (TGAC) UK The obtainedFASTQ files were further assessed by FastQC softwareand deposited in the NCBI sequence read archive (SRA)with accession numbers SRP064171 (indica 12 SRR2529343

indica 13 SRR2543299 indica 15 SRR2543338 japonica 11SRR2543336 and japonica 14 SRR2543337)

22 Mapping and Identification of SNPs and InDels Thepaired-end reads were first aligned with the nuclear referencegenome (O sativa L cv Nipponbare MSU release 70GenBank accession PRJDB1747) and the organellar genomes(chloroplast genome GenBank accessionNC 0013201 mito-chondria genome GenBank accession BA000293) usingthe alignment software BWA (version 062) with defaultparameters The mapping quality was assessed by Qualimap(version 21) and BEDtools The duplicated sequences weremarked and removed by Picard tools (version 179) SNPsand InDels were then called and qualified by SAMtools(version 11) and VarScan (version 237) with the followingparameters mapping quality of 20 depth of coverage of10 average of base quality of 30 and variant frequencyof 01 with SNPs and 03 with InDels The distribution ofSNPs and InDels per 100 kb along the chromosome wasused to determine SNP-poor regions (lt1 SNP1 kb) (FigureS1) Pearson correlation coefficients of SNP density amongthe landraces were calculated using R The common readsmapped to both chromosomes and organelles were removedfrom BAM files using SAMtools (version 11) and BWA(version 062)

23 Annotation of SNPs and InDels The SNPs and InDelswere annotated using SnpEff (version 36) with the GFF fileof the reference genome containing positional information ofrice genomic regions exons 51015840UTR 31015840UTR and CDS Toidentify outlier genes the cutoff values of nonsynonymousSNPs were identified by using the five-number summary ofbox-and-whisker plot

3 Results

31 Mapping ofWhole-Genome Sequencing Reads Thewholegenomes of three Vietnamese indica and two japonica ricelandraces were resequenced and produced 6884 million100 bp paired-end reads in total Among them 6218 millionreads (9032) were successfully aligned to the Nipponbarenuclear reference genome The alignment rates of eachlandrace were relatively high ranging from 8633 to 9387yielding 30xndash40x coverage in depth (Table 2) (Table S2andashe forchromosome coverage in detail)

The genome coverage ranged from 8918 to 9649 ofthe reference (Table 2 and Tables S2andashS2e) The organelle

International Journal of Genomics 3

Table 2 Mapping and coverage of the reads from the landraces to Nipponbare reference genome

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Total readsa 129251696 112867645 151576754 133602832 161141384 mapped readsb 112667155 101814936 130851417 125253671 151261159Mapped reads () 8717 9021 8633 9375 9387 unmapped readsc 16584541 11052709 20725337 8349161 9880225Unmapped reads () 1283 979 1367 625 613Mean mapping qualityd 4221 4459 4203 4781 4814Coveragee 9130 8918 9256 9550 9649Depthf 2994 2709 3474 3346 4041aThe raw reads in the FASTQ files bThe number of reads mapped to 12 chromosomes in the nucleus cThe number of unmapped reads to both nuclear andorganellar genomes dThe error probability of read mapping scaled by Phred quality eThe breadth of coverage across the nuclear genome fThe sequencingdepth of reads

Table 3 Number of SNP effects annotated by SnpEff SNPs were classified into various genomic regions (intergenic regions intron and exon)and effect terms by typesThe total of intergenic intronic and exonic SNPs was more than the total number of SNPs due to overlapping genemodels in the GFF file

indica 12 indica 13 indica 15 japonica 11 japonica 14Intergenic 1374419 6845 1329647 6946 1527754 6816 488888 6610 454811 6583Genic 633408 3155 584505 3054 713664 3184 250766 3390 236104 3417Intron amp regulatory sequences 288375 4553 279350 4779 320956 4497 109191 4354 101187 4286UTRs 85749 1354 82611 1413 96948 1358 29587 1180 28138 1192CDS 259284 4093 222544 3807 295760 4144 111988 4466 106779 4523Nonsynonymous 151998 5862 131754 5920 172767 5841 66645 5951 63962 5990Synonymous 107286 4138 90790 4080 122993 4159 45343 4049 42817 4010

genomes (mitochondrial and chloroplast) had coverage of94minus100 (Tables S3a and S3b) There was a portion ofreads (017sim026) that could be aligned to both nuclearDNA genome and organelle genomes mitochondrial andchloroplast (Tables S4andashS4e)

32 Detection and Distribution of SNPs and InDels By usingSAMtools and VarScan software the qualified SNPs andInDels were called The distributions of SNPs and InDels bychromosomes of each landrace are shown in Figures 1 and 2

For the indica lines the numbers of SNPs and InDelswere approximately twomillion and three hundred thousandrespectively (Tables S5andashS5c) Accordingly for the japonicalines the numbers of SNPs and InDels were approximatelyseven hundred thousand and one hundred thousand respec-tively (Tables S5d and S5e) For the indica lines chromosomes1 and 9 had the highest and lowest variation rates in termsof both SNP and InDel respectively However there was anexception in indica 12 and the lowest InDel rate was on chro-mosome 10 instead of chromosome 9 (Tables S5andashS5c) Forthe japonica lines the highest SNP rate was on chromosome8 while the lowest SNP rate was on chromosomes 2 (japonica11) and 3 (japonica 14) The highest and lowest InDel rateswere on chromosomes 1 and 5 respectively for both (TablesS5d and S5e)

The distribution of DNA polymorphisms has been exam-ined on each 100 kb nonoverlapping window to obtainaverage densities of SNP and InDels of chromosomes Theaverage densities of SNPs and InDels of indica landraces

were about 25 times that of japonica landraces (Tables S5andashS5e) The average densities of deletions tended to be higherthan those of insertions on all chromosomes (Tables S5andashS5e) Moreover SNP deserts where the SNP densities werebelow 1 SNPkb have been identified with differing sizes(100 kb to 67Mb average of 300 kb) and chromosomallocations Of all landraces there were three SNP deserts oflarger sizes (26MB 08MB and 07Mb) on chromosomes5 4 and 1 respectively (Figure 3) Moreover within all fivelines there is a small SNP desert of 01Mb located fromposition 124 to 125Mb on chromosome 11 in which thereare no genes found We have used the SNP densities per100 kb interval for reconstructing the relationships amongthe rice landraces By calculating the Pearson correlationcoefficient the relationship between the rice lines wasobserved because the patterns grouped the rice landracesinto two major subspecies confirming the classification oflandraces based on the chloroplast DNA presenceabsenceof a deletion in the Pst-12 fragment [19] In detail thelandraces in the same subspecies had a positive correlationclose to one whereas therewas no linear relationship betweenindica and japonica lines with the coefficients around zero(Figure 4)

33 Annotation of SNPs and InDels SNPs and InDels wereannotated against the GFF file of the Nipponbare referencegenome using SnpEff SNPsmostly occurred in the intergenicregions (approximately 680 for indica 660 for japon-ica) respectively (Figure 5 Table 3) For SNPs within genic

4 International Journal of Genomics

300000

200000

100000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0 0

SNPs InDels

Chromosomes Chromosomes(a)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(b)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(c)

80000

60000

40000

20000

8000

6000

4000

2000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 1200

SNPs InDels

Chromosomes Chromosomes(d)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

8000

6000

4000

2000

00

80000

60000

40000

20000

0

SNPs InDels

Chromosomes Chromosomes(e)

Figure 1 The number of SNPs and InDels on each chromosome in five landraces in comparison with the Nipponbare reference The 119909-axis represents the chromosomes The 119910-axis represents the number of SNPs (horizontal-line bars) insertions (white bars) and deletions(downward-diagonal bars) The five rice lines were (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica 11 and (e) japonica 14

regions (approximately 320 for indica 340 for japonica)SNPs occurred in introns and regulatory sequences (449 to478 for indica about 430 for japonica) and UTR regions(approximately 1375 for indica and 12 for japonica)

Among CDS regions the split between nonsynonymousand synonymous SNPs was 5875 4125 for indica and597 403 for japonica (Figure 5 Table 3) Similarly of theInDels more than 730 were detected in intergenic regions

Most of the InDels within genic regions were within InDelsor regulatory sequences (more than 620) with 933 to1195 within coding sequences (Figure S2) The length ofinsertions ranged from one to 27 bp while the length ofdeletions detected was up to 41 bp The majority of InDelswere mononucleotide (asymp555) and dinucleotide (asymp1665)In order to provide more insights into the effects of nonsyn-onymous SNPs on the genes the distribution and skewness

International Journal of Genomics 5

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

Chr01

Chr02

Chr03

Chr04

Chr05

Chr06

Chr07

Chr08

Chr09

Chr10

Chr11

Chr12(142749)

(432Mb)

(359Mb)

(364Mb)

(355Mb)

(300Mb)

(312Mb)

(297Mb)

(284Mb)

(230Mb)

(232Mb)

(290Mb)

(275Mb)

(233111)

(204691)

(200651)

(149505)

(147438)

(177397)

(156409)

(151679)

(133830)

(142862)

(167507)

10 20 30 40

Figure 2 Distribution of SNPs between indica 12 and Nipponbare on the 12 chromosomes The 119909-axis shows the physical distance ofchromosome into 100 kb windows The chromosome size is indicated in brackets The 119910-axis represents the number of SNPs per 100 kbThe total of SNPs in each chromosome is shown in parentheses

were calculated to identify outlier genes which possess veryhigh numbers of nonsynonymous SNPs as shown in Figure 6According to the Nipponbare reference genome nearly halfof the 149 outlier genes were retrotransposon proteins (TableS6)

We have also observed that the number of transitionalSNPs (AG and CT) was much higher than that of transver-sions (AC AT CG and GT) for all of the five landraceswith the ratios (TsTv) ranging from 219 (japonica 14) to237 (indica 12) Within transitions the frequency of CTwas slightly higher than those of AT and much larger thantransversion SNPs (Table 4)

4 Discussion

In this study we have sequenced five Vietnamese ricegenomes consisting of three indica and two japonica lan-dracesThe sequence datasets were aligned to theNipponbarereference genome to identify genetic variations The geneticvariation annotation and analysis providednovel insights intothe specific Vietnamese rice landraces which should be agood resource for further molecular breeding in rice Wehave analyzed the sequence datasets of two major groupsof rice landraces indica and japonica through five elitelandraces in Vietnam This provides significant information

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

International Journal of Genomics 3

Table 2 Mapping and coverage of the reads from the landraces to Nipponbare reference genome

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Total readsa 129251696 112867645 151576754 133602832 161141384 mapped readsb 112667155 101814936 130851417 125253671 151261159Mapped reads () 8717 9021 8633 9375 9387 unmapped readsc 16584541 11052709 20725337 8349161 9880225Unmapped reads () 1283 979 1367 625 613Mean mapping qualityd 4221 4459 4203 4781 4814Coveragee 9130 8918 9256 9550 9649Depthf 2994 2709 3474 3346 4041aThe raw reads in the FASTQ files bThe number of reads mapped to 12 chromosomes in the nucleus cThe number of unmapped reads to both nuclear andorganellar genomes dThe error probability of read mapping scaled by Phred quality eThe breadth of coverage across the nuclear genome fThe sequencingdepth of reads

Table 3 Number of SNP effects annotated by SnpEff SNPs were classified into various genomic regions (intergenic regions intron and exon)and effect terms by typesThe total of intergenic intronic and exonic SNPs was more than the total number of SNPs due to overlapping genemodels in the GFF file

indica 12 indica 13 indica 15 japonica 11 japonica 14Intergenic 1374419 6845 1329647 6946 1527754 6816 488888 6610 454811 6583Genic 633408 3155 584505 3054 713664 3184 250766 3390 236104 3417Intron amp regulatory sequences 288375 4553 279350 4779 320956 4497 109191 4354 101187 4286UTRs 85749 1354 82611 1413 96948 1358 29587 1180 28138 1192CDS 259284 4093 222544 3807 295760 4144 111988 4466 106779 4523Nonsynonymous 151998 5862 131754 5920 172767 5841 66645 5951 63962 5990Synonymous 107286 4138 90790 4080 122993 4159 45343 4049 42817 4010

genomes (mitochondrial and chloroplast) had coverage of94minus100 (Tables S3a and S3b) There was a portion ofreads (017sim026) that could be aligned to both nuclearDNA genome and organelle genomes mitochondrial andchloroplast (Tables S4andashS4e)

32 Detection and Distribution of SNPs and InDels By usingSAMtools and VarScan software the qualified SNPs andInDels were called The distributions of SNPs and InDels bychromosomes of each landrace are shown in Figures 1 and 2

For the indica lines the numbers of SNPs and InDelswere approximately twomillion and three hundred thousandrespectively (Tables S5andashS5c) Accordingly for the japonicalines the numbers of SNPs and InDels were approximatelyseven hundred thousand and one hundred thousand respec-tively (Tables S5d and S5e) For the indica lines chromosomes1 and 9 had the highest and lowest variation rates in termsof both SNP and InDel respectively However there was anexception in indica 12 and the lowest InDel rate was on chro-mosome 10 instead of chromosome 9 (Tables S5andashS5c) Forthe japonica lines the highest SNP rate was on chromosome8 while the lowest SNP rate was on chromosomes 2 (japonica11) and 3 (japonica 14) The highest and lowest InDel rateswere on chromosomes 1 and 5 respectively for both (TablesS5d and S5e)

The distribution of DNA polymorphisms has been exam-ined on each 100 kb nonoverlapping window to obtainaverage densities of SNP and InDels of chromosomes Theaverage densities of SNPs and InDels of indica landraces

were about 25 times that of japonica landraces (Tables S5andashS5e) The average densities of deletions tended to be higherthan those of insertions on all chromosomes (Tables S5andashS5e) Moreover SNP deserts where the SNP densities werebelow 1 SNPkb have been identified with differing sizes(100 kb to 67Mb average of 300 kb) and chromosomallocations Of all landraces there were three SNP deserts oflarger sizes (26MB 08MB and 07Mb) on chromosomes5 4 and 1 respectively (Figure 3) Moreover within all fivelines there is a small SNP desert of 01Mb located fromposition 124 to 125Mb on chromosome 11 in which thereare no genes found We have used the SNP densities per100 kb interval for reconstructing the relationships amongthe rice landraces By calculating the Pearson correlationcoefficient the relationship between the rice lines wasobserved because the patterns grouped the rice landracesinto two major subspecies confirming the classification oflandraces based on the chloroplast DNA presenceabsenceof a deletion in the Pst-12 fragment [19] In detail thelandraces in the same subspecies had a positive correlationclose to one whereas therewas no linear relationship betweenindica and japonica lines with the coefficients around zero(Figure 4)

33 Annotation of SNPs and InDels SNPs and InDels wereannotated against the GFF file of the Nipponbare referencegenome using SnpEff SNPsmostly occurred in the intergenicregions (approximately 680 for indica 660 for japon-ica) respectively (Figure 5 Table 3) For SNPs within genic

4 International Journal of Genomics

300000

200000

100000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0 0

SNPs InDels

Chromosomes Chromosomes(a)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(b)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(c)

80000

60000

40000

20000

8000

6000

4000

2000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 1200

SNPs InDels

Chromosomes Chromosomes(d)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

8000

6000

4000

2000

00

80000

60000

40000

20000

0

SNPs InDels

Chromosomes Chromosomes(e)

Figure 1 The number of SNPs and InDels on each chromosome in five landraces in comparison with the Nipponbare reference The 119909-axis represents the chromosomes The 119910-axis represents the number of SNPs (horizontal-line bars) insertions (white bars) and deletions(downward-diagonal bars) The five rice lines were (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica 11 and (e) japonica 14

regions (approximately 320 for indica 340 for japonica)SNPs occurred in introns and regulatory sequences (449 to478 for indica about 430 for japonica) and UTR regions(approximately 1375 for indica and 12 for japonica)

Among CDS regions the split between nonsynonymousand synonymous SNPs was 5875 4125 for indica and597 403 for japonica (Figure 5 Table 3) Similarly of theInDels more than 730 were detected in intergenic regions

Most of the InDels within genic regions were within InDelsor regulatory sequences (more than 620) with 933 to1195 within coding sequences (Figure S2) The length ofinsertions ranged from one to 27 bp while the length ofdeletions detected was up to 41 bp The majority of InDelswere mononucleotide (asymp555) and dinucleotide (asymp1665)In order to provide more insights into the effects of nonsyn-onymous SNPs on the genes the distribution and skewness

International Journal of Genomics 5

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

Chr01

Chr02

Chr03

Chr04

Chr05

Chr06

Chr07

Chr08

Chr09

Chr10

Chr11

Chr12(142749)

(432Mb)

(359Mb)

(364Mb)

(355Mb)

(300Mb)

(312Mb)

(297Mb)

(284Mb)

(230Mb)

(232Mb)

(290Mb)

(275Mb)

(233111)

(204691)

(200651)

(149505)

(147438)

(177397)

(156409)

(151679)

(133830)

(142862)

(167507)

10 20 30 40

Figure 2 Distribution of SNPs between indica 12 and Nipponbare on the 12 chromosomes The 119909-axis shows the physical distance ofchromosome into 100 kb windows The chromosome size is indicated in brackets The 119910-axis represents the number of SNPs per 100 kbThe total of SNPs in each chromosome is shown in parentheses

were calculated to identify outlier genes which possess veryhigh numbers of nonsynonymous SNPs as shown in Figure 6According to the Nipponbare reference genome nearly halfof the 149 outlier genes were retrotransposon proteins (TableS6)

We have also observed that the number of transitionalSNPs (AG and CT) was much higher than that of transver-sions (AC AT CG and GT) for all of the five landraceswith the ratios (TsTv) ranging from 219 (japonica 14) to237 (indica 12) Within transitions the frequency of CTwas slightly higher than those of AT and much larger thantransversion SNPs (Table 4)

4 Discussion

In this study we have sequenced five Vietnamese ricegenomes consisting of three indica and two japonica lan-dracesThe sequence datasets were aligned to theNipponbarereference genome to identify genetic variations The geneticvariation annotation and analysis providednovel insights intothe specific Vietnamese rice landraces which should be agood resource for further molecular breeding in rice Wehave analyzed the sequence datasets of two major groupsof rice landraces indica and japonica through five elitelandraces in Vietnam This provides significant information

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

4 International Journal of Genomics

300000

200000

100000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0 0

SNPs InDels

Chromosomes Chromosomes(a)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(b)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

20000

15000

10000

5000

0

300000

200000

100000

0

SNPs InDels

Chromosomes Chromosomes(c)

80000

60000

40000

20000

8000

6000

4000

2000

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 1200

SNPs InDels

Chromosomes Chromosomes(d)

1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12

8000

6000

4000

2000

00

80000

60000

40000

20000

0

SNPs InDels

Chromosomes Chromosomes(e)

Figure 1 The number of SNPs and InDels on each chromosome in five landraces in comparison with the Nipponbare reference The 119909-axis represents the chromosomes The 119910-axis represents the number of SNPs (horizontal-line bars) insertions (white bars) and deletions(downward-diagonal bars) The five rice lines were (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica 11 and (e) japonica 14

regions (approximately 320 for indica 340 for japonica)SNPs occurred in introns and regulatory sequences (449 to478 for indica about 430 for japonica) and UTR regions(approximately 1375 for indica and 12 for japonica)

Among CDS regions the split between nonsynonymousand synonymous SNPs was 5875 4125 for indica and597 403 for japonica (Figure 5 Table 3) Similarly of theInDels more than 730 were detected in intergenic regions

Most of the InDels within genic regions were within InDelsor regulatory sequences (more than 620) with 933 to1195 within coding sequences (Figure S2) The length ofinsertions ranged from one to 27 bp while the length ofdeletions detected was up to 41 bp The majority of InDelswere mononucleotide (asymp555) and dinucleotide (asymp1665)In order to provide more insights into the effects of nonsyn-onymous SNPs on the genes the distribution and skewness

International Journal of Genomics 5

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

Chr01

Chr02

Chr03

Chr04

Chr05

Chr06

Chr07

Chr08

Chr09

Chr10

Chr11

Chr12(142749)

(432Mb)

(359Mb)

(364Mb)

(355Mb)

(300Mb)

(312Mb)

(297Mb)

(284Mb)

(230Mb)

(232Mb)

(290Mb)

(275Mb)

(233111)

(204691)

(200651)

(149505)

(147438)

(177397)

(156409)

(151679)

(133830)

(142862)

(167507)

10 20 30 40

Figure 2 Distribution of SNPs between indica 12 and Nipponbare on the 12 chromosomes The 119909-axis shows the physical distance ofchromosome into 100 kb windows The chromosome size is indicated in brackets The 119910-axis represents the number of SNPs per 100 kbThe total of SNPs in each chromosome is shown in parentheses

were calculated to identify outlier genes which possess veryhigh numbers of nonsynonymous SNPs as shown in Figure 6According to the Nipponbare reference genome nearly halfof the 149 outlier genes were retrotransposon proteins (TableS6)

We have also observed that the number of transitionalSNPs (AG and CT) was much higher than that of transver-sions (AC AT CG and GT) for all of the five landraceswith the ratios (TsTv) ranging from 219 (japonica 14) to237 (indica 12) Within transitions the frequency of CTwas slightly higher than those of AT and much larger thantransversion SNPs (Table 4)

4 Discussion

In this study we have sequenced five Vietnamese ricegenomes consisting of three indica and two japonica lan-dracesThe sequence datasets were aligned to theNipponbarereference genome to identify genetic variations The geneticvariation annotation and analysis providednovel insights intothe specific Vietnamese rice landraces which should be agood resource for further molecular breeding in rice Wehave analyzed the sequence datasets of two major groupsof rice landraces indica and japonica through five elitelandraces in Vietnam This provides significant information

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

International Journal of Genomics 5

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

2000

1000

0

Chr01

Chr02

Chr03

Chr04

Chr05

Chr06

Chr07

Chr08

Chr09

Chr10

Chr11

Chr12(142749)

(432Mb)

(359Mb)

(364Mb)

(355Mb)

(300Mb)

(312Mb)

(297Mb)

(284Mb)

(230Mb)

(232Mb)

(290Mb)

(275Mb)

(233111)

(204691)

(200651)

(149505)

(147438)

(177397)

(156409)

(151679)

(133830)

(142862)

(167507)

10 20 30 40

Figure 2 Distribution of SNPs between indica 12 and Nipponbare on the 12 chromosomes The 119909-axis shows the physical distance ofchromosome into 100 kb windows The chromosome size is indicated in brackets The 119910-axis represents the number of SNPs per 100 kbThe total of SNPs in each chromosome is shown in parentheses

were calculated to identify outlier genes which possess veryhigh numbers of nonsynonymous SNPs as shown in Figure 6According to the Nipponbare reference genome nearly halfof the 149 outlier genes were retrotransposon proteins (TableS6)

We have also observed that the number of transitionalSNPs (AG and CT) was much higher than that of transver-sions (AC AT CG and GT) for all of the five landraceswith the ratios (TsTv) ranging from 219 (japonica 14) to237 (indica 12) Within transitions the frequency of CTwas slightly higher than those of AT and much larger thantransversion SNPs (Table 4)

4 Discussion

In this study we have sequenced five Vietnamese ricegenomes consisting of three indica and two japonica lan-dracesThe sequence datasets were aligned to theNipponbarereference genome to identify genetic variations The geneticvariation annotation and analysis providednovel insights intothe specific Vietnamese rice landraces which should be agood resource for further molecular breeding in rice Wehave analyzed the sequence datasets of two major groupsof rice landraces indica and japonica through five elitelandraces in Vietnam This provides significant information

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

6 International Journal of Genomics

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(432Mb)

10 20 30 40

(Mb)N

umbe

r of g

enes

(a)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(355Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(b)

50

25

0

indica 12

indica 13

indica 15

japonica 11

japonica 14

(300Mb)

10 20 30

(Mb)

Num

ber o

f gen

es

(c)

Figure 3 Distribution of ldquoSNP desertsrdquo on (a) chromosome 1 (b) chromosome 4 and (c) chromosome 5 in the five landraces The low-SNPdensity regions are shown as gray bars or vertical lines The vertical-dash intervals indicate the ldquoSNP desertsrdquo occurring in all five landracesThe line charts represent the distribution of genes across the chromosomes The chromosome sizes are indicated in brackets

as to the genetic diversity of the two types of rice lines in thedomestication process The results have further contributedadditional evidence for the transfer of DNA regions in theorganelle into the nucleus in the rice genome Our study hasalso strengthened the classification of the relationship amongthe landraces The detection of DNA polymorphisms can beused to identify novel genes that differentiated between ourlandraces and will serve as a reference for studies relating to

specialty characteristics of Vietnamese rice landraces Thesepolymorphisms are also available for use in marker-assistedselection in rice breeding programs

The whole genomes of five rice landraces have yieldedhigh-quality reads from 112 to 161 million reads per lineMost of the reads (ge8633) were successfully mappedto the Nipponbare reference genome with the breadth ofcoverage of more than 8918 proving that the selected rice

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

International Journal of Genomics 7

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica15

indica12

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica12

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica13

indica15

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica13

indica15

indica12

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica12

indica15

indica13

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica14

japonica11

indica15

indica12

indica13

minus1 1Value

Color key

0

Chr01 Chr02 Chr03

Chr04 Chr05 Chr06

Chr07 Chr08

indi

ca1

2

indi

ca1

3

indi

ca1

5

japo

nica

11

japo

nica

14

japonica11

japonica14

indica12

indica13

indica15

Chr09

Chr10 Chr11 Chr12

Figure 4 The heatmap and cluster view showing the correlation among the five landraces on the 12 chromosomes Pearson correlationcoefficients were calculated based on the SNP densities per 100 kb on the chromosome

genomes are similar to the reference Interestingly we founda small number of reads aligned with both chromosomes andorganelle genomes including chloroplast and mitochondriaThis phenomenon is known as ldquoorganellar insertionrdquo in thenuclear genome which has been previously reported by theInternational Rice Genome Sequencing Project [20] The

reads mapped to the mitochondria concentrated on chro-mosome 12 (10) and the reads mapped to the chloroplastlocated on chromosome 10 (08) Therefore using NGSthese data are consistent with some previous reports andalso reconfirmed that chromosomes 10 and 12 have had moreinsertions than the others [20]

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

8 International Journal of Genomics

Intergenic 1374419

Genic 633408

Intron amp regulatory sequences 288375

UTRs85749

CDS259284

Non-synonymous

151998

Synonymous107286

(a)

Intergenic 1329647

Genic 584505

Intron amp regulatory sequences

279350

UTRs 82611

CDS 222544

Non-synonymous

131754

Synonymous90790

(b)

Intergenic 1527754

Genic 713664

Intron amp regulatory sequences

320956

UTRs 96948

CDS 295760

Non-synonymous

172767

Synonymous122993

(c)

Intergenic 488888

Genic 250766

Intron amp regulatory sequences

109191

UTRs29587

CDS 111988

Non-synonymous

66645

Synonymous45343

(d)

Intergenic 454811

Genic 236104

Intron amp regulatory sequences

101187

UTRs 28138

CDS 106779

Non-synonymous

63962

Synonymous42817

(e)

Figure 5 Annotation of single nucleotide polymorphisms (SNPs) between five Vietnamese rice landraces ((a) indica 12 (b) indica 13 (c)indica 15 (d) japonica 11 and (e) japonica 14) and Nipponbare based on the annotations of Nipponbare reference genome

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

International Journal of Genomics 9

2000018823

2248

1554

1082

825

298 327

478

304239

159 137 99 72 56 36 28 36 30 15 12 11 7 4 9 5 15

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

Number of non-synonymous SNPs (SNPskb)

le2 3 4 5 6

69 7 8 910

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

(a)

16770

1827

1321

903

581

59

473334

214 197134 116 74 61 43 34 19 18 14 11 13 8 7 2 3 4 12

18000

16000

17000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

639 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(b)

19953

2352

1724

1293

924

705

534407

298 236160 127 97 81 52 38 29 31 24 16 17 12 14 7 2 1814

20000

18000

16000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

6497 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(c)

12133

982

650

96

314 302231 181

98 83 51 46 35 23 22 23 17 11 13 5 5 5 6 5 6

12000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

668 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ge25

Number of non-synonymous SNPs (SNPskb)

(d)

12121

957

613

5

402254 207 150 106 77 57 52 44 23 30 21 12 10 6 6 6 64 4 3 5 2

12000

14000

10000

2000

1500

1000

500

0

Num

ber o

f gen

es

le2 3 4 5 6

647 7 8 9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

ge27

Number of non-synonymous SNPs (SNPskb)

(e)

Figure 6 The distribution and skewness of the nonsynonymous SNPs SNPs per 1 kb in (a) indica 12 (b) indica 13 (c) indica 15 (d) japonica11 and (e) japonica 14 The black dash indicates the outlier value cutoff

Table 4 Base changes in five rice genomes

Landraces indica 12 indica 13 indica 15 japonica 11 japonica 14Transitions

AG 705908 671815 787002 255473 236847CT 707496 672959 787976 255542 237620

TransversionsAC 155310 149265 174014 59111 55410AT 182882 176243 205343 70949 68165GC 101541 95523 113736 39460 37120GT 154639 148304 173286 59011 55639

TsTv 2378 2362 2364 2236 2193

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

10 International Journal of Genomics

The sequencing depth of approximately 30x has beensufficient for detecting DNA polymorphisms Our results ofvariant calling of indica lines (1914152 to 2241418 SNPs268400 to 303039 InDels) are in agreementwith the previousstudy reporting that the 93-11 indica possesses about 17million SNPs and 480 thousand InDels [21] However thenumbers of SNPs and InDels (about 714 thousand SNPs and102 thousand InDels) of japonica lines were about five andthree times those of Omachi line (132462 SNPs and 35766InDels) but were similar to those of Moroberekan (827448SNPs and 159597 InDels) a tropical japonica cultivar [1222] The distinct difference between tropical and temperatejaponica landraces has been reported by Arai-Kichise et al[22] Further validation is underway to obtain SNP markersfor rice breeding selection

SNP deserts genome regions of SNP rate less than1 SNPkb were identified in all 5 Vietnamese landraces withsizes varying from 100 kb to 67MB The japonica SNPdeserts are longer than those of indica Mostly SNP desertshave been previously found on chromosome 5 [23 24]however we have identified two additional SNP desertswithin all five Vietnamese landraces with the size of 07Mbon chromosome 1 and 08Mb on chromosome 4 Chenget al [25] have reported that SNP deserts were in thevicinity of the centromere on chromosome 5 but far from thecentromere on chromosomes 1 and 4 Therefore the currentresults have supported the hypothesis that SNP-low regionshave not been correlated with low recombination [26] ThecommonSNPdesertsmight include highly conserved regionsamong the five Vietnamese rice landraces They could resultfrom selective sweeps reducing the variants during humanselection and rice domestication [27] These SNP deserts areable to raise fascinating questions for future studies as towhether the persistence of chromosomal regions is randomor special for the individual landrace

The SNP distribution correlation analysis of the fivelandraces has demonstrated the distinct divergence betweenindica and japonica and disclosed the phylogenetic relation-ships among the landraces (Figure 4) thus it is possible tobe exploited for the verification of landrace classificationsThe relationship among the rice landraces in the correlationof chromosomes could be utilized for the genetic linkagedisequilibrium studies

Additionally the phenomenon ldquotransition biasrdquo whichmeans the ratios of transitions (Ts) and transversions (Tv)larger than 1 2 occurred among the five landraces Thetransitional SNPs aremore tolerated than transversional onesduring mutation and natural selection because they are morelikely synonymous in coding protein resulting in conservingthe protein structure [28] Within transitions a number ofboth AG and CT changes have been rather similar Amongtransversions the AT transversions have been shown to behigher than others which was also reported in the genomesof citrus and rice [17 28 29]

In summary by sequencing and aligning five Vietnameserice genomes with the reference genome Nipponbare ourresults have disclosed interesting genetic information suchas SNP and InDel distributions and effects and SNP desertsThree ldquoSNPdesertrdquo regions whichmight result from selective

sweeps in the domestication of rice landraces were alsoobserved in the different chromosomes Furthermore theSNP distribution analysis has revealed the phylogeny ofindica and japonica with distinct classification Further SNPvalidations need to be examined to identify accurate SNPmarkers for molecular breeding

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Hien Trinh and Khoa Truong Nguyen contributed equallyto this work Mario Caccamo Cuong Nguyen and KhuatHuu Trung contributed to the conception and design of thestudy Sarah Ayling Khoa Truong Nguyen and Hien Trinhcarried out the quality control of sequencing data and dataanalysis Tran Dang Xuan La Hoang Anh Can Thu HuongNguyenThuy Diep LamVan Nguyen and Huy Quang Phamconceived the study and designed and performed the dataanalysis Tran Dang Khanh Hien Trinh and Cuong Nguyendrafted and revised the manuscript All authors contributedto the editing of the final version of the manuscript

Acknowledgments

The authors would like to thank Hung Nguyen from MUInformatics Institute University of Missouri Columbia MO65211 USA for his help in developing the homemade Pythonscript to split 100 kb window sizes across the chromosomesThisworkwas supported by theMinistry of Science andTech-nology (MOST) Vietnam and the Genome Analysis Centre(TGAC) Biotechnology and Biological Sciences ResearchCouncil (BBSRC) through a collaboration program betweenVietnam and theUKThe authors are grateful to colleagues inthis program that have made significant contributions to thecollection analysis or interpretation of samples and results

References

[1] Y Kawahara M Bastide J P Hamilton et al ldquoImprovementof the Oryza sativa Nipponbare reference genome using nextgeneration sequence and optical map datardquo Rice vol 6 no 1article 4 2013

[2] R Brenchley M Spannagl M Pfeifer et al ldquoAnalysis of thebread wheat genome using whole-genome shotgun sequenc-ingrdquo Nature vol 491 no 7426 pp 705ndash710 2012

[3] M Kobayashi H Nagasaki V Garcia et al ldquoGenome-wideanalysis of intraspecific dna polymorphism in lsquoMicro-Tomrsquo amodel cultivar of tomato (solanum lycopersicum)rdquo Plant andCell Physiology vol 55 no 2 pp 445ndash454 2014

[4] C B Yadav P Bhareti M Muthamilarasan et al ldquoGenome-wide SNP identification and characterization in two soybeancultivars with contrasting mungbean yellow mosaic india virusdisease resistance traitsrdquo PLoS ONE vol 10 no 4 Article IDe0123897 2015

[5] M Shimomura H Kanamori S Komatsu et al ldquoThe Glycinemax cv enrei genome for improvement of Japanese soybean

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

International Journal of Genomics 11

cultivarsrdquo International Journal of Genomics vol 2015 ArticleID 358127 8 pages 2015

[6] D R Bentley ldquoWhole-genome re-sequencingrdquoCurrent Opinionin Genetics and Development vol 16 no 6 pp 545ndash552 2006

[7] OMorozova andMAMarra ldquoApplications of next-generationsequencing technologies in functional genomicsrdquo Genomicsvol 92 no 5 pp 255ndash264 2008

[8] K LMcNally K L Childs R Bohnert et al ldquoGenomewide SNPvariation reveals relationships among landraces and modernvarieties of ricerdquoProceedings of theNational Academy of Sciencesof the United States of America vol 106 no 30 pp 12273ndash122782009

[9] T YamamotoHNagasaki J-I Yonemaru et al ldquoFine definitionof the pedigree haplotypes of closely related rice cultivars bymeans of genome-wide discovery of single-nucleotide polymor-phismsrdquo BMC Genomics vol 11 article 267 2010

[10] X Huang X Wei T Sang et al ldquoGenome-wide asociationstudies of 14 agronomic traits in rice landracesrdquoNatureGeneticsvol 42 no 11 pp 961ndash967 2010

[11] Y Liu X Qi N D Young K M Olsen A L Caicedo and YJia ldquoCharacterization of resistance genes to rice blast fungusMagnaporthe oryzae in a ldquoGreen Revolutionrdquo rice varietyrdquoMolecular Breeding vol 35 article no 52 2015

[12] Y Arai-Kichise Y Shiwa H Nagasaki et al ldquoDiscovery ofgenome-wide DNA polymorphisms in a landrace cultivar ofjaponica rice by whole-genome sequencingrdquo Plant and CellPhysiology vol 52 no 2 pp 274ndash282 2011

[13] J L Bennetzen ldquoComparative sequence analysis of plantnuclear genomes microcolinearity and its many exceptionsrdquoPlant Cell vol 12 no 7 pp 1021ndash1029 2000

[14] J Yu S Hu J Wang et al ldquoA draft sequence of the rice genome(Oryza sativa L ssp indica)rdquo Science vol 296 no 5565 pp 79ndash92 2002

[15] S A Goff D Ricke T H Lan et al ldquoA draft sequence of therice genome (Oryza sativa L ssp japonica)rdquo Science vol 296no 5565 pp 92ndash100 2005

[16] M Jain K C Moharana R Shankar R Kumari and RGarg ldquoGenomewide discovery of DNA polymorphisms in ricecultivars with contrasting drought and salinity stress responseand their functional relevancerdquoPlant Biotechnology Journal vol12 no 2 pp 253ndash264 2014

[17] P Rathinasabapathi N Purushothaman V L Ramprasad andM Parani ldquoWhole genome sequencing and analysis of Swarnaa widely cultivated indica rice variety with low glycemic indexrdquoScientific Reports vol 5 Article ID 11303 2015

[18] K Tsukada ldquoVietnam food security in a rice-exporting coun-tryrdquo in The World Food Crisis and the Strategies and Asian RiceExporter S Shigetomi K Kubo and K Tsukada Eds SpotSurvey 32 IDE-JETRO Chiba Japan 2011

[19] K H Trung and L H Ham ldquoSequencing the genomes of anumber of native Vietnamese rice linerdquo in Proceedings of the1st National Proceeding of Crop Science VAAS Hanoi VietnamSeptember 2013

[20] Sequencing Project IRG ldquoThe map-based sequence of the ricegenomerdquo Nature vol 436 pp 793ndash800 2005

[21] Y-J Shen H Jiang J-P Jin et al ldquoDevelopment of genome-wide DNA polymorphism database for map-based cloning ofrice genesrdquo Plant Physiology vol 135 no 3 pp 1198ndash1205 2004

[22] Y Arai-Kichise Y Shiwa K Ebana et al ldquoGenome-wide DNApolymorphisms in seven rice cultivars of temperate and tropicaljaponica groupsrdquo PLoS ONE vol 9 no 1 Article ID e863122014

[23] G K Subbaiyan D L EWaters S K Katiyar A R SadanandaS Vaddadi and R J Henry ldquoGenome-wide DNA polymor-phisms in elite indica rice inbreds discovered by whole-genomesequencingrdquo Plant Biotechnology Journal vol 10 no 6 pp 623ndash634 2012

[24] S G Krishnan D L EWaters and R J Henry ldquoAustralian wildrice reveals pre-domestication origin of polymorphism desertsin rice genomerdquoPLoSONE vol 9 no 6 Article ID e98843 2014

[25] Z Cheng F Dong T Langdon et al ldquoFunctional rice cen-tromeres are marked by a satellite repeat and a centromere-specific retrotransposonrdquoThe Plant Cell vol 14 no 8 pp 1691ndash1704 2002

[26] F A Feltus J Wan S R Schulze J C Estill N Jiang and AH Paterson ldquoAn SNP resource for rice genetics and breedingbased on subspecies Indica and Japonica genome alignmentsrdquoGenome Research vol 14 no 9 pp 1812ndash1819 2004

[27] D L Waters and R J Henry ldquoAustralian wild rice reveals pre-domestication origin of polymorphism deserts in rice genomerdquoPLoS ONE vol 9 no 6 Article ID e98843 2014

[28] J Wakeley ldquoThe excess of transitions among nucleotide substi-tutions new methods of estimating transition bias underscoreits significancerdquo Trends in Ecology amp Evolution vol 11 no 4 pp158ndash162 1996

[29] J Terol M A Naranjo P Ollitrault and M Talon ldquoDevelop-ment of genomic resources for Citrus clementina characteriza-tion of three deep-coverage BAC libraries and analysis of 46000BAC end sequencesrdquo BMC Genomics vol 9 article 423 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Whole-Genome Characteristics and Polymorphic Analysis of ...Thewholegenomesoffivericelandraceshaveyielded high-quality reads, from 112 to 161 million reads per line. Most of the reads

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology