

© 2010 by The Samuel Roberts Noble Foundation, Inc.

¹ The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK, 73401, USA
² National Center for Genome Resources, Santa Fe, NM, 87505, USA

Xuehui Li¹, Ananta Acharya¹, Andrew D. Farmer², John A. Crow², Arvind K. Bharti², Yanling Wei¹, Yuanhong Han¹, Jiqing Gou¹, Gregory D. May², Maria J. Monteros¹, E. Charles Brummer¹

Illumina sequencing of 27 cultivated and wild alfalfa transcriptomes: gene and single nucleotide polymorphism (SNP) discovery

Introduction

Alfalfa, a perennial, outcrossing species, is a widely planted forage legume. Improvement of cultivated alfalfa currently relies mainly on recurrent phenotypic selection. Marker-assisted breeding could accelerate alfalfa improvement, but it is constrained by the small number of available markers. Because of its low cost and its time and labor efficiency, next-generation sequencing enables high-throughput SNP discovery, even in species with large, complex genomes. Our objective in this experiment was to increase the number of SNPs available for alfalfa research and molecular breeding.

Materials and Methods

We sequenced 27 alfalfa genotypes (23 cultivated tetraploids and four wild diploids). Total RNA was isolated from young and old stems and pooled within each genotype for Illumina sequencing. Each transcriptome was sequenced on a single lane of the Illumina Genome Analyzer IIx, producing about 17-32 million 72-bp reads. Quality-filtered reads were assembled de novo into contigs. To assess the representation and quality of the assembly, BLASTx was performed against the annotated non-redundant GenBank protein database. SNPs were detected by realigning reads to the assembled contigs and applying three criteria: (1) average quality of the bases calling the SNP >20; (2) number of uniquely aligned reads calling the SNP >=20; and (3) p value of the contingency test <0.01.
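The three SNP-calling thresholds listed in Materials and Methods can be written as a simple filter. The sketch below is illustrative only: the `SnpCandidate` record, its field names, and the toy values are assumptions, and the p value is taken as already computed because the poster does not say which contingency test was used.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    """Hypothetical per-site summary; field names are illustrative."""
    contig: str
    position: int
    mean_base_quality: float  # average quality of the bases calling the SNP
    unique_read_count: int    # uniquely aligned reads calling the SNP
    p_value: float            # p value of the contingency test (precomputed)


def passes_filters(snp, min_quality=20.0, min_reads=20, max_p=0.01):
    """Apply the three thresholds stated in Materials and Methods."""
    return (
        snp.mean_base_quality > min_quality     # criterion 1: quality > 20
        and snp.unique_read_count >= min_reads  # criterion 2: reads >= 20
        and snp.p_value < max_p                 # criterion 3: p < 0.01
    )


# Toy candidates (invented values, not data from the study).
candidates = [
    SnpCandidate("contig_0001", 152, 31.4, 48, 0.0004),  # passes all three
    SnpCandidate("contig_0001", 987, 19.8, 60, 0.0010),  # fails quality
    SnpCandidate("contig_0002", 45, 35.0, 19, 0.0001),   # fails read count
    SnpCandidate("contig_0003", 310, 28.2, 20, 0.0100),  # fails p value (not < 0.01)
]
kept = [c for c in candidates if passes_filters(c)]
print([(c.contig, c.position) for c in kept])
```

Note that criteria 1 and 3 are strict inequalities while criterion 2 is inclusive, so a site supported by exactly 20 reads is retained but a site with a p value of exactly 0.01 is not.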

References

Li et al., 2011. Prevalence of segregation distortion in diploid alfalfa and its implications for genetics and breeding applications. Theor. Appl. Genet. 123:667-679.

Robins et al., 2007. Genetic mapping of biomass production in tetraploid alfalfa. Crop Sci. 47:1-10.

Acknowledgment

This project is funded by the USDA National Institute of Food and Agriculture.

Results and Conclusion

Sequencing of the 27 genotypes produced a total of 740 million reads (Table 1). Assembly of these reads generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, giving an average read depth of 56-fold for each genotype.

Overall, 21,954 (87.2%) of the 25,183 contigs matched 14,878 unique protein accessions.

Realignment of reads to the contigs enabled the detection of 873,384 putative SNPs and 25,183 InDels. In total, 7,812 (31%) of the 25,183 contigs aligned to M. truncatula pseudomolecules version 3.5.1; these contigs carried 298,771 SNPs and 9,205 InDels, which were widely distributed along the eight chromosomes (Figure 1).

High Resolution Melting (HRM) analysis of 192 putative SNPs validated about 85% of them and confirmed the allele dosage inferred from sequencing (Figure 2a and 2b).

Principal Components Analysis (PCA) with 173,947 SNPs indicated that subspecies falcata is clearly separated from diploid caerulea and tetraploid sativa (cultivated tetraploid alfalfa) (Figure 3).

Selected SNPs have been mapped to tetraploid and diploid alfalfa linkage maps previously constructed with RFLP and SSR markers (Li et al., 2011; Robins et al., 2007) (Figure 4).

An alfalfa Illumina Infinium array with ~10,000 SNPs is being developed. It will enable high-throughput genotyping and facilitate genome-wide association studies and genomic selection in alfalfa.

Our results demonstrate that next-generation transcriptome sequencing is an efficient way to discover high-quality SNPs in alfalfa. These ESTs and SNP markers could contribute to future alfalfa research and breeding applications.
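The assembly percentages and the average contig length reported in the results can be re-derived from the stated counts. The short check below uses only figures quoted in the text; the variable names are illustrative.

```python
# Counts and lengths as reported in the results.
n_contigs = 25_183
total_length_bp = 26.8e6          # 26.8 Mbp, rounded in the source
contigs_with_protein_hit = 21_954
contigs_aligned_to_mt = 7_812     # aligned to M. truncatula pseudomolecules

mean_contig_length = total_length_bp / n_contigs
pct_protein_hit = 100 * contigs_with_protein_hit / n_contigs
pct_aligned = 100 * contigs_aligned_to_mt / n_contigs

print(f"mean contig length: {mean_contig_length:.0f} bp")
print(f"contigs with a protein match: {pct_protein_hit:.1f}%")
print(f"contigs aligned to M. truncatula: {pct_aligned:.1f}%")
```

The computed mean comes out about 1 bp below the reported 1,065 bp, which is expected because the total length is given only to the nearest 0.1 Mbp.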

Figure 2. Examples of high resolution melting analysis of SNPs: (a) validation of three SNP phenotypes; (b) validation of potential allele dose in heterozygotes.

Figure 3. PCA of the 27 genotypes. Blue solid circles represent tetraploid sativa; red solid circles represent tetraploid falcata; blue triangles represent diploid caerulea; red triangles represent diploid falcata.

Figure 1. SNP distribution along the eight chromosomes of M. truncatula. The X-axis is the genome location on each chromosome. The number of SNPs per 1,000 bp was calculated for each 0.5 million base pair interval and plotted on the Y-axis.

Figure 4. Physical map of M. truncatula (Build 3.0) and genetic linkage maps for one diploid (CC78) and one tetraploid mapping population (ABE408×Wis6) based on RFLP, SSR, and SNP markers. The physical locations indicated on the maps are all on a scale of 5 × 10^5 base pairs. Markers in red on the linkage maps are SNPs.
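The windowed density described in the Figure 1 caption (SNPs per 1,000 bp within each 0.5 million base pair interval) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the 0-based positions, the toy data, and the handling of a shorter final window are all assumptions.

```python
from collections import Counter


def snp_density_per_kb(positions, chrom_length, window=500_000):
    """Return SNPs per 1,000 bp for each consecutive window on one chromosome.

    positions: 0-based SNP coordinates, each smaller than chrom_length.
    The last window may be shorter than `window`; its density is scaled
    by its actual length (an assumption, the caption does not say).
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = Counter(pos // window for pos in positions)
    densities = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, chrom_length)
        densities.append(counts.get(i, 0) / ((end - start) / 1000))
    return densities


# Toy positions on a hypothetical 1.25 Mbp chromosome (not study data).
toy_positions = [10, 250_000, 499_999, 500_000, 900_000, 1_200_000]
print(snp_density_per_kb(toy_positions, chrom_length=1_250_000))
```

Each returned value corresponds to one plotted point per interval, so a chromosome of about 35 Mbp would yield roughly 70 values.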

[Figure 3 plot: PC1 on the x-axis and PC2 on the y-axis; labeled entries: PI251830-K, Gabes, PI243225-A, PI631816-A, PI577551-D, Magali, Wisfal6.]

[Figure 1 plots: one panel per chromosome, Chr1 to Chr8; x-axis: genome location; y-axis: SNPs per 1,000 bp, scaled from 0.0 to 3.0.]

Table 1. The 27 genotypes used in this study and their sequencing statistics.

Entry           Ploidy  Subspecies       Total reads (millions)  Quality-filtered reads (millions)
PI251830-K      2       subsp. falcata   29.6                    24.7
PI631816-A      2       subsp. falcata   25.1                    21.6
PI577551-D      2       subsp. caerulea  19.5                    16.4
PI243225-A      2       subsp. caerulea  26.2                    23.3
Magali-A        4       subsp. sativa    18.8                    16.9
Gabes           4       subsp. sativa    17.2                    15.1
Dairyland263    4       subsp. sativa    24.7                    20.8
Dairyland879W4  4       subsp. sativa    27.4                    22.9
Dairyland833    4       subsp. sativa    30.1                    25.4
Dairyland317    4       subsp. sativa    28.1                    23.9
NL002724        4       subsp. sativa    23.1                    20.8
LH050543        4       subsp. sativa    24.9                    22.1
DW000577        4       subsp. sativa    24.5                    21.2
CV020017        4       subsp. sativa    28.1                    25
CWI-4           4       subsp. sativa    24.8                    20.5
CWD-10          4       subsp. sativa    27.5                    23.8
CWB-7           4       subsp. sativa    31.6                    27
CWA-9           4       subsp. sativa    28.2                    24.9
B86-220         4       subsp. sativa    29.9                    25.8
B85-920         4       subsp. sativa    29.5                    25.3
B85-912         4       subsp. sativa    23.9                    20.7
B75GH-402       4       subsp. sativa    26.5                    21.2
Wisfal6         4       subsp. falcata   26.8                    23.3
ABI408          4       subsp. sativa    27.3                    23.4
NECS141         4       subsp. sativa    25.8                    22.2
Altet4          4       subsp. sativa    26.8                    23
95-608          4       subsp. sativa    28.1                    23.6
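As a quick consistency check, the entries of Table 1 can be compared with the sample composition and read range given in Materials and Methods (23 tetraploids, four diploids, about 17-32 million reads per lane). The snippet below transcribes only the entry, ploidy, and total-reads columns; the variable names are illustrative.

```python
# (entry, ploidy, total reads in millions), transcribed from Table 1.
table1 = [
    ("PI251830-K", 2, 29.6), ("PI631816-A", 2, 25.1), ("PI577551-D", 2, 19.5),
    ("PI243225-A", 2, 26.2), ("Magali-A", 4, 18.8), ("Gabes", 4, 17.2),
    ("Dairyland263", 4, 24.7), ("Dairyland879W4", 4, 27.4),
    ("Dairyland833", 4, 30.1), ("Dairyland317", 4, 28.1),
    ("NL002724", 4, 23.1), ("LH050543", 4, 24.9), ("DW000577", 4, 24.5),
    ("CV020017", 4, 28.1), ("CWI-4", 4, 24.8), ("CWD-10", 4, 27.5),
    ("CWB-7", 4, 31.6), ("CWA-9", 4, 28.2), ("B86-220", 4, 29.9),
    ("B85-920", 4, 29.5), ("B85-912", 4, 23.9), ("B75GH-402", 4, 26.5),
    ("Wisfal6", 4, 26.8), ("ABI408", 4, 27.3), ("NECS141", 4, 25.8),
    ("Altet4", 4, 26.8), ("95-608", 4, 28.1),
]

n_diploid = sum(1 for _, ploidy, _ in table1 if ploidy == 2)
n_tetraploid = sum(1 for _, ploidy, _ in table1 if ploidy == 4)
fewest = min(table1, key=lambda row: row[2])  # entry with the fewest reads
most = max(table1, key=lambda row: row[2])    # entry with the most reads

print(len(table1), n_diploid, n_tetraploid)
print(fewest[0], fewest[2], most[0], most[2])
```

The counts match the 27 genotypes described in the methods, and the per-entry totals span 17.2 to 31.6 million reads.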
