Next Generation Cassava Breeding

Proposal July 6, 2012

Submitted to the Bill & Melinda Gates Foundation

By the Office of Sponsored Programs on behalf of the Department of International Programs

College of Agriculture and Life Sciences Cornell University

Contact Dr. Ronnie Coffman

International Programs, College of Agriculture and Life Sciences 252 Emerson Hall, Cornell University

Ithaca, NY 14853-7801 (USA) Telephone: (607) 255-3035 Fax: (607) 255-6683

E-mail: [email protected]

Financial Information Redacted

Grant Proposal

Date: July 6, 2012
Project Title: Next Generation Cassava Breeding
Organization Name: Cornell University

Institutional official authorized to submit and accept grants on behalf of the organization:
Prefix:
First Name: Tom
Surname: Frank
Suffix:
Title: Grant and Contract Officer II
Address: 373 Pine Tree Road, Ithaca, NY 14850
Telephone: 607-255-2943
Fax: 607-255-5058
Email: [email protected]
Web Site:

Project Director/Primary Contact:
Prefix: Dr
First Name: Ronnie
Surname: Coffman
Suffix:
Title: Director, International Programs, College of Agriculture and Life Sciences, Cornell University
Address: 252 Emerson Hall, Ithaca NY 14853
Telephone: 607-255-3035
Fax: 607-255-6683
Email: [email protected]
Web Site:

U.S. Tax Status (Refer to Tax Status Definitions): Not for profit – tax exempt
Geographic Location(s) of Project: Sub-Saharan Africa: especially Nigeria, Uganda
Amount Requested from Foundation in Dollars (U.S.):
Project Duration (months): 60
Organization’s Fiscal Year-End Date: June 30
Estimated Total Cost of Project in Dollars (U.S.):
Organization’s Total Revenue for Most Recent Audited Financial Year in Dollars (U.S.):


Table of Contents

I. Charitable Purpose
II. Executive Summary
III. Project Description
    Objective 1: Fast Cassava: Improved Reproduction
    Objective 2: Tools for Genomic Selection Implemented
    Objective 3: Database Developed and Curated
    Objective 4: Germplasm Developed
    Objective 5: Infrastructure Developed and Plant Breeders Trained
    Objective 6: Biotechnology/biosafety education and awareness
    Objective 7: Enhancement of Genomic Selection through Cassava Genomics
    Objective 8: Project Management and Communication
IV. Alignment with Foundation Strategy
V. Sustainability and Scalability
VI. Implementation, Intended Results, and Results Measurement
    A. Results Framework (see Appendix A)
    B. Project Plan
    C. Analysis
    D. Assumptions and Risks
    E. Measurement
References
VII. Institutional Capacity
Appendix A: Results Framework and Milestones
Appendix B: Preliminary Results
Appendix C: Supporting Documentation for Specific Objectives
Appendix D: PhD Student Project Proposals
Appendix E: Description of MSc Training Plan

Cornell University: Next Generation Cassava Breeding

Project Title: Next Generation Cassava Breeding Organization: Cornell University with sub-grantees in CGIAR, NARS in Nigeria and Uganda

I. Charitable Purpose: To dramatically increase the rate of genetic improvement of cassava in sub-Saharan Africa.

II. Executive Summary: Genetic improvement of cassava has been slow because of a lack of interest from the private sector, and because the biology of cassava makes traditional breeding approaches very inefficient. The primary goal of this project is to put in place methods and infrastructure that will dramatically increase the rate of genetic improvement of cassava varieties in sub-Saharan Africa, starting with programs in Nigeria and Uganda. This will be accomplished by two approaches in parallel: 1) hastening the time to flower and improving the seed set of cassava so that crosses can be made more easily and will produce more progeny and 2) applying statistical methods that can evaluate seedlings based on genotype alone, bypassing the need for expensive and time-consuming yield trials. The second objective is enabled by next-generation sequencing technology that can generate high numbers of molecular markers at low cost and by implementing a state-of-the-art information technology infrastructure tailored to use of these data in breeding. In addition, this project will build capacity in the Ugandan national program and will develop a plan for incorporation of exotic germplasm into breeding programs so that African cassava can benefit from Latin American diversity. With the development of expertise in Nigeria and Uganda, we anticipate the replication of this approach in other cassava breeding programs in Africa.

III. Project Description:

Obstacles to Cassava Breeding

There are two major obstacles to cassava breeding. One is the frequent difficulty in making a cross due to a long and variable juvenile phase before flower initiation, sparse flower number, and poor seed set in the parents. Cassava shows wide variation in flowering time, rate, and fertility. Even the plants with highest fertility produce a maximum of three seeds per flower. Unreliable flowering and small seed number limit the production of segregating progenies between the best parents (Ceballos, 2004). Poor flowering has been selected during domestication because cassava is clonally propagated, the edible portion is vegetative, and flowering is associated with branching, which is considered undesirable. An ideal solution to this problem would be a method to induce flowering in parent plants without transmitting the flowering trait to progeny. Thus, released varieties would retain the desirable trait of limited flowering. The identification of such a solution is the goal of Objective 1.

The second major obstacle is the length of the breeding cycle, a consequence of the low number of vegetative propagules (“stakes”) that can be obtained from one plant. Cassava breeding relies on phenotypic characterization of mature plants that have been clonally propagated, since agronomic phenotypes cannot be measured in a meaningful way in individual plants grown from seed. Because the number of stakes that can be obtained from a single plant is fairly small (5-10), several cycles of propagation are needed to obtain sufficient material for large-scale (replicated and multi-location) evaluation (Table 1). Consequently, it typically takes five to six years from seedling germination to multi-location yield trials where meaningful measurements can be made for traits such as yield, end user quality, and local adaptation. Because it is not feasible to advance all seedlings through the full evaluation process, progenies are crudely selected at the seedling stage. For many traits, including yield, seedling assessments are poor predictors of adult performance. An ability to predict adult performance at the seedling stage would:

• dramatically reduce the length of the breeding cycle
• increase the number of crosses and selection per unit time
• increase the number of seedlings that could be accurately evaluated
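The propagation bottleneck described above can be put in rough numbers. The following is a minimal sketch, assuming that every plant yields the same number of stakes in each propagation cycle and that no material is lost; the function name and the trial size used in the usage note are illustrative assumptions, not figures from the breeding programs.

```python
def propagation_cycles(plants_needed: int, stakes_per_plant: int) -> int:
    """Count the clonal propagation cycles needed to grow one seedling
    into at least `plants_needed` plants.

    Idealised assumption: every plant yields exactly `stakes_per_plant`
    stakes per cycle and every stake survives.
    """
    if stakes_per_plant < 2:
        raise ValueError("stakes_per_plant must be at least 2")
    plants = 1
    cycles = 0
    while plants < plants_needed:
        plants *= stakes_per_plant  # each plant is cut into stakes and replanted
        cycles += 1
    return cycles


# Hypothetical trial: 4 reps x 10 locations x 40 plants per plot = 1600 plants.
for stakes in (5, 10):
    print(stakes, "stakes per plant:", propagation_cycles(1600, stakes), "cycles")
```

Under these assumptions, a trial of 1600 plants needs five cycles at 5 stakes per plant and four cycles at 10 stakes per plant, which is in line with the five to six years quoted in the text.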


Implementation of such a system for predicting adult performance for seedlings, through genomic selection, is the goal of Objective 2. Timelines for one-year and two-year breeding cycles are shown in Appendix C.

Table 1. Phenotypic Selection vs Genomic Selection.

| Year | Phenotypic Selection* (5-year cycle, 8 years to clone release) | Genomic Selection (2-year cycle, 6 years to clone release) | Cycle Start Dates |
|---|---|---|---|
| 1 | 1000 crosses among top 50 clones | 1000 crosses among top 50 clones | Start GS and PS Cycle 1 |
| 2 | 100,000 seedlings; screen for disease | 100,000 seedlings; screen for disease. Genotype 10,000 seedlings. Select clones for inclusion in crossing | |
| 3 | 3,000 clones; single 5m rows | Training population update: 3,000 clones; single 5m rows. Variety development: 100 clones; single 10m rows, 2 reps | Start GS Cycle 2 |
| 4 | 100 clones; single 10m rows, 2 reps | 15 clones; 4 x 10m plots, 4 reps, 10 locs | |
| 5 | 25 clones; 4 x 10m rows, 4 reps, 4 locs. Select clones for inclusion in crossing | 5 clones; large plots in on-farm testing | Start GS Cycle 3 |
| 6 | 15 clones; 4 x 10m rows, 4 reps, 10 locs | Multiplication and release as variety | Start PS Cycle 2 |
| 7 | 5 clones; large plots in on-farm testing | | Start GS Cycle 4 |
| 8 | Multiplication and release as variety | | |

* Taken from IITA cassava breeding scheme (IITA, 1990).

In addition to unreliable flowering and a very long breeding cycle, there is another perceived obstacle to cassava breeding: the very high levels of heterozygosity in cassava clones. Historically, molecular plant breeding has relied on the availability of inbred lines, which can be developed even in an outcrossing crop that suffers inbreeding depression. These resources have not been developed for cassava, which relies instead on pseudo-F2 populations for mapping QTL. Thus, the standard approaches of backcrossing, fine-mapping, introgression, etc, are not easily applied. Some cassava scientists believe that it is important to develop inbred lines and to follow the maize paradigm, which has been extremely successful, even though it would require the investment of large amounts of time and money. In contrast, implementation of genomic selection does not require inbred lines, and high levels of heterozygosity are not an obstacle: the prediction models of genomic selection capture additive effects, and those effects can be captured as readily in the heterozygous state as in the homozygous state.

In considering the contribution of genomic selection to cassava breeding, it is important to distinguish between selection of parents and selection of varieties. In the selection of parents, the objective is to identify clones that will transmit superior alleles to their progeny, while in the selection of varieties, the objective is to identify clones that themselves perform better. Only additive genetic effects contribute to the former (called the breeding value), while dominance, epistatic, and higher-order interaction effects also contribute to the latter (called the genotypic value). Genomic prediction models exist for both cases: linear additive models versus models that incorporate interaction rules or epistatic relationship matrices. Preliminary evidence (see Objective 2 below) suggests that models to predict genotypic values have higher correlation to the phenotype than do models to predict breeding value. Therefore, we will use models targeted to each purpose.


Overcoming the Obstacles

Objective 1. Fast Cassava: Improved Reproduction

Intense scientific effort in recent years, largely conducted in the model organism Arabidopsis, has revealed that initiation of floral and inflorescence meristems is controlled by a network of signaling molecules that integrate factors including photoperiod and temperature. It is now established that Flowering Locus T (FT) encodes the florigen factor, which moves through phloem from leaves to shoots and stimulates floral meristem initiation (Giakountis and Coupland, 2008; Zeevaart, 2008). A fundamental understanding of hormonal, environmental and genetic factors that regulate flowering has paved the way for the development of improved protocols for induction of flowering in species with delayed and variable flowering. For example, perennial woody species, which normally spend several years of juvenile growth before flowering, have been induced to flower in the seedling stage when given artificial induction with FT molecules (Bohlenius et al. 2006; Kotoda et al. 2010; Zhang et al. 2010).

A number of different strategies to induce flowering in cassava will be tested, but all are based on the hypothesis that, by provision of an appropriate hormonal signal, or ectopic exposure to FT protein, we can reliably induce flowering in clones that would otherwise remain vegetative or flower poorly. Given that the need for flower induction is only for a subset of plants that are to be crossed, it is feasible to employ intensive, highly managed protocols. The strategies include:

a. Hormone sprays and/or daylength treatments
b. Grafting of a branch from a clone that flowers vigorously and produces an abundance of the phloem-mobile floral stimulus.

The general approach will be to develop protocols for cassava floral development that can be used on a moderate number of plants in a crossing-block nursery. The goal will be to rapidly identify methods by which cassava floral initiation and high seed set can be achieved, starting with several strategies and converging on those that provide the best, most reliable outcomes. There is evidence from other systems that such an approach is likely to be successful. Gibberellin and cytokinin hormones and photoperiodic factors are now known to act up-stream in the floral stimulus signaling network. Previous work in cassava provided evidence that exogenous gibberellin can have a floral stimulatory effect (Tang et al. 1983), though, as in other plant species, work is needed to optimize the timing, dosage and means of delivery of such treatments. Transgenic studies have shown that cassava genes for major floral signaling components are functional orthologs of the model plant Arabidopsis, and therefore we can use published studies in other species as a guide (Adeyemo, 2009; Adeyemo et al. 2011).

Cassava can be readily grafted, and stock plants that flower profusely have been found in breeding programs. The proposed work follows from a large body of classical research which demonstrated that florigen (now known to be FT protein) is graft transmissible and capable of eliciting flowering in a recipient shoot that has not itself been induced to flower. Grafting has been successfully used to induce flowering in mutants and recalcitrant species. It is expected that the best host cassava plants for floral induction will produce high amounts of FT protein, which will be transmitted to the recipient grafted branches via phloem transport. This assumption will be monitored by production of antibodies to cassava FT protein for use in immunoassays of recipient branch tissues.

Also important to the success of these strategies are protocols to ensure high seed set. Such protocols will be developed based on principles of source/sink balance of photosynthate and hormonal regulation (De Jong et al. 2009; Setter and Parra, 2010).

Objective 2. Tools for Genomic Selection Implemented

Because of cassava’s long breeding cycle, it has been appreciated for some time that the use of molecular markers could allow selection during the seedling stage, saving cassava breeders time and money. The Generation Challenge Program (GCP) and the BMGF have both, in recent years, funded projects for development of genomic resources for cassava, largely in the context of QTL and gene discovery. For example, the GCP has Subprograms called “Genomics towards gene discovery” and “Trait capture for crop improvement”, which are aimed at dissecting genetic mechanisms and developing trait-specific markers that can be deployed in molecular breeding.

Such approaches have two drawbacks. First, the discovery and development of trait-based markers is time-consuming and expensive. Second, while use of such markers can be effective when a small number of loci underlie most of the variation in a trait, it is impractical for many traits, such as yield, that are quantitative. Genomic selection (GS) is a methodology that uses molecular markers in a different way: rather than trying to identify particular markers that are significantly associated with a trait of interest, it simply uses all the information in a large set of genome-wide markers (Heffner 2009). Based on both genotypic and phenotypic information from a “training population,” GS develops a statistical model that can predict, based on genotypic data alone, breeding values in an experimental population. In a breeding program, the experimental population consists of new breeding lines in the pipeline for clonal propagation and field testing. A prediction model can be developed for any trait of agricultural importance. In addition, GS can be used to improve traits that are critical in the breeding process itself; as discussed in Objective 1, flowering time, flowering rate, and seed production are all aspects of cassava biology that constrain breeding success.
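The train-then-predict workflow described above can be sketched in a few lines. This is an illustration only: it uses ridge regression on marker dosages (one common class of GS model) because it is compact, not because the proposal prescribes it. The function names, the penalty `lam`, and the 0/1/2 dosage coding are assumptions made for the example.

```python
import numpy as np


def fit_marker_effects(X_train, y_train, lam=1.0):
    """Fit ridge-regression marker effects from a training population.

    X_train: clones x markers matrix of allele dosages (assumed 0/1/2).
    y_train: one phenotype value per clone.
    lam:     ridge penalty (hypothetical tuning parameter).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    marker_means = X_train.mean(axis=0)
    pheno_mean = y_train.mean()
    Xc = X_train - marker_means
    n_clones = Xc.shape[0]
    # Dual form of ridge regression: solve a clones x clones system,
    # which stays small even when markers far outnumber clones.
    weights = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n_clones),
                              y_train - pheno_mean)
    effects = Xc.T @ weights
    return marker_means, pheno_mean, effects


def predict_from_genotype(X_new, model):
    """Predict values for genotyped individuals that have no phenotype."""
    marker_means, pheno_mean, effects = model
    return pheno_mean + (np.asarray(X_new, dtype=float) - marker_means) @ effects


def select_top(predictions, n_selected):
    """Indices of the n_selected individuals with the highest predictions."""
    return np.argsort(predictions)[::-1][:n_selected]
```

In use, `fit_marker_effects` would be run once on the training population, and `predict_from_genotype` followed by `select_top` would be applied to each new batch of genotyped seedlings.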

Compared to the current system of clonal propagation prior to evaluation, GS would allow accurate evaluation at the seedling stage and evaluation of many seedlings, limited only by the program’s capacity for DNA extractions and funds available for genotyping. Lines selected as seedlings, based on their genotypes, could be grown to flowering and crossed in year two, providing new segregating progenies for another round of selection. This ability to efficiently evaluate a seedling during year one, rather than waiting until year six, is the primary target of this proposal. Recurrent selection, i.e., multiple cycles of crossing and selection prior to variety release, would become feasible, resulting in a greatly increased rate of genetic gain.

In addition to accelerating the breeding cycle, the accurate evaluation of a much larger number of seedlings would allow for more ambitious breeding goals, such as targeting of diverse agroecological zones. Phenotypic evaluation in multiple environments would also be more efficient, because not every clone would need to be tested in every environment. That is because the GS model estimates the effects of alleles, not clones; as long as an allele is well represented across environments, its effect can be estimated accurately. Another consequence of a dramatic acceleration of the breeding cycle would be lower cost of producing superior new varieties, making them more likely to be adopted by farmers. The more frequent release of improved varieties is a necessary component for the establishment of a sustained commercial “seed” system. Since cassava is clonally propagated, farmer motivation to purchase new seed will depend on availability of new and better products.

Implementation of GS has several components:

a. Development of a database (described in Objective 3).

b. Statistical tools. The development of new GS prediction models is not an activity of this project; a variety of methods that perform well are currently available. Because the optimal method for a given trait depends on its (unknown) genetic architecture, each of the major classes of GS methods will be made available in the database tools (see Objective 3). Testing the performance of different methods for different traits will be an important activity carried out by PhD students as part of their training (see Appendix D).

c. Implementation of GS in African breeding programs. For each breeding program that undertakes the implementation of GS, a number of activities are required:

i. Establishment of a training population (genotype and phenotype information).
ii. Ongoing training in database use: curating, uploading, accessing data and use of statistical tools.
iii. Development of prediction models for traits of interest.
iv. Design and execution of crosses.
v. DNA extraction from progeny and out-sourced sequencing.
vi. Processing for quality control, SNP extraction, and upload to the database.
vii. Application of prediction models for traits of interest, selection of next generation of parents.
viii. Ongoing phenotypic evaluation of germplasm to update and evaluate the prediction model.

As of the writing of this proposal, some of these activities are already well under way. IITA already has a training population, and both NRCRI and NaCRRI have begun to assemble the necessary materials (see Appendices B and C).

Methods for some of these activities are still being optimized, for example: What is the most cost-effective method for DNA extraction? What is the best method for construction of reduced-representation libraries for genotyping by sequencing? What is the best algorithm for calling genotypes that maximizes accuracy and minimizes missing data? We have all three of these processes working (see Appendix B), but are actively exploring approaches that will produce better quality data. Details of the genotyping pipeline (logistics and costs) are presented in Appendix C.

d. Ongoing training in GS methodology and improved plant breeding practices. Training in the principles and practice of GS is a key activity of this project. As programs advance, breeders will need to make decisions about updating their training populations and how best to allocate phenotyping resources. Our goal is to develop expertise in African scientists so that they can, in turn, train other scientists in their own programs and help to initiate GS in other programs. This will be accomplished through formal workshops as well as informal consultation available throughout the duration of the project. The training of PhD and MSc students is an activity of Objectives 2, 4, and 5. In addition, there will be a need for increased emphasis on standardized phenotyping and reliable sample-tracking, both of which will become more critical in the context of GS. The tools available through GCP’s Integrated Breeding Platform (IBP) will be complementary to this project. Our budget includes funds for electronic field books developed for use with the IBP.

A strategy for implementation of GS has been designed for each breeding program, including the number of crosses to make, the number of seedlings to genotype, the strength of selection to apply (see Appendix C). We anticipate achieving two to four rounds of GS by the end of the five-year project, the larger number at IITA, which began its program under the pilot project (see Appendix B). These strategies are a starting point, and may evolve over the course of the project, through consultation between the Cornell GS and database teams and the breeders. As the outputs from Objective 1 become available, we will incorporate these improvements into the breeding cycle. This may lead to a strategy that will involve continuous production of smaller numbers of seedlings, to distribute the genotyping workload more evenly throughout each year and to make better use of limited resources.

Preliminary Results for Objective 2

In April 2011, we (JLJ and MTH) received funding from the BMGF to carry out a pilot project of genomic selection in cassava. The primary objective of this pilot study was to develop a set of molecular markers for cassava and to estimate expected genomic prediction accuracies in African breeding germplasm. This work is being done in collaboration with Peter Kulakow, Melaku Gedil, Moshood Bakere, and Ismail Rabbi at IITA, Nigeria. A detailed report of progress toward the milestones associated with this objective is given in Appendix B. Here, we summarize that progress:

Milestones 1.1.1 to 1.1.3: Identification of an appropriate collection of germplasm.


IITA has a large Genetic Gain (GG) population for which there are many years of historical phenotype data. Our validation population consists of those members of the Genetic Gain population for which plant material still exists (i.e., DNA can be extracted). In all, we have extracted DNA and genotyped 623 of the GG clones. Historical phenotypic data from 37,752 plots from GG trials and 11,266 plots from Uniform Yield Trials (UYT) were analyzed. As of the writing of this report, we have both genotype data and good phenotypic records for 512 clones. Work to increase the number of clones with curated phenotypic data is ongoing.

Milestones 1.2.1 and 1.2.2: Selection of a genetic marker system.

The number of SNP genotype assays that are available for cassava is quite small, and the assays are likely to suffer from ascertainment bias. Furthermore, work was already planned for a large genotyping-by-sequencing (GBS) effort in cassava, as part of the Cassava Genomics project funded by the BMGF. GBS uses a bioinformatic pipeline to call SNPs from next-generation sequencing of reduced-representation, bar-coded libraries. Because of the efficiency, high marker number, low cost, and lack of bias in GBS data, we chose GBS as our marker system. We are using a protocol developed in the Buckler lab at Cornell University (Elshire et al. 2011).

Milestone 1.3.1: Extraction of DNAs and scoring of markers.

DNAs were extracted at IITA and sent to Cornell University, where PstI libraries were constructed and sequenced using the Illumina HiSeq. PstI was chosen because the low fragment number results in the higher read depth necessary to score heterozygous genotypes accurately. A bioinformatics pipeline designed for maize was modified for cassava and used to extract ~5000 high quality SNPs.
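The link between read depth and heterozygote calling can be illustrated with a simple binomial argument. The sketch below assumes that both alleles of a heterozygote are sampled with equal probability and that there are no sequencing errors. Under those assumptions, a heterozygote looks homozygous whenever all reads at a site happen to carry the same allele. The function names are illustrative.

```python
def prob_het_seen_as_homozygote(depth: int) -> float:
    """Probability that all `depth` reads at a heterozygous site carry the
    same allele (equal allele sampling, no sequencing error assumed)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # All reads from allele A, or all reads from allele B.
    return 2 * 0.5 ** depth


def min_depth_for_error(max_error: float) -> int:
    """Smallest read depth at which the miscall probability is below max_error."""
    depth = 1
    while prob_het_seen_as_homozygote(depth) >= max_error:
        depth += 1
    return depth


for d in (1, 2, 5, 8):
    print(d, prob_het_seen_as_homozygote(d))
```

With a single read a heterozygote can never be recognised, and under this simplified model about eight reads are needed to push the miscall probability below 1%.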

Milestones 1.4.1 and 1.4.2: Data analysis.

Population genetic analyses so far consist of estimates of the expected level of linkage disequilibrium (LD) across the genome. These analyses suggest that, for this training population, useful LD is likely to decay between 10 and 50 kb. Given the size of the cassava genome (760 Mb), increasing the number of markers to ~20,000 should improve GS prediction accuracy.
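The reasoning behind the marker-number target can be checked with simple arithmetic. The sketch below assumes markers are spread evenly across the genome, which is an idealisation because GBS markers cluster near restriction sites. The genome size and LD range come from the text; the function names are illustrative.

```python
import math

GENOME_SIZE_KB = 760_000  # 760 Mb, as stated in the text


def mean_marker_spacing_kb(n_markers: int) -> float:
    """Average distance between adjacent markers, assuming even spacing."""
    return GENOME_SIZE_KB / n_markers


def markers_for_spacing(target_spacing_kb: float) -> int:
    """Markers needed so that the average spacing does not exceed the target."""
    return math.ceil(GENOME_SIZE_KB / target_spacing_kb)


print(mean_marker_spacing_kb(5_000))   # spacing with ~5000 SNPs
print(mean_marker_spacing_kb(20_000))  # spacing with ~20,000 SNPs
print(markers_for_spacing(50))         # markers for one per 50 kb
```

With ~5000 SNPs the average spacing is 152 kb, well beyond the 10 to 50 kb over which LD is expected to decay. At ~20,000 markers it falls to 38 kb, inside that range. Matching the 10 kb end of the range would take on the order of 76,000 evenly spaced markers.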

Genomic prediction accuracies have been estimated for 17 traits, which range in plot-basis broad-sense heritability from 0.04 to 0.64. Three different genomic selection models were tested that capture: 1) additive genetic effects evenly spread across the genome; 2) additive and dominance genetic effects evenly spread across the genome; 3) large effects and gene interactions.

The accuracies of these models were assessed by ten-fold cross validation as follows. The dataset of 512 lines was randomly split into ten subsets or "folds." Each fold in turn was predicted by the genomic selection model while excluding the fold from the training population. Accuracy was calculated as the correlation between the observed phenotype and the prediction. These preliminary analyses indicate that prediction accuracies for cassava are similar to those for other species given the training population size available. Thus, the unique aspects of cassava (the fact that it is a root crop, is clonally propagated, and has high levels of heterozygosity) are not negatively affecting prediction accuracies in ways we do not understand.
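The cross-validation procedure described above can be sketched as follows. This is a minimal illustration and not the code used for the reported analyses: it uses a plain ridge regression as a stand-in for the genomic selection model, and it pools the held-out predictions from all folds before computing one correlation, which is one reasonable reading of the description. The function names, `lam`, and `seed` are assumptions.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on the training clones and predict the test clones."""
    marker_means = X_train.mean(axis=0)
    pheno_mean = y_train.mean()
    Xc = X_train - marker_means
    n_markers = Xc.shape[1]
    effects = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers),
                              Xc.T @ (y_train - pheno_mean))
    return pheno_mean + (X_test - marker_means) @ effects


def cross_validated_accuracy(X, y, n_folds=10, lam=1.0, seed=0):
    """Correlation between observed phenotypes and held-out predictions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Randomly assign clones to folds of (nearly) equal size.
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predicted = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # exclude the fold from training
        predicted[test_idx] = ridge_predict(X[train_mask], y[train_mask],
                                            X[test_idx], lam)
    return np.corrcoef(y, predicted)[0, 1]
```

Because every clone is predicted exactly once by a model that never saw its phenotype, the resulting correlation estimates how well the model would rank unphenotyped seedlings.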

Given that we anticipate that genomic selection can accelerate the breeding cycle of cassava 2.5-fold to 5-fold (for two-year and one-year genomic selection breeding cycles, respectively), even the accuracies obtained thus far indicate that rates of improvement under genomic selection will be more rapid than under phenotypic selection. Furthermore, we believe that our current accuracies are a lower bound on what is achievable.
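The trade-off between accuracy and cycle length can be made explicit with the standard expression for response to selection per unit time, gain per year = (selection intensity × accuracy × additive genetic standard deviation) / cycle length. The sketch below assumes equal selection intensity and genetic variance under both schemes. The accuracy values in the example are hypothetical and are not results from the pilot study.

```python
def gain_per_year(accuracy, cycle_years, intensity=1.0, sigma_a=1.0):
    """Expected genetic gain per year from the breeder's equation."""
    return intensity * accuracy * sigma_a / cycle_years


def relative_gain(acc_gs, acc_ps, cycle_gs, cycle_ps=5.0):
    """Gain per year under genomic selection relative to phenotypic selection,
    assuming equal selection intensity and additive genetic variance."""
    return gain_per_year(acc_gs, cycle_gs) / gain_per_year(acc_ps, cycle_ps)


def break_even_accuracy_ratio(cycle_gs, cycle_ps=5.0):
    """Ratio acc_gs / acc_ps above which genomic selection is faster."""
    return cycle_gs / cycle_ps


# Hypothetical accuracies: GS half as accurate as PS, two-year GS cycle.
print(relative_gain(acc_gs=0.3, acc_ps=0.6, cycle_gs=2.0))
```

With a two-year cycle, genomic selection outpaces a five-year phenotypic cycle as soon as its accuracy exceeds 40% of the phenotypic accuracy; with a one-year cycle the threshold falls to 20%.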

Objective 3. Database Developed and Curated

Implementation of genomic selection requires the acquisition and management of large amounts of genotypic and phenotypic data, and the availability of statistical tools to make use of the data. If cassava breeders in Africa are to have the resources to make breeding decisions based on genotype data, they need access to those data in a user-friendly, reliable database. Genotype data will be applicable in all locations where a cassava clone is tested, so data sharing through a secure and well curated database is necessary for international sharing of germplasm and application of genomic selection. The proposed database infrastructure will allow breeders to manage the Genomic Selection (GS) process through the website by uploading data for their breeding populations, including phenotypic and genotypic datasets. Compatibility with the Integrated Breeding Field Book (IBFB) developed by GCP will allow use of the IBFB for design of evaluation trials and field plans, and to capture and verify evaluation data for upload into the cassava GS database. Linking phenotypic data collection with the Integrated Breeding Platform (IBP) should provide quality control of the data that are uploaded. Quality control of phenotypic data will be performed using simple statistical analyses for the detection of outliers. Because the cassava breeding program at NRCRI is one of the “user cases” for which the IBP is currently being implemented, we will have the opportunity to test this compatibility at the early stages of database deployment.
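
One simple statistical check of the kind mentioned above is sketched below in Python. The proposal does not specify which outlier test will be used; this example assumes the common interquartile-range rule (Tukey's fences), and the function name and sample values are hypothetical.

```python
import numpy as np


def flag_outliers(values, k=1.5):
    """Return a boolean array marking values outside Tukey's fences.

    A value is flagged if it lies more than k * IQR below the first
    quartile or above the third quartile. Missing values (NaN) are
    ignored when computing the quartiles and are never flagged.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical plot yields; the last value looks like a data-entry error.
plot_yields = [20, 22, 21, 23, 19, 24, 220]
print(flag_outliers(plot_yields))
```

Flagged records would be returned to the breeder for verification rather than deleted automatically, since an extreme value can also be a genuine observation.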

GS algorithms will be directly integrated into the website, allowing definition of training populations by selecting from appropriate plant accessions in the database, and calculation of predictive models on the fly. Breeding values will be calculated by applying the GS models to any relevant accession in the database, and the calculated breeding values will be stored in the database for each accession. Genotypic information will be loaded through a pipeline based on next-generation sequencing (NGS) of cassava germplasm that is currently in development (see Appendix B).
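
As an illustration of how stored model results could be applied to accessions, the Python sketch below computes a breeding value for each accession from its marker scores. It assumes a purely additive linear model (an intercept plus a sum of marker effects); the function name, accession identifiers, and numbers are hypothetical and do not describe the database's actual interface.

```python
import numpy as np


def predict_breeding_values(accession_ids, genotypes, intercept, marker_effects):
    """Apply an additive marker-effect model to a set of accessions.

    accession_ids:  list of accession identifiers
    genotypes:      (n_accessions, n_markers) marker scores, e.g. 0/1/2
    intercept:      model intercept (overall mean)
    marker_effects: (n_markers,) estimated effect of each marker
    Returns a dict mapping accession identifier to predicted breeding value.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effects = np.asarray(marker_effects, dtype=float)
    values = intercept + genotypes @ effects
    return dict(zip(accession_ids, values.tolist()))


# Hypothetical accessions scored at three markers.
gebv = predict_breeding_values(
    accession_ids=["clone_A", "clone_B"],
    genotypes=[[0, 1, 2], [2, 1, 0]],
    intercept=10.0,
    marker_effects=[1.0, 0.5, -1.0],
)
print(gebv)
```

The returned mapping corresponds to what would be written back to the database, one predicted value per accession.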

The cassava website will also contain a number of other features: a genome browser with cassava genome information (updated by and hosted at the DOE-JGI Phytozome portal); a database of predicted and experimentally characterized cassava genes based on the community curation model developed and successfully deployed at the Solanaceae Genomics Network (SGN, http://solgenomics.net/); a comparative map viewer for genetic maps; sequence-specific tools such as sequence homology search (BLAST) and sequence alignment tools; and powerful search interfaces for all data types, such as accessions and pedigrees. There will also be an electronic forum for breeders and scientists to exchange information, email lists, and wikis, as well as links to Twitter feeds and IRC channels (real-time Internet text messaging or synchronous conferencing) that will enable communication between project members. The software infrastructure will be based on software from SGN, which has developed genome- and breeder-specific databases for over ten years. On this basis, a demo site has been implemented at http://cassavabase.org/, including a breeders’ toolbox that will be expanded to accommodate GS. A cassava phenotypic test dataset has already been successfully uploaded. The site is currently password protected because the data are not yet published; the password is available upon request. All code is released as open source on the web (http://github.com/solgenomics/).

Initially, the database infrastructure will be located at Cornell. During the course of the project, database hosting will be moved to IITA, initially as a mirror site. Developers from IITA will join developers at BTI-Cornell for 6-12 month internships. By the end of the project, the group at IITA will host the production server, and BTI-Cornell will host the mirror. The software lead will still be provided by BTI-Cornell until the end of the project.

Objective 4. Germplasm Developed

Cassava is native to Latin America, and was introduced to Africa by the Portuguese in the 16th century. While some new germplasm has been provided to African programs from the International Centre for Tropical Agriculture (CIAT) in recent years, African germplasm represents only a subset of the diversity present at the center of origin. Given the challenges facing African breeders, particularly in the areas of virus- and pest-resistance, it is important to maximize the phenotypic and allelic diversity of breeding material. This objective ties in with the goal of conservation (Objective 5.b.iii): currently, new germplasm may be lost due to inadequate resources for its maintenance. Together, Objectives 4 and 5 will increase the genetic resources available for population improvement.

Incorporation of exotic germplasm into a GS breeding program must be done systematically. We propose to establish “pre-breeding” populations (progenies of crosses between elite, locally adapted clones and Latin American germplasm) at NRCRI and NaCRRI. IITA and NRCRI will cooperate on this objective, as their breeding material is quite similar. Promising progenies will be selected for incorporation into the local breeding program. Development and evaluation of the NaCRRI pre-breeding population will constitute a thesis project for one of the PhD students (see Appendix D). Details of the pre-breeding plan are presented in Appendix C.

Both NRCRI and NaCRRI have submitted proposals under the Brazil-Africa Marketplace to obtain Brazilian M. esculenta germplasm. NRCRI will be receiving a large number of hybrid seeds from CIAT for the HarvestPlus program to introduce high-carotenoid germplasm from Latin America into Africa; these materials could be incorporated into this pre-breeding effort. Some of the additional Latin American parents for these pre-breeding populations will also be selected based on genetic divergence from existing African lines, and from each other, using data from the Cassava Genomics project funded by the BMGF. The current plan for the Cassava Genomics project includes sequencing of 248 clones from CIAT, provided by Morag Ferguson through a GCP-funded project.

Objective 5. Infrastructure Developed and Plant Breeders Trained

The transition to genomic selection from phenotypic selection will require development of a number of capacities at the breeding stations, particularly at NaCRRI in Uganda, which is the base of the Cassava Regional Centre of Excellence (CRCoE). These needs will be met through training and infrastructure investments, which are described in detail in Appendix C. These investments include:

a. Strengthening regional human resources in the East and Central African sub-region by training of graduate students.

i. Training of 8 MSc students, 2 each from Uganda, Kenya, Tanzania, and Rwanda. All of these are key cassava-producing countries in eastern Africa, but they suffer from very low manpower deployment to cassava research. The training will be conducted at Makerere University (Uganda), which runs a Master of Science in Plant Breeding and Seed Systems. This program involves intense structured coursework followed by thesis research. The course menu has already proved effective for the first cohort in providing core competencies, supporting disciplines, and additional skills (such as project planning, social research methods and “soft skills”), all of which are required of the contemporary African plant breeder. Both theory and “hands-on” application are emphasized. See Appendix E for a description of the MSc training plan.

b. Strengthening physical capacity for research.

i. Enhance the capacity for high-throughput DNA extractions at NaCRRI.

ii. Establish a nutrient profiling facility at NaCRRI. To breed for increased nutritional value of cassava, there is a need to establish capacity for comprehensive nutritional profiling. The facility will provide routine laboratory analyses for carbohydrates, lipids, proteins and other biochemicals related to nutrition. In addition, the facility will function as a training centre for different groups of people including academia, food processors and consumers.

iii. Genetic conservation. The impact of disease epidemics on cassava genetic resources is a cause for serious concern in Uganda and surrounding countries. Loss of genetic resources could seriously affect food and nutrition security. Field-based conservation presents major drawbacks and therefore calls for suitable alternatives, of which tissue culture based techniques are of highest priority. In particular, cryopreservation saves cost, time, and space, and avoids the catastrophic losses that are commonly encountered in field-based gene banks. We propose to build capacity in cassava conservation using tissue culture protocols.

c. Enhancing regional and global research connections. These connections will include hosting of visiting scientists and breeders to transfer specific knowledge in emerging areas. The emphasis will be on passing knowledge and skills to local scientists and students, ensuring sustainability of such knowledge after the visiting scientist has gone back to the home institution.

d. Strengthening NaCRRI communications and leadership capabilities.

i. Establishing broader-bandwidth internet connectivity (also at NRCRI) and an e-library service.

ii. Enhancing leadership research capabilities. The design of this scheme will include peer mentoring, in which people at the same level from different institutions act as mentors and mentees to each other while providing regular feedback to a senior mentor for the pair. Visits will be arranged to demonstrated centers of excellence in cassava research, namely Vietnam, Embrapa, CIAT, IITA and the Central Tuber Crops Research Institute in Trivandrum-Kerala, India.

Objective 6. Biotechnology/Biosafety Education and Awareness

The focus of this proposal is on improvement of cassava through selection of naturally-occurring genetic variation. However, it is likely that there are important traits (e.g., nutritional quality) for which such natural variation is inadequate. In such cases, which are not limited to cassava, agricultural biotechnology can be an important tool (Asian Development Bank, 2004; Cohen, 2001; United Nations Economic Commission for Africa, 2002). Yet there is a highly polarized debate about biotechnology going on in Africa and elsewhere. Modern biotechnology is a victim of relentless criticism, misinformation and outright falsehoods being foisted on the public; at the same time, there is also considerable hype about biotechnology. The public is constantly bombarded with all sorts of information, causing confusion, hampering development of effective public policies, and delaying the implementation of biotechnology. This public campaign for and against biotechnology has increased the regulatory burden on many benign biotechnologies and products, and has a telling effect on the development of biotechnology itself (Shantharam, 2004; Shantharam and Auberson-Huang, 2004). The goal of this objective is to address this problem through a program of public education and awareness, and through capacity development in communication technology and biotechnology policy:

a. Establishment of a NARO Hub for biotechnology/biosafety education and awareness. This resource hub will serve as a center for biotechnology policy development and analysis, for dissemination of accurate information, and as a forum for discussion of the priorities, benefits and risks of modern biotechnology in the national interest of Uganda. Farmers and growers and consumers will be the most important sounding board for policy makers and scientists. There will be regular interactive sessions between the three groups and a regular channel of communication will be established so that these stakeholder groups will serve as an important source of feedback from the general public for the development of appropriate biotechnology. The hub will be established by NARO under the guidance of an experienced consulting firm to be identified later, and will be staffed by three professional people including a PhD level scientist. Activities have been planned to create biotechnology awareness in the following stakeholder groups (details of these activities can be found in Appendix C):

i. Policy makers and administrators: policy seminar trainings on current and emerging issues in biotechnology.

ii. Scientific community: seminars or workshops to sensitize them about the social, economic, ethical and commercial issues surrounding biotechnology; scientific workshops, seminars and symposia.

iii. Media: media workshops, briefings, and press releases about the latest developments in the field of biotechnology and also the priorities of the Government.

iv. Medical and health community: printed materials, videos, annual exhibitions and workshops to create general awareness about biotechnology.

v. Farmers and growers will be the most important stakeholders of the center. They will be regularly hosted at the center and other laboratories and facilities to demonstrate how biotechnology works for them.

vi. Consumer groups will be the second most important group of stakeholders that will be targeted under this initiative. They will be regularly polled to solicit their views and opinions for policy and program development.

vii. The public: regular lectures by local and visiting scientists and biotechnology experts; brochures, pamphlets and pocket guides to provide basic facts on biotechnology; annual biotechnology exhibitions, touring exhibits and videos and film shows to inform the public about all aspects of biotechnology.

b. Capacity development in information and communication technology, and biopolicy. A PhD graduate student will be recruited to undertake training in (i) information packaging and communication, (ii) information retrieval, database management and privacy codes, (iii) website design, launching and operation, (iv) e-conferencing, (v) regulatory biopolicy, and (vi) basic bioinformatics. In addition, one technician will be trained in general electronics and equipment mechanics at advanced diploma level for routine maintenance and advice on laboratory resources.

Objective 7. Enhancement of GS through Cassava Genomics

Genotyping-by-sequencing (GBS; see page 6) provides a cheap and effective method to generate between 5,000 and 20,000 SNPs in a given organism. This technology can be complemented by including a whole-genome survey (WGS) of SNPs generated from alignment and statistical analysis of next-generation WGS reads that are being obtained as part of the current BMGF-supported Cassava Genomics project. WGS reads would likely generate ten to a hundred times more SNPs than GBS (or at least one per gene). We propose to develop several direct benefits from WGS sequence.

a. Imputation of missing haplotype information between GBS SNPs. The higher density of WGS-derived SNPs will complete haplotype information in the spaces between GBS-derived SNPs. Imputation of this missing data has been shown to improve resolution of genome-wide association studies (Guan and Stephens, 2008) and may improve genomic selection accuracies.

b. Genomic analysis platform. Visualization of both GBS- and WGS-derived SNPs in their genomic loci and analysis of neighboring genes will be possible at the JGI’s comparative plant genomics portal (Phytozome). Such analysis would potentially allow assignment of a gene(s) to the trait-causing polymorphism: Phytozome includes a gene page for every gene displaying predicted functional information. Furthermore, widening the search to gene pathways, paralogs, gene families etc. using built-in tools at Phytozome may reveal other GBS-SNPs that are likely to be of high interest in breeding. The Phytozome and cassavabase portals will be tightly integrated, ensuring users can navigate efficiently and intuitively between the two platforms, helping breeders rapidly investigate the mechanistic basis of traits of interest.

c. Improve GS prediction accuracy. The integration of pathway information such as that discussed above, and/or orthogonal gene density and other easily-calculated information, with the statistical modeling process used in genomic selection can improve prediction accuracies. In humans, a prediction model using SNPs in or close to genes was shown to be more accurate than one using SNPs far from genes (Yang et al. 2011). Analysis of genes close to GBS-SNPs using "probabilistic functional gene networks" derived from literature and database searches (Lee et al. 2010) can indicate the probable biological processes in which they are involved. Results from this analysis could in turn provide trait-specific weights for each SNP that might increase GS prediction accuracies. These approaches that link scientific knowledge development across many fields with genomic prediction models are an exciting growth direction for such models. These efforts will form the basis of a research PhD to be carried out at Cornell (see Appendix D).

d. Lastly, the construction of ancestral haplotypes by mapping WGS reads from resequenced varieties to the reference AM560-2 genome can resolve wild and elite complements of genotypes in varieties of interest. Revealing the wild-cassava-derived portions of the genome (introgressions) will allow their association with desirable traits. More generally these analyses will identify non-recombined genome segments (from whatever source) that currently segregate within cassava breeding populations, enabling allele effect estimation at the segment level rather than the SNP level. Furthermore, analysis of WGS-SNPs can reveal which SNPs would segregate in a cross between two cultivars. This provides help in planning effective crossing strategies. Tools for predicting segregating SNPs in arbitrary crosses, as well as for browsing and searching ancestral haplotypes, will be incorporated into Phytozome.
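
The idea of predicting which SNPs segregate in a cross can be illustrated with a minimal Python sketch. It assumes biallelic SNPs coded as the number of copies of the alternate allele (0, 1 or 2) and uses the basic Mendelian rule that the progeny of a cross vary at a SNP only if at least one parent is heterozygous there. The function name and SNP identifiers are hypothetical, and this is not the tool planned for Phytozome.

```python
def segregating_snps(parent_a, parent_b):
    """List the SNPs expected to segregate among progeny of a cross.

    parent_a, parent_b: dicts mapping SNP identifier to genotype, coded as
    the alternate-allele count (0, 1, 2) or None when the call is missing.
    """
    segregating = []
    for snp_id, genotype_a in parent_a.items():
        genotype_b = parent_b.get(snp_id)
        if genotype_a is None or genotype_b is None:
            continue  # cannot decide without both parental genotypes
        # Two homozygous parents give uniform progeny (even when they
        # differ, every seedling is heterozygous), so a heterozygous
        # parent is required for segregation.
        if genotype_a == 1 or genotype_b == 1:
            segregating.append(snp_id)
    return segregating


# Hypothetical genotypes for two cultivars at five SNPs.
cultivar_1 = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 0, "snp5": None}
cultivar_2 = {"snp1": 0, "snp2": 0, "snp3": 0, "snp4": 1, "snp5": 1}
print(segregating_snps(cultivar_1, cultivar_2))
```

Here `snp2` and `snp4` segregate, while `snp3` does not even though the parents differ, because both are homozygous.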

Objective 8. Project Management and Communication

The project will be administered and managed by the Project Management Unit (PMU) of the Office of International Programs in the College of Agriculture and Life Sciences (IP-CALS) at Cornell University in Ithaca, New York. IP-CALS has extensive experience with donor-funded projects (including the Durable Rust Resistance in Wheat project also funded by BMGF). IP-CALS’ long and successful donor-funded project experience will ensure timely execution of all subcontracts and satisfaction of all BMGF reporting requirements.

Efficient, transparent, and unbiased project management will be essential to achieving the NGC project goals. The PMU in IP-CALS has dedicated staff who will contribute to the NGC project and will recruit outstanding new staff to fill leadership positions specific to this project. Project management activities include the financial, administrative, legal, and IT services necessary to administer the project. In addition, the PMU will foster communication between the project partners and will handle all reporting to BMGF. The PMU will manage the project’s global access and gender components.

External guidance will be provided to the project through an External Project Advisory Committee (EPAC). Members will be mutually acceptable to BMGF and Cornell and will serve for the duration of the project. The EPAC will convene at least annually and will serve in an advisory capacity to BMGF and Cornell. The EPAC will convene in connection with annual project meetings that include participating scientists. The Project Management Unit will also host a larger open cassava project meeting in Years 2 and 4. This initiative is a follow-on to the cassava convenings that BMGF held in 2011 and 2012 in San Diego.

Having adequate communications and bandwidth to key field stations is essential. Currently both Umudike and Namulonge suffer from inadequate internet bandwidth and a poor communications infrastructure. As part of the project, the IT infrastructure at these key stations for cassava research will be upgraded and reinforced to provide basic, reliable internet access and communications (Objective 5). This will include review of their infrastructure, and procurement, installation and support of appropriate bandwidth and equipment.

Special initiative: Collaboration for Cassava Germplasm Exchange. There is a need for increased and simplified exchange of germplasm between Latin America and Africa (see Objective 4). In particular, clones from Latin America need to be propagated in an environment that is disease-free for pre-breeding activities with elite African germplasm; direct shipment to East Africa will result in major losses. An initiative to address this concern would include partners at CIAT and Embrapa and either USDA-HI or a UK-based center as the disease-free exchange location.

Special initiative: Gender Initiative for Cassava Breeding. Women are the cornerstone of cassava production and are responsible for growing, harvesting and processing cassava across the African continent. Cassava breeding would greatly benefit from a better understanding of specific traits that are important to women cassava farmers. More specifically for this project, research into ‘gendered traits’ could yield information on traits that can then be integrated into the GS schemes developed through this project at IITA, NaCRRI and NRCRI. This special initiative seeks to fill the gap in knowledge regarding ‘gendered traits’ in cassava, and aims to mainstream these traits into GS and conventional breeding schemes in Africa. This project also supports the career development of female cassava breeders and researchers in Africa, through Female Scientist Career Development Support funds to sponsor female researchers’ and breeders’ attendance at conferences, workshops, short courses or meetings. This will be supplemented with research work under the gender special initiative that examines barriers that women cassava researchers face in developing their careers in Uganda and Nigeria, and uses these findings to develop award initiatives to overcome these barriers. This special initiative will be further developed and milestones will be determined after the grant agreement has been signed.

The Director of IP-CALS, Dr. Ronnie Coffman, will serve as the Principal Investigator of the project and will supervise the coordinating staff based in IP-CALS.

IV. Alignment with Foundation Strategy:

This project aligns with the BMGF’s goal of Agricultural Development. More specifically, it “uses science and technology to develop crops that can thrive.” Cassava is a staple food for many poor farmers in sub-Saharan Africa, but yields are low compared to yields in Latin America and Southeast Asia. Genetic improvement is a key component of any plan to increase yields sufficiently to break the cycle of hunger and poverty.

There are several grants funded by the BMGF and the Generation Challenge Program (GCP) with goals complementary to those of this project. In particular, the Cassava Genomics project will provide critical data for selection of exotic germplasm, measuring linkage disequilibrium, estimating needs for marker density, and other benefits that are described in Objective 7. Currently, we are collaborating with the Cassava Genomics project for development of the Genotyping-by-Sequencing (GBS) methodology that will be used for collection of genotypic data for this project. The BMGF is also funding research on virus resistance in Eastern Africa; any resistant germplasm identified by that project can be integrated into the Ugandan and Nigerian GS programs. Similarly, any germplasm identified by GCP grantees working on drought or disease tolerance can be integrated into GS programs. This project complements others that are using a transgenic approach, as it exploits only naturally occurring variation. For traits that lack sufficient natural variation to respond well to selection (e.g., protein or micronutrients), cultivars developed via the GS pipeline could be further improved through transgenic approaches.

V. Sustainability and Scalability:

From the standpoint of human capacity, our goal is to achieve self-reliance of African scientists by the end of the project (five years). Much of this project is essentially modular, and inherently scalable: as each new breeding program initiates work in GS, its breeders can learn from scientists in the established programs. Technology to improve flowering and seed set can easily be transferred. The database can be used by any program with a good internet connection. While the database will be developed at Cornell, it is our intention that, by the end of the project, database maintenance and development will be in the hands of African staff. Linkages with other databases, such as the GCP Integrated Breeding Platform (see Objective 3), will also need to be developed as the project evolves. Two of the Cornell scientists on this project (LM and MH) will be meeting with staff from the IBP in Kampala, Uganda, this June, at the Global Cassava Partnership meeting, to discuss ways to connect our projects and tools.

To accomplish these goals, several PhD and MSc students will be trained in the course of this project, establishing expertise in bioinformatics, GS, and good plant breeding practices (standardized phenotyping, record-keeping, etc.); see Appendix D and Appendix E for details. By embedding the PhD research in ongoing research programs at students’ home institutions, these trainees are likely to remain in their organizations after completing their studies. In the case of NaCRRI, this expectation is made explicit.

Aside from human resources, what will be needed to allow the continuation of this work, and an increase in the number of participating breeding programs, are resources for phenotyping and genotyping and adequate internet services. Where these are limiting, new programs will not be able to participate without additional funding.

VI. Implementation, Intended Results, and Results Measurement:

A. Results Framework (see Appendix A)

B. Project Plan

The ultimate beneficiaries of this project are the cassava farmers of sub-Saharan Africa, who will receive improved cassava clones having higher and more stable yields from season to season. We chose to focus this project on three institutions, IITA, NRCRI, and NaCRRI, because it is their mission to serve these beneficiaries, and because they have the institutional capacity to succeed and serve as models. The core of the project is to transfer expertise from Cornell University to our partners in these cassava breeding programs; the project has been designed in close consultation with them.

Vision of success: At the completion of this research, three African cassava breeding programs (IITA, NRCRI, and NaCRRI) will be using GS routinely and will have the scientific capacity to optimize the GS breeding scheme, maintain and develop the underpinning database, manage an outsourced genotyping process, and train other cassava breeders in the methodology. Ultimately, the most important outcome will be an increase in the rate of genetic improvement of cassava, i.e., increase in yield per unit time due to breeding. Specifically, outcomes of the breeding scheme will be:

1. Production of seedlings from parents selected for their high predicted performance in targeted agroecological zones.

2. Outsourcing of seedling genotyping to obtain rapid, low-cost, high density DNA marker data.

3. Efficient and functional management of data in a database housing genotypic and phenotypic data and allowing easy connectivity to analysis tools.

4. Prediction of seedling performance resulting in selection for clonal propagation and distribution to targeted zones for phenotypic evaluation.

5. Recovery of phenotypic data from distributed trials in view of updating prediction models for the trials’ zones.

In addition to the outcomes that are directly related to improved breeding of cassava, other outcomes will contribute indirectly to that goal:

1. Improvements in reliability of cassava flowering.

2. Strengthened capacity for sustainable cassava research through investments in human resources and infrastructure.

3. Enhanced global and regional research connections.

4. An improved public climate for consideration of biotechnological approaches.

C. Analysis

While this is an ambitious project and should have a significant impact, many of the methods and technologies are already in use, and need only to be transferred, rather than developed, as discussed below. One exception to this is the objective for flower induction (Objective 1), which will be experimental and thus riskier than the other objectives, in terms of both experimental outcomes and practical implementation in a breeding program. This is because the strategies proposed in this objective are based on research that has been conducted in model plants and a variety of other species, but that has not yet been replicated in cassava. To offset the risk that cassava will respond differently than other species, our proposal includes several parallel, alternative strategies toward achieving the desired outcome. Success of these strategies will be assessed by measuring the levels of signaling molecules that are markers of a response to the floral-stimulating treatments. As testing reveals which technologies are most promising, effort will be focused on their development for the breeding community. Successful impact requires that just one of these technologies be superior to current practice.

The second exception is the GBS library construction and genotyping protocols which are still being optimized. Our strategy assumes that we will be successful in developing libraries that generate more SNPs than the PstI libraries that we have used so far. This assumption does not seem unreasonable, but it could take longer than expected. Furthermore, we are planning to reduce genotyping costs by using an imputation approach for the large numbers of progeny that we will sequence at lower coverage (see Appendix C). This will require imputation of the sites that are not scored, based on the higher coverage information of the parents. Again, there is no reason this should not ultimately be successful, but we have not yet begun to use these methods and cannot predict precisely when they will become routine. However, the modifications we are implementing have been used successfully by other labs, and we are confident that an improved method will be in place before the first batch of progeny DNAs arrives.
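
The simplest case of imputing unscored progeny sites from parental genotypes can be sketched in Python as follows. This is an illustration only, not the imputation method that will be adopted: it assumes biallelic SNPs coded as alternate-allele counts (0, 1 or 2) and fills a missing progeny genotype only where both parents are homozygous, so that the progeny genotype is fully determined. A practical method would also use linkage between neighboring sites to resolve cases where a parent is heterozygous. The function name and data are hypothetical.

```python
def impute_from_parents(progeny, parent_a, parent_b):
    """Fill missing progeny genotypes that are fully determined by the parents.

    progeny, parent_a, parent_b: equal-length lists of genotypes coded as
    alternate-allele counts (0, 1, 2); None marks an unscored site.
    Sites that cannot be resolved from the parents alone stay None.
    """
    imputed = list(progeny)
    for i, genotype in enumerate(progeny):
        if genotype is not None:
            continue  # keep observed calls untouched
        ga, gb = parent_a[i], parent_b[i]
        if ga in (0, 2) and gb in (0, 2):
            # Each homozygous parent transmits one known allele:
            # 0 x 0 -> 0, 2 x 2 -> 2, 0 x 2 -> 1.
            imputed[i] = (ga + gb) // 2
    return imputed


# Hypothetical low-coverage progeny with four unscored sites.
progeny_calls = [None, None, None, 1, None]
parent_1 = [0, 2, 0, 1, 1]
parent_2 = [0, 2, 2, 0, 2]
print(impute_from_parents(progeny_calls, parent_1, parent_2))
```

The last site remains unscored because one parent is heterozygous there, which is exactly the situation in which information from neighboring sites is needed.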

The rest of this project rests solidly on well-established practices. To be specific:

1. GS is already in routine use at Cornell University (and elsewhere).
2. The database team will use a framework that has already been in use for ten years and reflects continual improvements made over that time.
3. The breeders at IITA, NRCRI, and NaCRRI already have large-scale cassava breeding programs in place, including systems for phenotyping and record-keeping. Furthermore, they all have molecular labs and have the capacity to extract 10,000 DNA samples/year for this project.

The pace of activities proposed for implementation of GS is determined largely by the biology of cassava: how long it takes to flower, make crosses, set seed, propagate vegetatively, etc. Our timeline is based on published information (IITA, 1990) and on communication with Peter Kulakow and Ismail Rabbi, our collaborators at IITA; Yona Baguma and Robert Kawuki, our collaborators at NaCRRI; and Chiedozie Egesi and Emmanuel Okogbenin at NRCRI. Molecular and bioinformatic activities are coordinated with that biology-based timeline. For example, tissue from progeny of crosses will be collected for DNA extraction during the seedling stage. DNA extraction and genotyping will take place during the six to twelve months of growth required before the next round of propagation (a one-year and a two-year cycle have been designed and are shown in detail in Appendix C). Our plan includes infrastructure and personnel dedicated to this project so that those molecular and bioinformatic activities will be completed in time to run prediction models on the seedling genotypes, allowing selection of parents for the next set of crosses. This timing is less critical in cassava than in many other crops, e.g. cereals, where the flowering period is short and crosses must be made in a window of only a few days.

Finally, the cassava community currently has considerable momentum. The Global Cassava Partnership Meeting takes place in Uganda in June. The African Development Bank has recently awarded substantial funds for "Support to Agricultural Research for Development of Strategic Crops in Africa", one of which is cassava; IITA is one of three institutions that will implement the project. Awareness of the importance and promise of cassava is growing, thanks in part to the efforts of BMGF. Ismail Rabbi (IITA), Chiedozie Egesi (NRCRI), and Robert Kawuki (NaCRRI) spent four weeks at Cornell this spring, receiving training in the statistical side of GS and giving us opportunities to further develop and refine our plans. Their willingness to make such a significant investment of their time reflects their enthusiasm for this project.

D. Assumptions and Risks

As explained above, most of the technology used in this project is not risky. Thus, it is unlikely that we will encounter an insurmountable barrier; however, progress could be unacceptably slow if tasks are not prioritized. For example, seedling evaluation depends on genotypic data, which in turn depends on technical staff at the breeding programs doing DNA extractions and shipping high-quality DNA to the sequencing facility. This work must be prioritized so that it is completed in a timely manner. Similarly, we will need to be attentive to the efficiency of the sequencing facility; protracted delays at such facilities can be a serious problem. We have allocated personnel time (Charlotte Acharya, 0.5 FTE) to management of the genotyping pipeline, from breeding program to sequencing facility to data delivery. This activity will involve regular contact with technical staff in Africa, making sure they have the reagents they need, contacting sequencing service providers as warranted, and ensuring a timely flow of sequence data to the bioinformatics team. Ms. Acharya has been working at the GBS facility at Cornell since its inception and is very knowledgeable about all phases of the operation. We expect the DNA library preparation to continue to be done at Cornell, while the sequencing will frequently be outsourced. The options for sequencing will likely increase over the life of this project.

Another aspect of this project that poses some risks is Objective 1, which involves experimental work to identify an efficient method for induction of flowering. Each of the strategies that will be tested for flower induction in cassava has some degree of uncertainty with respect to efficacy and practical implementation in a breeding program. The strategies are based on research that has been conducted in model plants and a variety of other species, but they have not yet been studied systematically in cassava. It is expected that cassava will behave similarly, but our proposal includes several parallel, alternative strategies toward achieving the desired outcome. This will help mitigate the risk that cassava will respond differently than other species. While it is desirable to have alternate technologies by which cassava flowering and seed set are improved, successful impact requires that just one of them be superior to current practice. Hence, as testing reveals which technologies are most promising, effort will be focused on them toward final refinement and recommendation/extension to the breeding community. A second measure that will be used to mitigate risk is use of diagnostic tests to measure signaling molecules (FT, hormones) that are expected to be found as a result of the floral-stimulating treatments. Given the firm foundation of knowledge we now have regarding flowering, these diagnostics will help guide investigations toward treatments that are most effective and reliable.

In the long term, effective partnerships will be critical to the success of this project. The project aims to transfer significant knowledge and experience to African partners. We are therefore engaging closely with the breeders and technical support staff based in Africa, who will play key roles in the project. For example, breeders from IITA and the Nigeria and Uganda NARS were at Cornell for genomic selection and QTL mapping training in April and early May 2012. We used the opportunity of being together to discuss and agree upon responsibilities and norms in the project. Time together creates stronger personal and professional connections and trust. In addition to the connection between Cornell and African partners, a strong spirit of cooperation must exist across programs in sub-Saharan Africa. A solid foundation for this spirit already exists among African scientists. The ability to use GS methods to predict performance in diverse agroecological zones will depend on the sharing of phenotypic data from the breeding programs (both IITA and NARS) covering those zones. Each program has an incentive to share because doing so leverages data from all programs. The scientific underpinning for more rapid gains is that, through sharing, each program's data becomes part of a larger data set that yields more accurate predictions. A first step toward this vision will be to work with groups (e.g., Nigeria and Uganda NARS) that already have a high level of trust, so that we can test the value of central databasing and joint analysis of data. We will also test structuring the database so that each program's data can be held separately and kept confidential.

E. Measurement

Many concrete measures of progress will accumulate in the database: to the extent that breeders are genotyping seedlings and uploading phenotype data, the database will be populated. When those data are used to make breeding decisions, records of analyses, selections, and crosses will also be generated and stored in the database, serving as metrics of progress.

Generation of a prediction model involves cross-validation within the training population to find the model that produces the most accurate predictions. The type of model that performs best is expected to vary among traits and possibly among agroecological zones. While genomic selection has been shown to work well in simulations and in a number of experimental applications, there is still much to learn about optimization of models, design and updating of training populations, incorporation of exotic germplasm, etc. In the course of this project, the practical experience of implementing GS will produce valuable data that will address some of these questions. We expect that analysis of these results will lead to several scientific publications concerning the implementation of GS, pre-breeding for cassava, and manipulation of flowering time.
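The cross-validation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses ridge regression on marker dosages as a stand-in prediction model, compares a few hypothetical shrinkage penalties by k-fold cross-validation, and runs on simulated genotypes and phenotypes. Prediction accuracy is taken as the correlation between predicted and observed phenotypes in the held-out fold. All function names, penalty values, and data dimensions are illustrative.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression; return marker effects and intercept."""
    mu_x = X.mean(axis=0)
    mu_y = y.mean()
    Xc = X - mu_x
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean(y)).
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - mu_y))
    return beta, mu_y - mu_x @ beta


def cv_accuracy(X, y, lam, k=5, seed=0):
    """Mean held-out correlation between predicted and observed phenotypes."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        beta, intercept = ridge_fit(X[train_idx], y[train_idx], lam)
        predicted = X[test_idx] @ beta + intercept
        accuracies.append(np.corrcoef(predicted, y[test_idx])[0, 1])
    return float(np.mean(accuracies))


def choose_penalty(X, y, candidates, k=5, seed=0):
    """Return the candidate penalty with the highest cross-validated accuracy."""
    scores = {lam: cv_accuracy(X, y, lam, k, seed) for lam in candidates}
    return max(scores, key=scores.get), scores


# Simulated training population (assumption): 200 clones, 100 biallelic markers.
rng = np.random.default_rng(42)
n_clones, n_markers = 200, 100
X = rng.integers(0, 3, size=(n_clones, n_markers)).astype(float)  # dosages 0/1/2
true_effects = rng.normal(0.0, 0.3, n_markers)
y = X @ true_effects + rng.normal(0.0, 1.0, n_clones)  # phenotype = genetics + noise

best_lam, scores = choose_penalty(X, y, [1.0, 10.0, 100.0, 1000.0])
```

The same loop could compare different model types rather than penalties; the essential design choice is that every candidate is scored on the same folds, so differences in accuracy reflect the models and not the partitioning.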

Ultimately, the most important outcome will be measured as the gain in cassava performance. Over the short time period of the project, we will have yield trial data only from the first one or two generations of progeny, depending on whether the one-year or two-year cycle is used (see Appendix C). This is not a large amount of data on which to draw strong conclusions, but a trend should be apparent.

Impact metrics for this project include:

- Number of recombination cycles achieved
- Increase in breeding values
- Number of breeders trained in GS
- Number of publications in peer-reviewed journals
- Number of clones of breeding material conserved
- Number of exotic clones identified with useful alleles

References

Adeyemo S. 2009. Molecular Genetic Characterization of Photoperiodic genes in Cassava (Manihot esculenta Crantz) and attempts to manipulate their expression to promote floral induction. PhD Thesis, Universität zu Köln, Cologne, Germany, 104p.

Adeyemo S, E Kolmos, J Tohme, P Chavariaga, M Fregene, S Davis. 2011. Identification and Characterization of the Cassava Core-Clock Gene EARLY FLOWERING Tropical Plant Biology 4: 117-125.

Akano AO, AGO Dixon, C Mba, E Barrera, and M Fregene. 2002. Genetic mapping of a dominant gene conferring resistance to cassava mosaic disease. Theoretical and Applied Genetics 105:521-535.

Alicai T, CA Omongo, MN Maruthi, RJ Hillocks, Y Baguma, R Kawuki, A Bua, GW Otim-Nape, J Colvin. 2007. Re-emergence of cassava brown streak disease in Uganda. Plant Disease 91:24-29.

Asian Development Bank. 2001. Agricultural Biotechnology, Poverty Reduction, and Food Security. A Working Paper. http://www.adb.org/Documents/Books/Agri_Biotech/agribiotech.pdf

Bohlenius H, T Huang, L Charbonnel-Campaa, A-M Brunner, S Jansson, S-H Strauss, O Nilsson. 2006. CO/FT regulatory module controls timing of flowering and seasonal growth cessation in trees. Science 312: 1040-1043.

Breiman L. 2001. Random Forests. Machine Learning 45:5-32.

Ceballos H, CA Iglesias, JC Perez, and AGO Dixon. 2004. Cassava breeding: opportunities and challenges. Plant Molecular Biology 56:503-516.

Cohen JI. 2001. Harnessing biotechnology for the poor: challenges ahead for capacity, safety, and public investment. Journal of Human Development 2: 239-264.

De Jong M, Mariani C, Vriezen WH. 2009. The role of auxin and gibberellin in tomato fruit set. Journal of Experimental Botany 60: 1523-1532.

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, et al. 2011. A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379.

Giakountis A, Coupland G. 2008. Phloem transport of flowering signals. Current Opinion in Plant Biology 11: 687-694.

Gopal J, ID Garg. 2011. An efficient protocol of chemo-cum-thermotherapy for elimination of potato (Solanum tuberosum) viruses by meristem-tip culture. Indian J Agr Sci 81:544-549.

Guan Y, M Stephens. 2008. Practical Issues in Imputation-Based Association Mapping. PLoS Genetics 4:e1000279.

Habier D, RL Fernando and JCM Dekkers. 2007. The Impact of Genetic Relationship Information on Genome-Assisted Breeding Values. Genetics 177: 2389-2397.

Hayes BJ, PM Visscher, and ME Goddard. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 91:47-60.

Heffner EL, ME Sorrells, and J-L Jannink. 2009. Genomic Selection for Crop Improvement. Crop Science 49:1-12.

International Institute of Tropical Agriculture. 1990. Cassava in Tropical Africa. IITA, Ibadan, Nigeria.

Kawuki RS. 2009. Variation in cassava (Manihot esculenta Crantz.) based on single nucleotide polymorphisms, simple sequence repeats and phenotypic traits. PhD thesis, Department of Plant Sciences: Plant Breeding, University of the Free State, Bloemfontein, South Africa.

Kizito E, A Bua, M Fregene, T Egwang, U Gullberg, A Westerbergh. 2005. The effect of cassava mosaic disease on the genetic diversity of cassava in Uganda. Euphytica 146:45-54.

Kotoda N, Hayashi H, Suzuki M, Igarashi M, Hatsuyama Y, Kidou S-i, Igasaki T, Nishiguchi M, Yano K, Shimizu T, Takahashi S, Iwanami H, Moriya S, Abe K. 2010. Molecular Characterization of FLOWERING LOCUS T-Like Genes of Apple (Malus domestica Borkh.). Plant and Cell Physiology 51: 561-575.

Lee I, B Ambaru, P Thakkar, EM Marcotte, and SY Rhee. 2010. Nature Biotechnology 28: 149-156.

Legg JP, B Owor, P Sseruwagi, and J Ndunguru. 2006. Cassava mosaic virus disease in East and Central Africa: Epidemiology and management of a regional pandemic. Advances in Virus Research 67:355-418.

Setter TL and R Parra. 2010. Relationship of carbohydrate and abscisic acid levels to kernel set in maize under postpollination water deficit. Crop Science 50: 980-988.

Shantharam S. Introduction to Risk Assessment. In: Handbook of Plant Biotechnology. Edited by Paul Christou and Harry Klee. John Wiley and Sons Ltd.

Shantharam S, and L Auberson-Huang. 2004. Risk assessment of transgenic plants: Science and public policy. In: Handbook of Plant Biotechnology. Edited by Paul Christou and Harry Klee. John Wiley and Sons Ltd.

United Nations Economic Commission for Africa. 2002. Harnessing technologies for sustainable development, Addis Ababa.

Yang J, TA Manolio, LR Pasquale, E Boerwinkle, N Caporaso, JM Cunningham, M de Andrade, B Feenstra, E Feingold, MG Hayes, WG Hill, MT Landi, A Alonso, G Lettre, P Lin, H Ling, W Lowe, RA Mathias, M Melbye, E Pugh, MC Cornelis, BS Weir, ME Goddard, PM Visscher. 2011. Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genetics 43:519-525.

Zeevaart JAD. 2008. Leaf-produced floral signals. Current Opinion in Plant Biology 11: 541-547.

Zhang H, DE Harry, C Ma, C Yuceer, C-Y Hsu, V Vikram, O Shevchenko, E Etherington, SH Strauss. 2010. Precocious flowering in trees: the FLOWERING LOCUS T gene as a research and breeding tool in Populus. Journal of Experimental Botany 61: 2549-2560.

VII. Institutional Capacity:

Cornell University (CU) At Cornell, the plant sciences are spread across five closely interacting departments (Crop and Soil Sciences, Plant Breeding and Genetics, Horticulture, Plant Pathology and Plant-Microbe Biology, and Plant Biology), plus several other departments focused on natural resources, ecology, genetics, and entomology. Cornell is among the strongest universities nationally and internationally in the plant sciences and provides a rich and supportive academic environment for research and scholarship on crop science and agriculture.

CU International Programs of the College of Agriculture and Life Sciences (IP-CALS) is a leader in cutting-edge research and international outreach in food and energy systems, the life sciences, environmental sciences, and economic and community vitality. Directed by Ronnie Coffman, IP-CALS has a 60-year history of developing leaders and improving lives in the world’s emerging economies through teaching, research and outreach initiatives that prepare students at the undergraduate and graduate levels for careers in international agriculture and rural development. Currently, the total value of active grants in IP-CALS is $125M across 22 projects operated around the world, including $66.8M from the Bill & Melinda Gates Foundation and the UK Department for International Development for the Durable Rust Resistance in Wheat project.

CU Department of Plant Breeding and Genetics is dedicated to the genetic improvement of crop plants for the benefit of society through the development of novel breeding methodologies, the discovery and deployment of economically important genes as genetic stocks, germplasm, and varieties, and the training of the next generation of plant breeders. The department is consistently ranked as the premier program of its kind in the nation by the National Research Council. Jean-Luc Jannink, co-PI and leader of Objective 2, is currently lead or co-PI on four grants developing genomic selection methods and assessing their value for applied crop improvement. His group is completing work on "Association genetics of beta-glucan metabolism to enhance oat germplasm for food and nutritional function", funded by USDA-AFRI. The research tested use of data from oat cooperative nurseries that are similar to the uniform yield trials conducted by IITA on cassava and demonstrated the value of data from such nurseries for the training of genomic selection models. They are in the last year of work on "Evaluating Genomic Selection For Applied Plant Breeding", also funded by USDA-AFRI. This research uses historical phenotypic data from the University of Minnesota barley and the Ohio State wheat breeding programs and has provided the first empirically validated predictions on progeny. The Jannink group is also part of the Triticeae Coordinated Agricultural Project (TCAP) entitled "Improving barley and wheat germplasm for changing environments". This project has a statistical component aimed at improving prediction accuracy and the identification of quantitative trait loci using breeding data, and a database development component in which we are building web-accessible tools for storing, integrating, curating, and analyzing modern plant breeding data. Finally, Jannink is the lead investigator on the BMGF-funded "Genomic selection: the next frontier for rapid genetic gains in maize and wheat", a partnership with CIMMYT that uses CIMMYT breeding data both to evaluate prediction accuracies in CIMMYT's wheat and maize breeding programs and to incorporate genomic selection pipelines into those programs. These many projects give us broad experience in genomic selection and have also brought talented personnel to our lab group who can contribute their insight to our efforts in cassava.

CU Department of Crop and Soil Sciences The Chronicle of Higher Education Faculty Productivity Index (2006 and 2007) ranked Cornell’s Department of Crop and Soil Sciences second among universities offering a PhD program. In the 2010 National Research Council ranking of graduate programs, the department ranked fourth among agronomy/crop/soil science programs despite a size two to three times smaller than its peers. Dr. Tim Setter, Co-PI and leader of Objective 1, is a crop physiologist and professor in the Department of Crop and Soil Sciences, with a joint appointment in the Department of Plant Breeding and Genetics. He has decades of experience collaborating with CGIAR centers and currently collaborates with researchers at the International Center for Tropical Agriculture (CIAT), and Embrapa CNPMF, the Brazilian center for cassava research, on studies of drought tolerance of cassava.

Boyce Thompson Institute The Boyce Thompson Institute (BTI) is a not-for-profit research institute affiliated with Cornell University in Ithaca, NY. BTI has a long history of plant science research and has over 12,000 sq. feet of greenhouse space and a similar area of growth chambers to provide controlled environments for plant growth. BTI has invested extensively in genomics tools and resources, particularly bioinformatics. The Mueller Bioinformatics Lab runs a number of genome databases, including the reference database for the Solanaceae (http://solgenomics.net), and has the necessary expertise and infrastructure to implement and maintain large-scale genomics databases and tools.

International Institute of Tropical Agriculture (IITA) The IITA was founded in 1967 and became the first African link in the CGIAR network. The institute has stations located throughout sub-Saharan Africa with regional hubs in Dar es Salaam, Tanzania; Lusaka, Zambia; and Ibadan, Nigeria. The Nigerian facility includes the research laboratories necessary for conducting the proposed research, including a biosciences center specializing in genomics applications, a genetic resources center, a germplasm health unit, and a cassava breeding unit. The Nigerian NARS and IITA have a long history of collaboration, which has transformed the country into the world’s largest cassava producer by addressing all value chain needs of the food and industrial sectors.

The National Root Crops Research Institute (NRCRI), Nigeria As the national coordinating center for cassava research and development, NRCRI seeks to improve cassava production in Nigeria through sustainable agricultural systems for food security with increased income and better livelihood for Nigerian farmers. In the last 10 years, in collaboration with IITA, it has released 21 high yielding, disease resistant varieties, and these varieties now are widely grown in Nigeria. The cassava program currently is leading a strong research component in biotic stresses with strong emphasis on cassava mosaic disease (CMD), cassava brown streak disease, and cassava bacterial blight. Through collaborative efforts with CIAT in Colombia, NRCRI has embarked on molecular breeding of cassava using SSR markers for resistance to CMD. NRCRI is the only research institute in Nigeria with a molecular tool-based breeding program and it is leading coordinated research efforts with other NARS partners in Tanzania and Uganda.

National Crops Resources Research Institute (NaCRRI), Uganda NaCRRI is one of the six National Agricultural Research Organizations established by the Parliament of Uganda through the NARS ACT (2005). It has a mandate to undertake research on major crop commodities in the country and stands as a center of excellence for all aspects of crop research for accelerated development in East Africa. NaCRRI conducts research, training, and information exchange activities in partnership with regional bodies and universities, NGOs, and the private sector. Research focuses on smallholder cropping systems on cassava, beans, banana, maize, rice, sweet potato, horticultural crops, and oil and beverage crops. The institute has released over 100 improved crop varieties, participated in training of over 200 graduate students, and organized national, regional and international meetings. NaCRRI has a long history of collaboration.

US Department of Energy Joint Genome Institute The U.S. Department of Energy (DOE) Joint Genome Institute (JGI) was created in 1997, uniting the expertise and resources in DNA sequencing, informatics, and technology development that the DOE National Laboratories pioneered to help complete the Human Genome Project. Today, the DOE JGI occupies 80,000 square feet and employs 280 staff at its Walnut Creek, California campus, and it serves an international community of 1106 users. The DOE JGI is the global leader in generating genome sequences of plants, fungi, microbes, and metagenomes. The Institute’s comparative analysis systems are recognized as important resources for conducting genome and metagenome studies, empowering scientists around the world to conduct studies that otherwise would be too expensive or out of reach. They allow users to analyze and improve the functional characterization of a vast number of publicly available genomes and metagenomes. Launched in 2008, and including cassava since 2009, Phytozome is the JGI’s comparative plant genomics portal.

 


Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  A:  Results  Framework© 2010 Bill Melinda Gates Foundation. All Rights Reserved.

See the Instructions tab for details on how to complete this spreadsheet.

Expected to produce theseOutputs

Expected to contribute to theseOutcomes

Cassava genotypes representing appropriate experimental material for testing of flowering treatments established at experimental locations

XCassava breeding personnel in Africa use methods that induce early, profuse flowering and provide good seed set in their genomic selection programs

X

A diagnostic test is created by which lines/treatments which stimulate flowering can be assessed

X

A reliable method of floral induction is established X

XSource-sink manipulation methods for enhancing seed set in cassava plants with artificially-induced flowering are established

X

Cassava breeding personnel have gained knowledge and skills in use of protocols that induce early, profuse flowering and good seed set

X

X

Determine the state of signaling systems in cassava leaves engaged in floral stimulation (Cornell)

Develop tests for floral signaling factors (Cornell)

Among prolific cassava lines chosen for use, test the relationship between flowering and floral signaling factors in leaves and target tissues (Cornell)

Test the efficacy of various hormone treatments (hormone chemicals, dosages, application methods) for stimulating flowering in flowering-recalcitrant cassava lines (Cornell)

Vision of Success and the Most Significant Result of this Grant

Connection to Relevant Foundation Strategy

IITA, NaCRRI and NRCRI will be using GS routinely and will have the capacity to optimize the GS breeding scheme, maintain and develop the database, manage an outsourced genotyping process, and train other cassava breeders in the methodology. The most important outcome will be an increase in yield per unit time due to breeding.

This project aligns with the BMGF’s goal of Agricultural Development. Cassava is a staple food for many poor farmers in sub-Saharan Africa, but yields are low compared to yields in Latin America and Southeast Asia. Genetic improvement is a key component of any plan to increase yields sufficiently to break the cycle of hunger and poverty.

Establish nurseries of cassava plants representing precocious and flowering types in field (Africa) and greenhouse (Cornell)

Reminder: The number of objectives can be expanded or contracted as necessary, and although there should be a logical flow under each objective, there is not necessarily a one-to-one relationship among activities, outputs, and outcomes. Add or subtract activities, outputs, or outcomes as needed.

Fast Cassava: Improved ReproductionObjective #1

We will complete theseActivities

From breeding populations and historical data (IITA, NaCRRI), identify prolific cassava lines with precocious, profuse flowering

From breeding populations and historical data, identify a panel of cassava lines with a range of flowering behaviors (late/early, sparce/profuse, etc.)

Distribute propagules of cassava precocious and flowering types to experimental locations (from Africa to Cornell)

RESULTS FRAMEWORK – RESULTS TABLE

20

Page 24: Next Generation Cassava Breeding · Next Generation Cassava Breeding Proposal July 6, 2012 Submitted to the Bill & Melinda Gates Foundation ... Objective 2: Tools for Genomic Selection

Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  A:  Results  Framework© 2010 Bill Melinda Gates Foundation. All Rights Reserved.

X

X

X

X

X

Expected to produce theseOutputs

Expected to contribute to theseOutcomes

If the decision above is positive, use empirical tests and physiological assays to identify a superior line(s) for graft understock

If the decision above is positive, refine methodolgy for grafting so it is reliable and high-throughput

Organize workshops for cassava breeding programs in Africa to demonstrate FastCassava protocols for precocious, proliferative flowering and reliable seed set

Determine seed set in recalcitrant-flowering cassava lines subjected to a combination of floral induction and seed-set enhancing treatments

Exchange ideas and experiences with project collaborators on the FastCassava project (Cornell, IITA, NaCRRI)

We will complete theseActivities

Provide ongoing support to breeding personnel who utilize FastCassava to identify and solve any problems

On the basis of data obtained so far, decide whether the hormone and photoperiod treatments have sufficient promise to develop the technology further.

If the decision above is positive, assay flower signaling components in tissues of plants subjected to treatments

If the decision above is positive, perform stage-2 tests of promising hormone and photoperiod treatments

Test cassava lines for flowering responsiveness to a range of photoperiod treatments (Cornell)

Tools for Genomic Selection ImplementedObjective #2

Test grafting methods for cassava stem/branches at various developmental stages

Test flowering using our method of grafting recalcitrant-flowering stems to profuse-flowering FT donor understock plants

Assay flower signaling components in tissues of plants subjected to grafting treatments

On the basis of data obtained so far, decide whether the grafting methodology has sufficient promise to develop the technology further

Test the efficacy of several source-sink manipulation treatments to enhance seed set in cassava

21

Page 25: Next Generation Cassava Breeding · Next Generation Cassava Breeding Proposal July 6, 2012 Submitted to the Bill & Melinda Gates Foundation ... Objective 2: Tools for Genomic Selection

Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  A:  Results  Framework© 2010 Bill Melinda Gates Foundation. All Rights Reserved.

Phenotypic data available for development of prediction model

Breeders in Nigeria and Uganda are using GS in their breeding programs

Genotypic data available for development of prediction model

Human capacity is available to expand GS to other regions

Prediction models chosen for key traits

Three generations of seedlings selected for high predicted performance in targeted agroecological zones

Genotypic data for evaluation of segregating progenies (recurs)

Preliminary estimates of GS accuracy and gain for cassava are available.

First generation of seedlings selected for high predicted performance in targeted agroecological zones (IITA, NaCRRI, NRCRI)

Rate of genetic gain in cassava performance has increased measurably.

Second generation of seedlings selected for high predicted performance in targeted agroecological zones (IITA, NaCRRI, NRCRI)

Quality of phenotypic assessment and record-keeping has improved.

Third and fourth generations of seedlings selected for high predicted performance in targeted agroecological zones (IITA only)

Phenotypic data to update prediction model (ongoing)

Phenotypic data to evaluate accuracy of GS

Six African scientists with PhD degrees in Plant Breeding

Expected to produce these Outputs

Expected to contribute to these Outcomes

Functional website for cassava with tools for browsing the cassava genome, maps, markers, and cassava accessions, and for running common analyses such as BLAST

Provide cyberinfrastructure for cassava biology and breeding

Enable breeders to build GS prediction models and apply them to selected genotypes

Provide cyberinfrastructure for cassava biology and breeding

Database contents with up to date genotypic, phenotypic and other breeding information

Enable Genomic Selection in breeding programs

Fully functional site run and administered in Africa by project end

We will complete these Activities

Implement cassavabase.org website based on open source tools

Database Developed and Curated

Workshops to train African breeders in use of cassava database

Load historic breeding data

Install the production database in IITA (Nigeria) in year 3.

Hardware infrastructure installed

Load genotyping and phenotyping information, coordinate with pipeline and data providers

Design and genotyping of training populations

Coordination of genotyping and bioinformatic pipeline; ongoing

Crosses of selected parents (recurs)

DNA extractions and sequencing of progeny (recurs)

Field evaluation of selected clones (large plots)

Informal training and consultation as needed, including hosting visiting students and scientists (ongoing)

Integrate Genomic Selection algorithms on the website

Field evaluation of unselected clones (small plots) (recurs)

Workshops to train African breeders in GS

Training of PhD students

Objective #3

Selection of seedlings based on GS prediction model (recurs)

Development and cross-validation of prediction models

Collection and/or curation of phenotypic data for training population


Expected to produce these Outputs

Expected to contribute to these Outcomes

Diverse Latin American germplasm for pre-breeding program

Greater potential for improvement in African breeding populations

Latin American germplasm with favorable alleles identified.

Initiation of routine use of Latin American germplasm in the cassava breeding cycle

Clones carrying favorable alleles identified for use as parents in breeding program.

Expected to produce these Outputs

Expected to contribute to these Outcomes

Capacity to meet deadlines for DNA extractions from thousands of seedlings

Improved quality of experimental design and execution of breeding projects.

New staff members trained in modern plant breeding and genetics.

Strengthened research capacity to undertake quality breeding.

Well-trained technical staff at NaCRRI producing high quality data.

Staff at NaCRRI are better connected with international community.

Strengthened information management, communication and leadership capabilities

Breeding germplasm can be conserved for future use.

8 MSc students trained

Strengthened regional human resources in the East and Central African sub-region

Expected to produce these Outputs

Expected to contribute to these Outcomes

A regularly validated web site that allows easy access and interactive use

Public debates on societal and economic impacts of agricultural biotechnology and biotechnology in general

Development of science and biotechnology and biosafety outreach and communication strategies and partnership mechanisms

Pathways for delivery of biotechnology products through commercialization packages for prioritized commodity products developed

Sensitization workshops for farmers, scholars, researchers, decision makers and policy makers conducted and record of workshops published

Exchange visits: short-term visits

Training of students

Enhancing leadership research capabilities

Building of genetic conservation facility

Improvement of infrastructure for high throughput DNA extractions

Improvement of infrastructure for nutrient profiling.

Improvement of internet connectivity and e-library service.

Infrastructure Developed and Plant Breeders Trained

We will complete these Activities

Objective #4

Crosses with African elite parents in both locations, propagation of seedlings; 50 exotics crossed/yr/program

We will complete these Activities

Analysis of sequence data from Latin American accessions (Rounsley BMGF project)

Multilocation trials of progeny clones

Genotyping of progeny

Shipment of Latin American germplasm to IITA and NaCRRI

Germplasm Developed

Objective #6

We will complete these Activities

Policy seminars on current and emerging issues in biotechnology

Seminars and workshops for the scientific community on social, economic, ethical and commercial issues surrounding biotechnology; scientific workshops, seminars and symposia

Media workshops, briefings, and press releases

Objective #5

Biotechnology/biosafety education and awareness


Publication of periodicals such as newsletters, pamphlets, and discussion and policy papers, white papers, and fact sheets

1 PhD trained in information and communication technology and biopolicy

Technicians trained in information technology and equipment maintenance

Expected to produce these Outputs

Expected to contribute to these Outcomes

GS model with improved resolution and accuracy

GS models have better predictive power

Genome browser with all available SNP data integrated with gene loci data; SNPs may be associated with causative gene(s)/pathways

Increased availability and usefulness of SNP data

Improved accuracy of GS model

GS models have better predictive power

Further improved models

Genomic regions associated with traits

Map of causative regions contributing to desirable traits.

Breeders understand genomic information and tools and are able to use them effectively in their work

Tool developed for predicting which SNPs will segregate in a given cross

Printed materials, videos, annual exhibitions and workshops for the health community

Education and polling of farmers and growers

Polling of consumer groups

Regular public lectures by scientists and biotechnology experts; brochures, pamphlets and pocket guides; annual biotechnology exhibitions, touring exhibits and videos and film shows.

Objective #7 Enhancement of GS through cassava genomics

We will complete these Activities

Impute missing GBS haplotype information from WGS SNP calls; calculate amount of improvement to models

Load WGS SNP information to visualization platform and integrate it with GBS SNP information; maintain with up-to-date data

Calculate simple statistics such as gene density and provide to Cornell for GS model improvement. Assess amount of model improvement.

Investigate use of probabilistic functional gene networks and other analysis to predict gene function and use this information to further improve models. Assess amount of improvement.

Calculation of ancestral haplotypes including delineating wild and elite genomic portions. Association of these regions with traits.

Develop and maintain tool within Phytozome for visualizing and searching segregating SNPs in arbitrary crosses

Communicate available genomic data and tools to breeders and train them in best practices for using them.


Cornell University / Next Generation Cassava Breeding / Appendix A: Results Framework - Milestones. © 2010 Bill & Melinda Gates Foundation. All Rights Reserved.

RESULTS FRAMEWORK – KEY MILESTONES

Columns: Objective # | Key Milestones | Baseline (if relevant and available) | Period One (September 2012 to August 2013) | Period Two (September 2013 to August 2014) | Period Three (September 2014 to August 2015) | Period Four (September 2015 to August 2016) | Period Five (September 2016 to August 2017) | Grant End (August 2017), cumulative target at grant end. Each period column gives the target at period end.

Key milestones (objective number first):

1 Establish nurseries of cassava plants representing precocious and flowering types in field (IITA and NaCRRI) and greenhouse (Cornell)

1 Cassava genotypes representing appropriate experimental material for testing of flowering treatments established at experimental locations

1 Among prolific cassava lines chosen for use, test the relationship between flowering and floral signaling factors in leaves and target tissues (Cornell)

1 A diagnostic test is created by which lines/treatments that stimulate flowering can be assessed

1 If the decision above is positive, perform stage-2 tests of promising hormone and photoperiod treatments

1 If the decision above is positive, refine methodology for grafting so it is reliable and high-throughput

1 A reliable method of floral induction is established

1 Test the efficacy of several source-sink manipulation treatments to enhance seed set in cassava

Targets at period end:

Source/sink manipulation is tested.

Methodology for grafting is tested.

Methodology for grafting is refined.

Diagnostic test is created.

Stage-1 tests are performed.

Stage-2 tests are performed.

Nurseries are established.

Experimental material is established.

Prolific cassava lines are tested.

Hormone and/or grafting method is established.


1 Determine seed set in recalcitrant-flowering cassava lines subjected to a combination of floral induction and seed-set enhancing treatments

1 Source-sink manipulation methods for enhancing seed set in cassava plants with artificially-induced flowering are established

1 Organize workshops for cassava breeding programs in Africa to demonstrate FastCassava protocols for precocious, proliferative flowering and reliable seed set

1 Cassava breeding personnel have gained knowledge and skills in use of protocols that induce early, profuse flowering and good seed set

1 Cassava breeding personnel in Africa use methods that induce early, profuse flowering and provide good seed set in their genomic selection programs

2 Design and genotyping of training populations

2 Prediction models chosen for key traits

2 First generation of seedlings selected for high predicted performance in targeted agroecological zones

2 Second generation of seedlings selected for high predicted performance in targeted agroecological zones

First generation of seedlings selected at IITA

First generation of seedlings selected at NaCRRI and NRCRI

Second generation of seedlings selected at IITA

Second generation of seedlings selected at NaCRRI and NRCRI

Workshops for cassava breeding programs

Prediction models developed for NaCRRI and NRCRI

Prediction models developed for IITA

Training populations have been genotyped at all three programs, data uploaded to database

Response of seed set to treatments is determined

Cassava breeding personnel in Africa use methods for enhanced flowering and seed set.

Cassava breeding personnel test methods

Methods for enhancing seed set are established


2 Third and fourth generations of seedlings selected for high predicted performance in targeted agroecological zones (IITA only)

2 Workshops to train African breeders in GS

2 Phenotypic data to evaluate accuracy of GS / IITA

2 Preliminary estimates of GS accuracy and gain / IITA

2 Six African scientists with PhD degrees in Plant Breeding; one Cornell PhD

3 Hardware infrastructure installed

3 Functional website for cassava with tools for browsing the cassava genome, maps, markers, and cassava accessions, and for running common analyses such as BLAST

3 Database contents with up to date genotypic, phenotypic and other breeding information

3 Workshops to train African breeders in use of cassava database

3 Training for African informaticians in Ithaca

3 Fully functional site run and administered in Africa by project end

Workshop held in Africa for breeders

Workshop held in Africa for breeders

Third generation of seedlings selected

Six African scientists with PhD degrees in Plant Breeding; one Cornell PhD

Estimates of GS accuracy and genetic gain are available

Advanced yield trial data for selected clones from crosses 1,2,3 have been collected

Fourth generation of seedlings selected at IITA

Mirror site is installed and operational at IITA

Workshops organized in Africa (concomitant with objective #2)

Workshops organized in Africa (concomitant with objective #2)

Training of 2 informaticians

Training of 2 informaticians

Hardware installed and operational

Database contents up to date

Database contents up to date

Database contents up to date

Database contents up to date

Cassavabase is up and functional; users can upload phenotypic and genotypic data; genomic selection implemented; training population data available


4 Shipment of Latin American germplasm to IITA and NaCRRI

4 Crosses with African elite parents in both locations, propagation of seedlings

4 Clones carrying favorable alleles identified for use as parents in breeding program.

4 One PhD student trained at NaCRRI

5 8 MSc students trained

8 MSc degrees completed

5 Improvement of infrastructure for high throughput DNA extractions

5 Building of genetic conservation facility

5 Improvement of infrastructure for nutrient profiling.

5 Exchange visits: short-term visits

5 Improvement of internet connectivity and e-library service.

Shipment of 50 clones of M. esculenta

Shipment of 50 clones of M. esculenta

Shipment of 50 clones of M. esculenta

2 visits hosted at NaCRRI

50 M. esculenta crossed at NaCRRI and NRCRI.

50 M. esculenta crossed at NaCRRI and NRCRI.

50 M. esculenta crossed at NaCRRI and NRCRI.

8 MSc degrees completed

One PhD student trained at NaCRRI

Capacity to extract at least 100 quality DNA samples per day.

Bidding process and facility designs completed

Facility construction completed and commissioned

At least 1,000 cassava accessions collected, genotyped and conserved

Report on germplasm exchange

Advisory note on high speed internet connectivity and e-library service prepared.

Recommendations from the advisory note implemented and completed.

Bidding process and facility designs completed

Facility construction completed. Essential equipment identified, ordered and delivered. Facility certified for use.

2 visits hosted at NaCRRI

2 visits hosted at NaCRRI

Laboratory internationally accredited.

2 visits hosted at NaCRRI

2 visits hosted at NaCRRI; Visiting scientist framework for NARO developed

Breeding material identified from first set of crosses.

Breeding material identified from second set of crosses.


5 Enhancing leadership research capabilities

6 A regularly validated web site that allows easy access and interactive use

6 Development of science and biotechnology and biosafety outreach and communication strategies and partnership mechanisms

6 Sensitization workshops for farmers, scholars, researchers, decision makers and policy makers conducted and record of workshops published

6 Provide a platform for public debates on societal and economic impacts of agricultural biotechnology and biotechnology in general

6 Pathways for delivery of biotechnology products through commercialization packages for prioritized commodity products developed

6 Publication of periodicals such as newsletters, pamphlets, and discussion and policy papers, white papers, and fact sheets

6 1 PhD trained in information and communication technology and biopolicy

Pairing scheme of NARO leaders with selected global model; At least one contact visit between leaders arranged

At least one contact visit between leaders arranged

At least one contact visit between leaders arranged

Website set up

Monthly updates

Monthly updates

Monthly updates

Monthly updates

Agbiotech and biosafety communication plan developed. Report on relevant ethical issues completed.

Reports on community discussions of regulatory process, including testing, and assessing and managing risks to human health and the environment, completed.

Workshop reports on community confidence in ag-biotech, its regulation, the industry and the way risks are assessed and managed.

Report on policy contribution to ag-biotech R&D completed.

Report on community concerns related to potential socio-economic effects of ag-biotech, and its impacts completed.

Quarterly reports on sensitization workshops for targeted stakeholder segments

Quarterly reports on sensitization workshops for targeted stakeholder segments

Quarterly reports on sensitization workshops for targeted stakeholder segments

Quarterly reports on sensitization workshops for targeted stakeholder segments

Quarterly reports on sensitization workshops for targeted stakeholder segments

A well-populated platform created

Population of the platform with essential information

Population of the platform with essential information

Population of the platform with essential information

Population of the platform with essential information

1 PhD trained in information and communication technology and biopolicy

Annual step by step analysis reports on product commercialisation pathways

Annual step by step analysis reports on product commercialisation pathways

Annual step by step analysis reports on product commercialisation pathways

Annual step by step analysis reports on product commercialisation pathways

Annual step by step analysis reports on product commercialisation pathways

At least two well researched information materials published every three months

At least two well researched information materials published every three months

At least two well researched information materials published every three months

At least two well researched information materials published every three months

At least two well researched information materials published every three months


7 Calculate simple statistics such as gene density and provide to Cornell for GS model improvement. Assess amount of model improvement.

7 Load WGS SNP information to visualization platform and integrate it with GBS SNP information; maintain with up-to-date data

7 Genome browser with all available SNP data integrated with gene loci data; SNPs may be associated with causative gene(s)/pathways

7 Impute missing GBS haplotype information from WGS SNP calls; calculate amount of improvement to models

7 Investigate use of probabilistic functional gene networks and other analysis to predict gene function and use this information to further improve models. Assess amount of improvement.

7 Develop and maintain tool within Phytozome for visualizing and searching segregating SNPs in arbitrary crosses

7 Communicate available genomic data and tools to breeders and train them in best practices for using them.

Calculate missing data, add to models, calculate amount of improvement

GBS and WGS data loaded

GBS and WGS data loaded

Travel to Africa to train breeders

Travel to Africa to train breeders

Travel to Africa to train breeders

Travel to Africa to train breeders

Travel to Africa to train breeders

Investigation, implementation and assessment of improvements to various methods for improving model accuracy

Investigation, implementation and assessment of improvements to various methods for improving model accuracy

Investigation, implementation and assessment of improvements to various methods for improving model accuracy

Investigation, implementation and assessment of improvements to various methods for improving model accuracy

Tool developed and shared with breeders

Tool further developed to integrate requests from breeders

Tool further developed to integrate requests from breeders

Tool further developed to integrate requests from breeders

GBS and WGS can be visualized and searched within Phytozome platform

Platform further developed to integrate requests from breeders

Platform further developed to integrate requests from breeders

Platform further developed to integrate requests from breeders

Statistics calculated, added to models, improvement in model accuracy calculated


Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  B:    Preliminary  Results  from  Previous  Findings    

Appendix B Preliminary Results from Previous Funding

Milestones 1.1.1 to 1.1.3: Identification of an appropriate collection of germplasm. IITA has a large Genetic Gain (GG) population for which there are many years of historical phenotype data collected in many environments. Our validation population consists of those members of the GG population for which plant material still exists (i.e., DNA can be extracted), and for which phenotype data are based on a sufficient number of observations. In all we have extracted DNA and genotyped 623 of the GG clones. Curation of phenotype data has proved to be challenging, as records collected over many years need to be assembled, checked, and standardized. Historical phenotypic data from 37,752 plots from GG trials and 11,266 plots from Uniform Yield Trials (UYT) were analyzed using a minimal mixed model to obtain a single value for each trait for each clone. The linear model fitted environment as a fixed effect (44 environments from the GG experiment and 53 environments from the UYT), and replication within environment and clone as random effects. Seventeen traits were analyzed (Table 1). Plot-basis broad-sense heritabilities for traits were calculated as the ratio of the clone variance to the sum of clone and error variances. For six traits (RTWT, FYLD, DYLD, SHTWT, TYLD, and RTNO), the number of plants harvested per plot was fitted as a fixed-effect covariate in the linear model. As of the writing of this report, we have both genotype data and good phenotypic records for 512 clones. Work to increase the number of clones with curated phenotypic data is ongoing.
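The heritability definition above (clone variance divided by the sum of clone and error variances) can be written as a one-line calculation. The sketch below is illustrative only: the function name and the variance components passed to it are hypothetical, and in practice the variance components would come from the fitted mixed model rather than being typed in by hand.

```python
def plot_basis_heritability(clone_variance: float, error_variance: float) -> float:
    """Plot-basis broad-sense heritability: clone variance / (clone + error variance)."""
    if clone_variance < 0 or error_variance < 0:
        raise ValueError("variance components must be non-negative")
    total = clone_variance + error_variance
    if total == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return clone_variance / total


# Hypothetical variance components, chosen only to illustrate the ratio
# (they are not estimates from the report).
h2 = plot_basis_heritability(clone_variance=2.0, error_variance=6.0)
print(round(h2, 2))
```

Because the denominator contains only the clone and plot-level error variances, the result describes single-plot measurements, which is one reason the values in Table 1 are low for traits such as yield.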

Table 1. Abbreviations, plot-basis broad-sense heritabilities, and definitions of traits analyzed.

Abbrev. | H2 | Definition
SPROUT | 0.21 | Proportion of stakes germinated, scored one month after planting.
VIGOR | 0.15 | Assessment of plant vigor during establishment, scored one month after planting.
HI | 0.23 | Ratio of fresh root weight divided by total biomass.
NKLGT | 0.12 | Root neck length with 0 = absent, 3 = short, 5 = intermediate, and 7 = long.
ROTNO | 0.05 | Number of rotted storage roots per plot at the time of harvest.
DM | 0.34 | Percentage dry matter of storage roots.
CMDS | 0.63 | Cassava mosaic virus severity, family Geminiviridae, genus Begomovirus. 1 = (clean, no infection) to 5 = (extremely severe, severely diseased).
CMDI | 0.64 | Proportion of plants within a plot showing symptoms of cassava mosaic disease.
CBBS | 0.10 | Bacterial blight disease severity, Xanthomonas axonopodis pv. manihotis. 1 = (clean, no infection) to 5 = (extremely severe, severely diseased).
CBBI | 0.04 | Proportion of plants within a plot showing symptoms of cassava bacterial blight.
CGM | 0.18 | Cassava green mite damage, Mononychellus tanajoa. Symptoms rated from 1 = (clean, no infection) to 5 = (extremely severe, severely diseased).
RTWT | 0.12 | Total fresh weight of storage roots harvested per plot, measured in kg.
FYLD | 0.17 | Fresh weight of harvested roots, expressed in tons per hectare per plant.
DYLD | 0.14 | Dry weight of harvested roots derived by multiplying fresh storage root yield by dry matter content, expressed in tons per hectare.
SHTWT | 0.09 | Total fresh weight of harvested foliage and stems, in kilograms per plot.
TYLD | 0.13 | Total fresh weight of harvested foliage and stems, expressed in tons per hectare.
RTNO | 0.14 | Total number of storage roots harvested per plot.


Plot basis broad-sense heritabilities for traits were calculated as the ratio of the clone variance to the sum of clone and error variances. For the traits RTWT, FYLD, DYLD, SHTWT, TYLD, and RTNO, the number of plants harvested per plot was fitted as a fixed-effect covariate in the linear model.

Milestones 1.2.1 and 1.2.2: Selection of a genetic marker system. The number of SNP genotype assays that are available for cassava is small, and the assays suffer from ascertainment bias. Furthermore, work was already planned for a large genotyping-by-sequencing (GBS) effort in cassava, as part of the Rounsley project funded by the BMGF. GBS uses a bioinformatic pipeline to call SNPs from next-generation sequencing of reduced-representation, bar-coded libraries. Because of the efficiency, high marker number, low cost, and lack of bias in GBS data, we chose GBS as our marker system. We are using a protocol developed in the Buckler lab at Cornell University (Elshire et al. 2011).

Milestone 1.3.1: Extraction of DNAs and scoring of markers. DNAs were extracted at IITA and sent to Cornell University, where PstI libraries were constructed and sequenced using the Illumina HiSeq. PstI was chosen because the low fragment number results in the higher read depth necessary to score heterozygous genotypes accurately. A bioinformatics pipeline designed for maize was modified for cassava and used to extract ~5000 high quality SNPs. The number of markers obtained from the PstI libraries is sufficient for our initial analyses but will need to be increased to obtain the maximal benefit from GS. We are currently testing libraries made with double-digested DNAs, which will generate more fragments of the appropriate size for sequencing. This will increase genome coverage but reduce depth, which can be adjusted by reducing multiplexing factors if necessary. DNAs for these new libraries come from a biparental (outcrossed parents) mapping population, which will also help us to fine-tune our SNP-calling algorithm.

Milestones 1.4.1 and 1.4.2: Data analysis.

Table 2. Estimation of the population recombination parameter.

scaffold | genetic pos (a) | size in kb | # SNPs | 4Nec (b) | E(r2) (c) at 10 kb | E(r2) at 50 kb | E(r2) at 100 kb
12794 | 1, 33-41 | 2014 | 26 | 4.67E-05 | 0.68 | 0.30 | 0.18
6656 | 1, 48 | 1173 | 20 | 5.00E-05 | 0.67 | 0.29 | 0.17
2538 | 4, 8-10 | 126 | 12 | 4.13E-04 | 0.19 | 0.05 | 0.02
3741 | 5, 44-49 | 949 | 21 | 5.67E-05 | 0.64 | 0.26 | 0.15
1551 | 5.2, 16-21 | 1872 | 37 | 2.83E-05 | 0.78 | 0.41 | 0.26
7520 | 11, 75 | 2483 | 63 | 1.10E-05 | 0.90 | 0.65 | 0.48
4457 | 11, 96 | 1283 | 30 | 2.67E-05 | 0.79 | 0.43 | 0.27
7571 | 11, 110 | 493 | 25 | 3.90E-05 | 0.72 | 0.34 | 0.20
3614 | 12, 54-64 | 1679 | 21 | 2.50E-05 | 0.80 | 0.44 | 0.29
7012 | 13, 12 | 1131 | 19 | 3.75E-04 | 0.21 | 0.05 | 0.03
5875 | na | 1951 | 39 | 2.50E-05 | 0.80 | 0.44 | 0.29
7991 | na | 530 | 16 | 2.10E-04 | 0.32 | 0.09 | 0.05
5280 (d) | 7, 112 | 691 | 14 | 2.8E-02 | 0.00 | 0.00 | 0.00

(a) Some of the scaffolds have been assigned a genetic map position based on SNP genotyping assays that were used in construction of a genetic map (Rabbi et al., submitted) and also used in BLAST searches against the cassava genome.

(b) Estimates of 4Nec were obtained using the software MaxDip (genapps.uchicago.edu/maxdip/), which uses the method of Hudson (2001). Because MaxDip accommodates a maximum of 48 diploid individuals per run, 48 individuals with low missing data were randomly sampled 6 to 8 times, to obtain 6 to 8 estimates of 4Nec for each scaffold. The estimates were similar among runs for the same scaffold.

(c) Based on the relationship E(r2) = 1/(1 + 4Nec), where c is recombination rate/base pair.

(d) This scaffold is clearly an outlier. The likelihood curve for this scaffold was also very flat.
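The E(r2) columns of Table 2 can be reproduced from the 4Nec column. The sketch below assumes, as the tabulated values suggest, that 4Nec is expressed per base pair and is multiplied by the physical distance in base pairs before the relationship in the footnote is applied. The function name is hypothetical.

```python
def expected_r2(rho_per_bp: float, distance_bp: float) -> float:
    """Expected LD, E(r2) = 1 / (1 + 4Nec * d).

    rho_per_bp is the population recombination parameter 4Nec per base pair;
    distance_bp is the physical distance d between the two loci in base pairs.
    """
    return 1.0 / (1.0 + rho_per_bp * distance_bp)


# 4Nec for scaffold 12794, taken from Table 2.
rho = 4.67e-05
for kb in (10, 50, 100):
    # After rounding, these agree with the 0.68, 0.30 and 0.18 listed in Table 2.
    print(kb, "kb:", round(expected_r2(rho, kb * 1000), 2))
```

The same function shows why scaffolds with larger 4Nec estimates, such as 2538 or 7012, lose most of their expected LD within 50 kb.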


Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  B:    Preliminary  Results  from  Previous  Findings    

Population genetic analyses so far consist of estimates of the population recombination parameter, 4Nec, which underlies the expected level of linkage disequilibrium (LD) across the genome. E(r2) = 1/(1 + 4Nec), where r2 is the squared correlation between alleles at loci that are c recombination units apart, and Ne is effective population size. The cassava genome assembly is very fragmented, but a number of large scaffolds exist such that 4Nec can be calculated based on patterns of genotype frequencies at SNPs across the scaffold. Estimates of 4Nec for 13 scaffolds are shown in Table 2.
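As a worked illustration of the relationship E(r2) = 1/(1 + 4Nec), the sketch below recomputes the expected LD at 10, 50, and 100 kb from a per-base-pair estimate of 4Nec. The function name is illustrative only, and the input value is the estimate reported for scaffold 12794 in Table 2.

```python
def expected_r2(rho_per_bp: float, distance_bp: float) -> float:
    """Expected r2 between two loci, given 4Nec per base pair and their distance."""
    return 1.0 / (1.0 + rho_per_bp * distance_bp)


rho = 4.67e-05  # 4Nec per base pair for scaffold 12794 (Table 2)
for kb in (10, 50, 100):
    print(kb, "kb:", round(expected_r2(rho, kb * 1000), 2))
```

Rounded to two decimals, these values agree with the three E(r2) columns for that scaffold.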

Based on 12 scaffolds (leaving out one outlier), the average estimate of 4Nec is 0.000109. An average value of c can be obtained from the lengths of the physical and genetic maps: c = 17.99 Morgans/760 Mb = 2.3 x 10^-8 M/bp. An estimate of Ne can be calculated as 4Nec/4c = 0.000109/(4 * 2.3 x 10^-8) = 1185. These analyses suggest that, for this training population, useful LD is likely to decay between 10 and 50 kb. Given the size of the cassava genome (760 Mb), increasing the number of markers to ~20,000 should improve GS prediction accuracy.

Genomic Prediction

Genomic prediction accuracies have been estimated for 17 traits, which range in plot-basis broad-sense heritability from 0.04 to 0.64 (Table 1); traits with low heritability are harder to predict. The simplest genomic selection model is ridge regression. This model assumes that all marker effects are drawn from the same normal distribution, with zero mean and a variance that can be estimated by maximum likelihood. The assumption of this model is that genetic effects are spread out evenly across the genome, in other words, that the trait is truly polygenic. The model has been shown to be equivalent to an analysis that first estimates additive genetic relationships among individuals (clones in our case) using the marker data and then predicts individual effects as random effects with a covariance matrix proportional to the relationship matrix (Habier et al. 2007; Hayes et al. 2009).

This analysis provides a breeding value estimate rather than a genotypic value estimate. That is, the model is truly additive and seeks to predict what a cassava clone may pass on to its progeny rather than seeking to predict the performance of the clone itself. Dominance and epistatic effects will cause the performance of a clone to deviate from the value that it may pass on to its progeny.
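The equivalence between marker-effect ridge regression and a relationship-matrix model can be checked numerically. The following is a minimal sketch on simulated data, not the analysis code used for these results. It assumes NumPy is available, uses an unscaled relationship matrix K = ZZ' (relationship matrices are usually scaled versions of this, with the scaling absorbed into the shrinkage parameter), and fixes the shrinkage parameter `lam` by hand rather than estimating it by maximum likelihood.

```python
import numpy as np


def ridge_gebv(Z, y, lam):
    """Predictions from ridge regression on marker effects."""
    m = Z.shape[1]
    effects = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return Z @ effects


def gblup_gebv(Z, y, lam):
    """The same predictions obtained through a relationship matrix K = Z Z'."""
    n = Z.shape[0]
    K = Z @ Z.T
    return K @ np.linalg.solve(K + lam * np.eye(n), y)


rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(30, 80)).astype(float)  # simulated 0/1/2 genotypes
Z -= Z.mean(axis=0)                                   # centre each marker
y = rng.normal(size=30)                               # simulated phenotypes
y -= y.mean()

# The two formulations should agree up to numerical error.
print(np.allclose(ridge_gebv(Z, y, 5.0), gblup_gebv(Z, y, 5.0)))
```

The relationship-matrix form solves an n x n system (number of clones) rather than an m x m system (number of markers), which is one practical reason it is often preferred when markers outnumber individuals.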

Two approaches were used to capture more of the genotypic value of a clone. In the first, the additive genetic relationship matrix was combined with a dominance relationship matrix also calculated from the marker data. This approach allows dominance effects to be included in the estimate but retains the assumption that genetic effects are evenly spread across the genome. Note that because we have not (yet) tried to phase the cassava genotypes, the calculation of the dominance relationship matrix was approximate at missing marker data points. In the second approach, we used a genomic selection method called Random Forest (Breiman, 2001). Without going into the details of the method, it is more effective at identifying and accounting for large effect loci and it can also capture gene interactions. It is less effective at capturing the influence of many loci with small effects.

The accuracies of these models were assessed by ten-fold cross validation as follows. The dataset of 512 lines was randomly split into ten subsets or "folds." Each fold in turn was predicted by the genomic selection model while excluding the fold from the training population. Accuracy was calculated as the correlation between the observed phenotype and the prediction.

We note that to assess the value of genomic predictions for selection, they should be correlated with the true breeding value. Unfortunately, true breeding values are not known and observed phenotypes must be used instead. The accuracy based on correlation to the phenotype under-estimates the true accuracy in two ways. First, there is error associated with the phenotype that reduces the correlation. Second, there are deviations between the genotypic value represented in the phenotype and the breeding value that should be selected upon. This latter source of downward bias is not an issue for the additive plus dominance relationship matrix and the random forest approaches.
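The ten-fold cross validation procedure described above can be sketched as follows. This is an illustration on simulated data, assuming NumPy, a ridge regression model with a hand-picked shrinkage parameter, and a single correlation computed over all held-out predictions pooled together (the text does not say whether correlations were pooled or averaged per fold). All function names and simulation settings are hypothetical.

```python
import numpy as np


def ridge_predict(Z_train, y_train, Z_test, lam):
    """Fit ridge regression on the training clones and predict the test clones."""
    col_means = Z_train.mean(axis=0)
    Zc = Z_train - col_means
    mu = y_train.mean()
    m = Zc.shape[1]
    effects = np.linalg.solve(Zc.T @ Zc + lam * np.eye(m), Zc.T @ (y_train - mu))
    return mu + (Z_test - col_means) @ effects


def cross_validated_accuracy(Z, y, lam, n_folds=10, seed=0):
    """Correlation between observed phenotypes and held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # exclude the fold from the training population
        predictions[test_idx] = ridge_predict(Z[train], y[train], Z[test_idx], lam)
    return np.corrcoef(y, predictions)[0, 1]


# Simulated stand-in for a 512-line dataset with 200 markers.
rng = np.random.default_rng(42)
Z = rng.integers(0, 3, size=(512, 200)).astype(float)
genetic = Z @ rng.normal(size=200)
y = genetic + rng.normal(scale=genetic.std(), size=512)  # heritability near 0.5

accuracy = cross_validated_accuracy(Z, y, lam=100.0)
print(round(accuracy, 2))
```

Because the correlation is taken against the noisy phenotype `y` rather than the simulated genetic value, the printed accuracy is lower than the correlation with the true genetic value would be, which mirrors the first source of under-estimation noted above.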
A rough comparison between the accuracy of phenotypic selection and genomic selection can be made by plotting the accuracy of the prediction model against the square root of the heritability (Fig. 1).

Figure 1. Accuracy of genomic prediction versus that of a single phenotypic plot. Each point represents one trait. The red line is the 1:1 line of equality between phenotypic and genomic accuracies.

Genomic prediction using additive-effect ridge regression was generally slightly more accurate than the prediction obtained from a single phenotypic plot (Fig. 1). The two exceptions (points to the right and below the red equality line) were cassava mosaic disease severity and incidence. For this disease, there is a well-known major gene conferring resistance that falls outside of the ridge regression assumption of a highly polygenic trait (Akano et al. 2002). In practice, for these important traits, we would follow an approach of association mapping for the major genes and then include those loci as fixed effects in the prediction model, leaving ridge regression models to predict the effects of minor loci. Consistent with this major gene hypothesis for the diseases, random forest gave higher prediction accuracies than ridge regression only for the five disease traits (Fig. 2).

Figure 2. Accuracy of random forest versus that of additive ridge regression. Each point represents one trait. The red line is the 1:1 line of equality between the two genomic prediction models. The five traits for which random forest did better than ridge regression are disease and pest resistance traits (mosaic disease incidence and severity; bacterial blight incidence and severity; and green mite damage).

Smaller but more consistent gains in accuracy were achieved by including dominance in the genomic prediction models (Fig. 3). We note that some improvement would be expected regardless of the importance of dominance because the additive+dominance model estimates an additional dominance variance component. We suspect that the improvement due to addition of dominance in the model for disease resistance traits arises because of the action of dominant major genes.

Figure 3. Accuracy of additive + dominance versus that of additive ridge regression. Each point represents one trait. The red line is the 1:1 line of equality between the two genomic prediction models.
The red symbols indicate those traits for which we have stronger belief in the importance of dominance. Among them are four disease traits (mosaic disease incidence and severity; bacterial blight incidence and severity) and four fitness components (sprouting, early vigor, harvest index, and dry matter yield).

To further explore the importance of heterozygosity in cassava we correlated trait values with the degree of heterozygosity of each clone. The clones varied widely in their degree of heterozygosity (Fig. 4). The relationship between heterozygosity and phenotype was highly significant for several traits though it did not explain a very high fraction of the variance (the maximum was 6% of the variance of harvest index). These results suggest that marker data will be additionally helpful to avoid making crosses among clones that are similar and could lead to partially inbred progeny.

Figure 4. Histogram of the degree of heterozygosity across genotyped clones (x-axis: proportion of SNPs heterozygous; y-axis: frequency).
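The heterozygosity analysis can be sketched in two steps: compute the proportion of heterozygous SNP calls per clone, then measure the fraction of trait variance explained by that proportion. The code below is an illustrative sketch only. It assumes genotypes coded as 0, 1, 2 (count of one allele) with `None` for missing calls, and it uses the squared Pearson correlation as the variance explained by a simple linear regression. The example values are made up.

```python
def proportion_heterozygous(genotypes):
    """Fraction of non-missing SNP calls that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def r_squared(x, y):
    """Fraction of the variance in y explained by a linear regression on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical clones: genotype calls and one trait value each.
clones = {
    "clone_A": ([0, 1, 1, 2, None, 1], 0.52),
    "clone_B": ([0, 0, 1, 2, 2, 0], 0.41),
    "clone_C": ([1, 1, 1, 0, 1, None], 0.58),
    "clone_D": ([2, 2, 0, 0, 1, 0], 0.47),
}
het = [proportion_heterozygous(g) for g, _ in clones.values()]
trait = [t for _, t in clones.values()]
print(het, r_squared(het, trait))
```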

Conclusions

We emphasize that these results are very preliminary. Nevertheless, they indicate that prediction accuracies for cassava are similar to those for other species given the training population size available. Thus, the unique aspects of cassava (the fact that it is a root crop, is clonally propagated, and has high levels of heterozygosity) are not negatively affecting prediction accuracies in ways we do not understand. Given that we anticipate that genomic selection can accelerate the breeding cycle of cassava between two and a half and five fold (for two and one year genomic selection breeding cycles, respectively), even the accuracies obtained thus far indicate that rates of improvement under genomic selection will be more rapid than under phenotypic selection. We believe that our current accuracies are a lower bound on what is achievable. In particular, several analysis improvements are still low-hanging fruit:

1. We have so far done minimal data curation / outlier detection on either phenotype or genotype data. We have yet to compare genotypic similarities obtained from markers with those expected from pedigree.

2. We have treated observations from Genetic Gain and Uniform Yield Trial experiments the same, though plot sizes and management differ.

3. We have done no environment or genotype-by-environment analysis. Our inclusion of all environments in the analysis means that we are looking at very broad-adaptation performance, which inevitably lowers heritability and decreases prediction accuracy.

4. Genomic selection analyses can be improved by incorporating information from genome-wide association analyses to better account for major gene effects and by looking at subpopulation structure within the Genetic Gain panel.


5. Finally, we are still building the training population with further phenotypes and clones to come from IITA advanced yield trials. These latter clones are more representative of the material that would be tested in the ongoing breeding program at IITA.

Expansion of the training population is a constant task for genomic selection that we take seriously and that will deliver incremental accuracy gains. Improvement in the marker system represents a somewhat higher hanging fruit, but one that many people are working on. We anticipate that we will obtain more SNPs and that increased knowledge of the cassava genome resulting from other research will help increase the reliability of SNP calling and imputation.
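The argument in the Conclusions, that a shorter cycle can outweigh a lower accuracy, follows from the standard expression for response to selection per unit time, which is proportional to accuracy divided by cycle length. The sketch below assumes equal selection intensity and genetic variance under both methods, and a five-year phenotypic cycle (implied by the two-and-a-half-fold and five-fold figures quoted for two-year and one-year cycles). The accuracy values are hypothetical and are not results from this study.

```python
def relative_gain_per_year(acc_gs, acc_ps, cycle_gs_years, cycle_ps_years):
    """Gain per year under genomic selection relative to phenotypic selection.

    Assumes equal selection intensity and additive genetic variance.
    """
    return (acc_gs / acc_ps) * (cycle_ps_years / cycle_gs_years)


# Hypothetical accuracies: genomic prediction 0.3, phenotypic selection 0.5.
for cycle_gs in (1, 2):
    ratio = relative_gain_per_year(0.3, 0.5, cycle_gs, 5)
    print(f"{cycle_gs}-year GS cycle: {ratio:.2f} x phenotypic gain per year")
```

A ratio above 1 means genomic selection is expected to give faster improvement per year even though each individual prediction is less accurate.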

 


Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  C:  Supporting  Documentation  

Appendix C

Supporting documentation for Specific Objectives

For each scientific objective, the responsible personnel are listed. For most of the objectives, additional detail is provided for rationale and experimental procedures.

Objective 1. Fast Cassava: Improved Reproduction

Objective 2. Tools for Genomic Selection Implemented

Objective 3. Database Developed and Curated

Objective 4. Germplasm Developed

Objective 5. Infrastructure Developed and Plant Breeders Trained

Objective 6. Biotechnology/Biosafety Education and Awareness

Objective 7. Enhancement of Genomic Selection through Cassava Genomics


Objective 1. Fast Cassava: Improved Reproduction

Name | Institution | Email
Tim Setter | Cornell University, Ithaca, USA | [email protected]
Peter Kulakow | IITA/Ibadan, Nigeria | [email protected]
Anthony Pariyo | NaCRRI, Namulonge, Uganda | [email protected]
Chiedozie Egesi | NRCRI, Umudike, Nigeria | [email protected]

Shortening the juvenile period and increasing the rate of flowering in low-flowering cassava genotypes will be tested by a combination of greenhouse/laboratory experiments at Cornell and field tests at the African partner locations, IITA, NRCRI, and NaCRRI.

Cornell researchers will take a lead role in developing the treatment test combinations in greenhouse/growth chambers and monitoring diagnostic signaling molecules. They will develop diagnostic tests for the flowering agents using a small set of genotypes that are available at Cornell. The methods that are successful at Cornell will then be tested in the field on a larger set of genotypes in Africa, preferably using diagnostic tests adapted for use there. If this proves to be problematic, tissue samples from treated plants will be collected for shipment to Cornell for analysis. Each African partner may specialize in testing different treatments to avoid redundancy and assist in testing of more treatment options.

Once significant progress has been made, then the most promising flower-inducing and/or seed set-inducing protocol will be adopted in undertaking the next set of crosses under Objective 2.

Field tests of Fast Cassava Technology: Field tests will be conducted to determine the most promising treatments to reduce the time to flowering and to increase the rate of flowering. Two types of experiments will be conducted. The first will involve chemical treatments with plant growth regulators. The second will use grafting combinations in which genotypes representing a range of flowering abilities are grafted onto stock-plant genotypes with a tendency for early and profuse flowering. The field tests will be conducted as factorial designs (chemical treatments x time/rate of application x variety) or (grafting treatment x time after planting x variety). Field experiments will be timed to coincide with the normal flowering period at each location, so the timing of the trials will be important. Experiments will be managed by local field staff, but monitoring of these studies will need to be done by specially trained staff or, preferably, graduate students. The first two years of trials will focus on empirical development of treatments that show promising effects on flowering. The following years will focus on optimizing the treatments, aided by measurements of signaling molecules that operate in conjunction with florigen proteins (FT) in modulating the flowering response.
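A factorial design of the kind described above crosses every level of each factor with every level of the others. The sketch below enumerates the treatment combinations for one such design. The factor levels are placeholders chosen for illustration, not the treatments or varieties that will actually be tested.

```python
from itertools import product


def factorial_design(factors):
    """Return every combination of factor levels as a list of dictionaries."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


# Hypothetical levels for a chemical treatment x timing x variety design.
chemical_factors = {
    "treatment": ["control", "regulator_A", "regulator_B"],
    "timing": ["early", "late"],
    "variety": ["variety_1", "variety_2", "variety_3", "variety_4"],
}

plots = factorial_design(chemical_factors)
print(len(plots), "treatment combinations per replicate")
print(plots[0])
```

The number of combinations is the product of the numbers of levels, which is why dividing treatments among the African partners helps keep each trial to a manageable size.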

The combination of laboratory and field studies will be essential for development of treatments to improve flowering and seed production as an essential component to accelerate breeding progress in cassava.


Objective 2. Tools for Genomic Selection Implemented

Name | Institution | Email
Jean-Luc Jannink | USDA/ARS/Cornell Univ, Ithaca, USA | [email protected]
Martha Hamblin | Cornell University, Ithaca, USA | [email protected]
Charlotte Acharya | Cornell University, Ithaca, USA | [email protected]
Peter Kulakow | IITA/Ibadan, Nigeria | [email protected]
Ismail Rabbi | IITA/Ibadan, Nigeria | [email protected]
Yona Baguma | NaCRRI, Namulonge, Uganda | [email protected]
Robert Kawuki | NaCRRI, Namulonge, Uganda | [email protected]
Chiedozie Egesi | NRCRI, Umudike, Nigeria | [email protected]
Emmanuel Okogbenin | NRCRI, Umudike, Nigeria | [email protected]

Breeding cycle schemes: Two genomic selection schemes will be implemented: an annual cycle at IITA, and a 2-year cycle at NaCRRI and NRCRI. These schemes were chosen by the breeding programs based on their capacity to extract DNA and make crosses within these time frames. The tables below show the schedules and numbers of plants for each of the activities, from crossing to field evaluations. In the annual cycle, seeds will be harvested in two batches, so that seedling nurseries and DNA extractions can begin as soon as possible. After the GS model is run and parents are chosen for the next cycle, unselected seedlings will be propagated and evaluated in small plots to collect data for updating the training population; genotype data for these clones will already be available. The annual cycle will allow collection of high quality phenotypic data on the first two cycles of progeny; only preliminary yield trials will be performed in programs using the two-year cycle.


Annual cycle (IITA)

Months | Objective | Activity 1 | Activity 2 | Activity 3 | Activity 4 | # Genotypes | # Plants | Land (ha)
-3 to 5 | Recombine (first cycle) | Crossing block | Crossing | Harvest early seeds | Harvest late seeds | 200 genotypes | 4000 (20 plants each) | 0.5
4-10 | Select & Multiply | Seed germination | CMD - screening | Multiplication | - | 5000 seedlings | - | 0.5
6-8 | Genotype & Predict | DNA | GBS | GS model | - | 2500 seedlings | - | -
10-21 | Phenotype* | 200 C1 (Clonal) selected | 2300 C1 (Clonal) non-selected | - | - | 2500 clones | 15000 | 1
9-17 | Recombine (second cycle) | Crossing block | Crossing | Harvest early seeds | Harvest late seeds | 200 genotypes | 1200 (6 plants each) | 0.1
16-22 | Select & Multiply | Seed germination | CMD - screening | Multiplication | - | 5000 seedlings | - | 0.5
18-20 | Genotype & Predict | DNA | GBS | GS model | - | 2500 seedlings | - | -
22-33 | Phenotype* | 200 C2 (Clonal) selected | 2300 C2 (Clonal) non-selected | 400 C1 (PYT) - 2 locations | - | 2900 clones | 31000 | 2
21-29 | Recombine (third cycle) | Crossing block | Crossing | Harvest early seeds | Harvest late seeds | 200 genotypes | 1200 (6 plants each) | 0.1
28-34 | Select & Multiply | Seed germination | CMD - screening | Multiplication | - | 5000 seedlings | - | 0.5
30-32 | Genotype & Predict | DNA | GBS | GS model | - | 2500 seedlings | - | -
34-45 | Phenotype* | 200 C3 (Clonal) selected | 2300 C3 (Clonal) non-selected | 400 C2 (PYT) - 2 locations | - | 2900 clones | 31000 | 2
33-41 | Recombine (fourth cycle) | Crossing block | Crossing | Harvest early seeds | Harvest late seeds | 200 genotypes | 1200 (6 plants each) | 0.1
40-46 | Select & Multiply | Seed germination | CMD - screening | Multiplication | - | 5000 seedlings | - | 0.5
42-44 | Genotype & Predict | DNA | GBS | GS model | - | 2500 seedlings | - | -
46-57 | Phenotype* | 200 C4 (Clonal) selected | 2300 C4 (Clonal) non-selected | 400 C3 (PYT) - 2 locations | 200 C1 and C2 (AYT) 3 locations | 3100 clones | 49000 | 4
45-53 | Recombine (fifth cycle) | Crossing block | Crossing | Harvest early seeds | Harvest late seeds | 200 genotypes | 1200 (6 plants each) | 0.1
52-58 | Select & Multiply | Seed germination | CMD - screening | Multiplication | - | 5000 seedlings | - | 0.5
54-56 | Genotype & Predict | DNA | GBS | GS model | - | 2500 seedlings | - | -
58-69 | Phenotype* | 200 C5 (Clonal) selected | 2300 C5 (Clonal) non-selected | 400 C4 (PYT) - 2 locations | 200 C2 and C3 (AYT) 3 locations | 3100 clones | 49000 | 4

* Experimental designs subject to further discussion of optimal methods to evaluate allele performance under genomic selection.

Two-year cycle (NaCRRI and NRCRI)

Months | Objective | Activity 1 | Activity 2 | Activity 3 | # Genotypes | # Plants | Land (ha)
4-14 | Recombine (first cycle) | Crossing block | Crossing | Harvest seeds | 200 genotypes | 4000 (20 plants each) | 0.5
13-24 | Select & Multiply | Seed germination | CMD and CBSD screening | Multiplication | 6000 seedlings | - | 0.6
16-26 | Genotype & Predict | DNA | GBS | GS model | 3000 seedlings | - | -
24-36 | Phenotype* | 200 C1 (Clonal) selected | 2800 C1 (Clonal) non-selected | - | 3000 clones | 18000 | 1
28-38 | Recombine (second cycle) | Crossing block | Crossing | Harvest seeds | 200 genotypes | 1200 (6 plants each) | 0.1
37-48 | Select & Multiply | Seed germination | CMD and CBSD screening | Multiplication | 10000 seedlings | - | 1
40-50 | Genotype & Predict | DNA | GBS | GS model | 5000 seedlings | - | -
48-60 | Phenotype* | 200 C2 (Clonal) selected | 2800 C2 (Clonal) non-selected | 400 C1 (PYT) - 2 locations | 3400 clones | 34000 | 2
52-62 | Recombine (third cycle) | Crossing block | Crossing | Harvest seeds | 200 genotypes | 1200 (6 plants each) | 0.1

* Experimental designs subject to further discussion of optimal methods to evaluate allele performance under genomic selection.

Clonal = 6 plants each; PYT = 10 x 2 reps; AYT = 10 x 3 reps; Unselected = 6 plants each


Genotyping by Sequencing Strategies and Costs: The Institute for Genomic Diversity at Cornell University provides GBS on a fee-for-service basis. The IGD has a LIMS system for sample tracking by bar codes, robotics, access to the facilities of the Life Sciences Core Laboratories Center (http://cores.lifesciences.cornell.edu/brcinfo/), and a staff dedicated to providing the GBS service. The IGD is integrated with the Computational Biology Service Unit at Cornell, where the bioinformatic tools for SNP calling are installed and maintained. Charlotte Acharya is a Research Support Specialist II at the IGD, and will devote half of her time to this project.

To reduce costs and maximize use of the data, it is highly desirable to develop an accurate method for imputing missing data. Our intended strategy is to sequence at greater depth and coverage in the parents of each breeding cycle (~190+ clones), and to sequence at lower coverage in the progeny (2500 – 5000 clones). This strategy has been advocated in the literature (Habier et al. 2009). Using the genotypes of the parents and grandparents will allow us to impute the larger number of markers in the progeny. A plan for parent and progeny numbers, and how this strategy will favorably affect costs, is given in the table below.

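Year | IITA GS #High | IITA GS #Low | IITA GS US$ | NRCRI GS #High | NRCRI GS #Low | NRCRI GS US$ | NaCRRI GS #High | NaCRRI GS #Low | NaCRRI GS US$ | Pre-Breeding (2 locs) #Low | Pre-Breeding (2 locs) US$ | Cost/yr (US$) | DNAs/yr
1 | - | 2500 | 30000 | - | - | - | - | - | - | - | - | 30000 | 2500
1 | 40 | - | 1520 | - | - | - | - | - | - | - | - | 1520 | 40
2 | - | 2500 | 30000 | - | 3072 | 36864 | - | 3072 | 36864 | - | - | 103728 | 8644
2 | 240 | - | 9120 | - | - | - | - | - | - | - | - | 9120 | 240
3 | - | 2500 | 30000 | 240 | - | 9120 | 240 | - | 9120 | 9984 | 119808 | 168048 | 12964
3 | 240 | - | 9120 | - | - | - | - | - | - | - | - | 9120 | 240
4 | - | 2500 | 30000 | - | 4992 | 59904 | - | 4992 | 59904 | 9984 | 119808 | 269616 | 22468
4 | 240 | - | 9120 | - | - | - | - | - | - | - | - | 9120 | 240
5 | - | 2500 | 30000 | 240 | - | 9120 | 240 | - | 9120 | 9984 | 119808 | 168048 | 12964
5 | 240 | - | 9120 | - | - | - | - | - | - | - | - | 9120 | 240
Total | - | - | 188000 | - | - | 115008 | - | - | 115008 | - | 359424 | 777440 | 60540

#High = # clones sequenced at 48-plex, i.e., higher depth

#Low = # clones sequenced at 384-plex, i.e., lower depth

Note: Parents will be sequenced at higher depth than progeny, and missing data in the progeny will be imputed.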
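The effect of the two-depth strategy on cost can be checked with simple arithmetic. The per-clone prices below are not stated explicitly in the proposal; they are implied by the table (40 clones at 48-plex for US$1520 gives US$38 per clone, and 2500 clones at 384-plex for US$30000 gives US$12 per clone). The function name is illustrative.

```python
COST_HIGH = 38  # US$ per clone at 48-plex, implied by 40 clones costing US$1520
COST_LOW = 12   # US$ per clone at 384-plex, implied by 2500 clones costing US$30000


def genotyping_cost(n_high: int, n_low: int) -> int:
    """Total cost in US$ for a mix of high-depth and low-depth samples."""
    return n_high * COST_HIGH + n_low * COST_LOW


# One annual IITA cycle: 240 parents at high depth, 2500 progeny at low depth.
mixed = genotyping_cost(240, 2500)
all_high = genotyping_cost(240 + 2500, 0)
print(mixed, all_high, all_high - mixed)
```

Under these implied prices, sequencing every clone at 48-plex would cost substantially more per cycle than the mixed strategy, which is the saving that accurate imputation is meant to unlock.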

The problem of imputation has received a lot of attention both in the human genetics and animal genetics communities, because both must deal with the problem of phasing in outcrossed pedigrees and populations. Over the course of this project, we will build a haplotype library, using software such as AlphaImpute (Hickey et al. 2009), a software package for imputing and phasing genotype data. The program uses segregation analysis and haplotype library imputation to impute alleles and genotypes. Development of a haplotype library is an activity of Objective 7 (see page 10 of the main text).

The process of haplotype inference and imputation will be greatly facilitated by the construction of a genetic map from GBS data. We are currently in the process of doing this, using a biparental mapping population provided by Ismail Rabbi at IITA. Collecting GBS data on the pseudo-F2 progeny of this cross will result in ordering of many of the largest scaffolds of the reference genome, i.e., any scaffold that has a SNP segregating in the mapping population. This is important because AlphaImpute, for example, works only for a single chromosome at a time. The ability to concatenate scaffolds into chromosomes will allow haplotype inference across adjacent smaller scaffolds.
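The simplest case of pedigree-based imputation can be shown with a toy rule. This sketch is not AlphaImpute's algorithm, which relies on segregation analysis and haplotype libraries; it only illustrates why densely genotyped parents are informative. It assumes genotypes coded as 0, 1, 2 (count of one allele) with `None` for missing calls, and it fills a missing progeny genotype only when both parents are homozygous, because that is the one case where Mendelian inheritance fixes the progeny genotype without phasing.

```python
def impute_from_parents(parent1, parent2):
    """Return the progeny genotype if both parents are homozygous, else None."""
    if parent1 in (0, 2) and parent2 in (0, 2):
        return (parent1 + parent2) // 2  # each parent transmits its only allele
    return None


def fill_missing(progeny, parent1, parent2):
    """Fill missing progeny calls, marker by marker, where parents determine them."""
    filled = []
    for genotype, p1, p2 in zip(progeny, parent1, parent2):
        if genotype is None:
            genotype = impute_from_parents(p1, p2)
        filled.append(genotype)
    return filled


# Hypothetical genotypes at five markers.
mother = [0, 2, 1, 2, 0]
father = [2, 2, 0, 0, None]
child = [None, None, None, 1, None]
print(fill_missing(child, mother, father))
```

Markers with a heterozygous or missing parent stay missing under this rule; resolving them is where phasing and a genetic map that orders the scaffolds become necessary.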

Habier D, RL Fernando, JCM Dekkers. 2009. Genomic Selection Using Low-Density Marker Panels. Genetics 182:343-353.

Hickey JM, BP Kinghorn and JHJ van der Werf. 2009. Long range phasing and haplotype imputation for improved genomic selection calibrations. Statistical Genetics of Livestock for the Post-Genomic Era. University of Wisconsin - Madison, USA, May 4-6, 2009.


Training Populations: For all three breeding programs, training population design is already under way. Plans for these training populations have been developed during Skype and in-person meetings with the breeders.

IITA: As reported in Objective 2 of the Project Description (page 6), genotyping has been done for 623 clones and prediction models have already been developed. In addition, this collection will be augmented with ~700 clones from advanced yield trials and landraces, to achieve a training population of over 1300 clones. This updating of the training population is being done with funds from the pilot project grant.

NaCRRI: A training population of 528 clones will be assembled from clones that are in the field at Namulonge this season:

1) 100 F1 clones from crosses of 4 CMD resistant lines with Namikonga, which has CBSD tolerance.

2) 300 F1 clones from 24 crosses (23 parents) combining CMD resistance, CBSD resistance, and beta-carotene content. Yellow cassavas from IITA and CIAT were crossed with white cassavas with disease resistance.

3) 128 clones in preliminary yield trials. 34 parents are represented in this set, including lines from CIAT that have M. flabellifolia in their pedigree.

NRCRI: A training population of 1056 clones will be assembled from clones that are in the field at Umudike this season:

1) 14 F1 UYT clones for which 2 years of phenotype data from 3 locations are available

2) 35 F1 PYT clones with 2 years of data from Umudike

3) 35 F1 CIAT clones with high protein and drought tolerance as special traits, some of which are second-generation backcross derivatives from wild relatives; 2 years of data from 2 locations

4) 24 F1 IITA advanced clones with beta-carotene, with 2 years of data from Umudike

5) 45 F1 IITA advanced clones with beta-carotene, with 2 years of data from Umudike

6) 21 F1 clones with high dry matter, with 2 years of data from Umudike

7) 20 F1 clones with good food quality characteristics, with 2 years of data from Umudike

8) 33 F1 early-bulking clones with 3 years of data from 3 locations

9) 11 F1 UYT clones with 2 years of data from Umudike

10) 38 F1 UYT clones from CIAT accessions, with 2 years of data from Umudike

11) 80 F1 PYT clones with 2 years of data from Umudike

12) 150 S1 clones with high beta-carotene, with 3 years of data from 2 locations

13) 550 F1 clones in a Clonal Evaluation Trial (2 replications) to be harvested in May 2012 at Umudike


PhD students trained: Training of seven PhD students has been budgeted for this objective: one each at NaCRRI, NRCRI, and IITA, and four at Cornell. Three of the Cornell students will do their research in Africa; the fourth Cornell student will do a statistics-based project and be based at Cornell (see Appendix D for details).

Objective 3. Database Developed and Curated

Name | Institution | Email
Lukas Mueller | BTI/Cornell University, Ithaca, USA | [email protected]
Naama Menda | BTI/Cornell University, Ithaca, USA | [email protected]
Moshood Bakare | IITA/Ibadan, Nigeria | [email protected]
Ezenwanyi Uba | NRCRI, Umudike, Nigeria

Objective 4. Germplasm Developed.

Name | Institution | Email
Jean-Luc Jannink | USDA/ARS/Cornell Univ, Ithaca, USA | [email protected]
Chiedozie Egesi | NRCRI, Umudike, Nigeria | [email protected]
Robert Kawuki | NaCRRI, Namulonge, Uganda | [email protected]

Prebreeding / back-through-the-bottleneck research methods rationale.

Assumptions:

• Favorable alleles from Latin American germplasm will be rare. There may be a few common ones, but for the most part, the common ones would have made it through the bottleneck. So we want to focus our efforts on finding rare ones.

• Favorable alleles will arise at loci where an allelic series exists: causal variation at the locus will not necessarily be rare, though rare alleles will extend the range of effects of the series. This assumption is more subject to debate. We should consider scenarios where it holds or not.

• Phenotyping and the ability to generate progeny from a specific mating design will be limiting rather than genotyping capacity.

Because the favorable exotic alleles will be rare, we can assume they will segregate in only one or two crosses. We will therefore want to be able to estimate their effects within a single family, so the total progeny number of each exotic (that is, the sum of family sizes across the different elites to which an exotic is crossed) should not be too small. We plan to evaluate 50 exotics per year per program and generate 100 progeny per exotic. The same 50 elites can be used each year, while different exotics will be used each year. This design will simultaneously allow exploration of the allelic effects of the elite population while also introgressing and characterizing exotic alleles. In Year 4, some fraction of the families will be made with progeny generated from the Set 1 crosses to create progeny with ¾ elite genetic background but carrying forward favorable alleles from exotic parents. These progeny will be made available to breeding programs across sub-Saharan Africa as parental materials to increase their diversity and agronomic performance. Over the course of the project, we will characterize three sets of 50 exotics. For each panel, clones will be propagated to obtain a sufficient number of plants for crossing, genotyped at high density, then crossed to selected elite cassava clones.

Genomic selection can be applied to introgression problems using either within-family or across-family approaches (Lorenz et al. 2011). For the latter to work, the populations from which exotic alleles are derived cannot be too divergent from elite populations. We anticipate that our exotic sources will be quite divergent, given the bottleneck in introducing cassava to Africa and the long time period since introduction. We plan initially to use within-family models that, as the name suggests, develop many prediction models on the basis of small training sets provided by each family (Bernardo 2009). Using cross-validation, however, we will also evaluate cross-family predictive ability to determine if greater accuracy might be obtained from a single, large, multi-family training population.
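The comparison between within-family and multi-family training sets can be sketched on simulated data. The example below is illustrative only: the marker data, family structure, heritability, and ridge penalty are all assumptions rather than project values, and ridge regression is used as a simple stand-in for whatever prediction model the project adopts. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns (marker effects, intercept, column means)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Dual form: solve an n x n system, convenient when markers outnumber individuals.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(Xc.shape[0]), y - y_mean)
    return Xc.T @ alpha, y_mean, x_mean

def ridge_predict(model, X):
    beta, y_mean, x_mean = model
    return (X - x_mean) @ beta + y_mean

def simulate(n_families=10, family_size=100, n_markers=200, n_qtl=20, h2=0.5, seed=0):
    """Toy data: each family has its own allele frequencies; QTL effects are shared (assumed)."""
    rng = np.random.default_rng(seed)
    effects = np.zeros(n_markers)
    effects[rng.choice(n_markers, size=n_qtl, replace=False)] = rng.normal(size=n_qtl)
    X_parts, fam_parts = [], []
    for f in range(n_families):
        freq = rng.uniform(0.1, 0.9, size=n_markers)
        X_parts.append(rng.binomial(2, freq, size=(family_size, n_markers)).astype(float))
        fam_parts.append(np.full(family_size, f))
    X, fam = np.vstack(X_parts), np.concatenate(fam_parts)
    g = X @ effects
    y = g + rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=len(g))
    return X, y, fam

def compare(X, y, fam, n_folds=5, lam=100.0, seed=1):
    """Mean within-family test correlation for per-family models vs one pooled model."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    families = np.unique(fam)
    for f in families:
        idx = np.flatnonzero(fam == f)
        fold[idx] = rng.permutation(len(idx)) % n_folds
    within, pooled = [], []
    for k in range(n_folds):
        train = fold != k
        pooled_model = ridge_fit(X[train], y[train], lam)  # multi-family training set
        for f in families:
            tr, te = train & (fam == f), (~train) & (fam == f)
            fam_model = ridge_fit(X[tr], y[tr], lam)       # single-family training set
            within.append(np.corrcoef(ridge_predict(fam_model, X[te]), y[te])[0, 1])
            pooled.append(np.corrcoef(ridge_predict(pooled_model, X[te]), y[te])[0, 1])
    return float(np.mean(within)), float(np.mean(pooled))

if __name__ == "__main__":
    within_acc, pooled_acc = compare(*simulate())
    print(f"within-family accuracy: {within_acc:.2f}")
    print(f"multi-family accuracy:  {pooled_acc:.2f}")
```

Both models are scored on the same held-out individuals within each family, so the two accuracies are directly comparable. Which approach performs better on real data will depend on how divergent the exotic sources are, which this toy simulation does not attempt to represent.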

         Crossing          Genotyping               Propagation        Phenotyping
Year 1   -                 Set 1 (50)               Set 1              -
Year 2   Progeny 1 (5000)  Set 2 (50); Progeny 1    Set 2              -
Year 3   Progeny 2 (5000)  Set 3 (50); Progeny 2    Set 3; Progeny 1   -
Year 4   Progeny 3 (5000)  Progeny 3                Progeny 2          Progeny 1
Year 5   -                 -                        Progeny 3          Progeny 2

Crossing blocks: 50 plants/exotic and 50 plants/elite = 5000 plants = 0.5 hectare
Phenotyping: from each set of crosses, 5000 progeny x 5 plants = 25000 plants = 2 hectares

This plan will be implemented in parallel at NaCRRI and at IITA, using elite germplasm specific to each region.
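The design sizes above follow from a few multiplications, reproduced in the short sketch below. The counts and areas are taken from the text. The planting densities are derived here for illustration and are not stated in the proposal, and the function name is hypothetical.

```python
# Design parameters taken from the text (per program, per set of exotics).
EXOTICS_PER_SET = 50
ELITES = 50
PROGENY_PER_EXOTIC = 100
PLANTS_PER_PARENT = 50        # plants per exotic or elite clone in the crossing block
PLANTS_PER_PROGENY = 5        # plants per progeny clone in phenotyping
N_SETS = 3                    # three sets of exotics over the project
CROSSING_AREA_HA = 0.5
PHENOTYPING_AREA_HA = 2.0

def design_summary():
    """Return the plant counts implied by the design, plus derived (assumed) densities."""
    progeny_per_set = EXOTICS_PER_SET * PROGENY_PER_EXOTIC
    crossing_plants = (EXOTICS_PER_SET + ELITES) * PLANTS_PER_PARENT
    phenotyping_plants = progeny_per_set * PLANTS_PER_PROGENY
    return {
        "progeny_per_set": progeny_per_set,
        "progeny_total": progeny_per_set * N_SETS,
        "crossing_plants": crossing_plants,
        "phenotyping_plants": phenotyping_plants,
        # Derived values, not stated in the proposal:
        "crossing_plants_per_ha": crossing_plants / CROSSING_AREA_HA,
        "phenotyping_plants_per_ha": phenotyping_plants / PHENOTYPING_AREA_HA,
    }

if __name__ == "__main__":
    for name, value in design_summary().items():
        print(f"{name}: {value}")
```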

PhD students trained: One student will be trained at NaCRRI.

Bernardo R. 2009. Genomewide Selection for Rapid Introgression of Exotic Germplasm in Maize. Crop Science 49:419-425.

Lorenz AJ, S Chao, FG Asoro, EL Heffner, T Hayashi, H Iwata, KP Smith, ME Sorrells, J-L Jannink. 2011. Genomic Selection in Plant Breeding: Knowledge and Prospects, in: D. L. Sparks (Ed.), Advances in Agronomy, Academic Press, San Diego, CA USA. pp. 77-123.

Objective 5. Infrastructure Developed and Plant Breeders Trained

Name                Institution                             Email
Yona Baguma         NaCRRI, Namulonge, Uganda               [email protected]
Dr. Richard Edema   Makerere University, Kampala, Uganda    [email protected]

Dr. Baguma will lead the infrastructure development component.

Dr. Edema will lead the component on training plant breeders. He has vast experience in managing similar capacity building initiatives at Makerere University. He is handling all the donor liaison, grant writing, administration, financial accounting, and reporting. In addition, he is implementing and expanding the African capacity building vision, teaching, and guiding the students with insight, wisdom and caring. He will be assisted by Prof. Paul Gibson.

A. Strengthening regional human resources for cassava research

i. MSc graduate training program

A total of 8 MSc students will be trained from the East and Central Africa sub-region, including Southern Sudan, Uganda, Tanzania, Kenya and Malawi, which suffers from very low manpower deployment to cassava research. The training will be conducted at Makerere University (Uganda), where a regional training platform in Plant Breeding is taking shape. The Regional Universities Forum for Capacity Building in Agriculture (RUFORUM) is a member-based organization of 29 universities in 13 countries of Eastern and Southern Africa whose goal is to strengthen graduate training and agriculture-related research in the region. Specifically, RUFORUM is involved in generating MSc and PhD graduates who are responsive to stakeholder needs and to national and regional development goals. With support from RUFORUM and the Alliance for a Green Revolution in Africa (AGRA), Makerere University has designed two regional graduate training programs, a Master of Science in Plant Breeding and Seed Systems and a Doctor of Philosophy in Plant Breeding and Biotechnology. Both programs involve intense structured coursework followed by thesis research. Already the course menu has proved effective in providing core competencies, supporting disciplines, and additional skills (such as project planning, social research methods and “soft skills” for the first cohort), all competencies required of the contemporary African plant breeder. Both theory and “hands-on” application are emphasized (see Appendix E). The MSc students at NaCRRI will be enrolled in the Plant Breeding and Seed Systems program at Makerere University.

The RUFORUM-initiated regional MSc in Plant Breeding and Seed Systems at Makerere University provides excellent theoretical and practical training to students from several countries, with thesis research projects on a number of crops. Placing MSc trainees from the cassava initiative is a natural extension of that program, requiring only some increase in staffing to provide adequate program coordination, supervision and mentoring of the increased number of students. With the close proximity of the academic training to the Cassava Regional Centre of Excellence (CRCoE), and the strong cooperation already present, the students will benefit from ongoing, active involvement with the CRCoE. The overall cassava enhancement effort will benefit from the students pursuing research on topics given high priority by the CRCoE. Meaningful MSc thesis projects on cassava breeding that can be completed within a research period of just over one year must be closely coordinated with ongoing breeding efforts, and the CRCoE can provide such projects and coordination while involving the students in training activities broader than the scope of the individual’s thesis research project. After completion of the MSc, these graduates will greatly strengthen the cassava improvement effort in their various countries, being already strongly integrated with the CRCoE and with region-wide cassava improvement efforts. The strengthening of the Regional MSc program through additional staff and resources provided by this project will greatly enhance the ongoing training activities, which will continue to contribute to capacity development for cassava improvement as well as to general capacity development in crop improvement.

Existing official advertising avenues of Makerere University and collaborating NARS will be used to search for potential candidates. AGRA grantee networks and the AWARD program will also be used to recruit appropriate students and to boost the proportion of women participants to at least 40%. All potential candidates will have to be supported by their parent institutions and undergo the written and oral interviews that the Department of Agricultural Production has developed for routine selection of students. The student research work will be embedded in ongoing breeding work at either Makerere University or the National Crop Resources Research Institute at Namulonge (NaCRRI). This collaborative arrangement will ensure that students are trained at high academic and practical levels with good grounding in hands-on field and laboratory breeding procedures. The student research theses will be designed to help address critical questions in cassava in the respective countries. Excellent hosting institutions include the NaCRRI-based centre of excellence for cassava research (CRCoE).

B. Strengthening physical capacity for cassava research

ii. Establish a nutrient profiling facility

To fully understand the nutritional value of different crops, there is a need to establish capacity for comprehensive nutritional profiling of key staple crop commodities, including cassava. The facility will have the capacity for routine laboratory analyses (i.e., carbohydrates, lipids, proteins) as well as for other specialized biochemical analyses, providing nutrient profiles and metabolite levels using cutting-edge, high-throughput analytical methods. In addition, the facility will function as a training center for different groups of people, including academia, food processors and consumers. The facility will also be able to provide a number of services for various industries, ranging from product development to product quality testing and protocol development.

iii. Genetic conservation

Uganda plays a pivotal role in cassava germplasm development and dissemination in the East African region, enabling quick delivery of high-yielding, CMD-resistant materials. Exchange of cassava germplasm through tissue culture has also recently been commissioned with the International Centre for Tropical Agriculture (CIAT). However, in spite of threats from pests, diseases, and natural calamities, limited efforts have been directed towards cassava conservation. The impact of the current disease epidemics on cassava genetic resources is a cause for serious concern in Uganda and surrounding countries in the Great Lakes region. Loss of genetic resources and the inability to appropriately diagnose plant pathogens are factors that could seriously affect food and nutrition security. Following the CMD epidemics that significantly reduced the cassava genetic base in Uganda, NARO made immediate efforts in the early 1990s to conserve cassava germplasm. However, because of the high CMD pressure at Namulonge, most of these collections were lost; currently only 250 landraces are available in the field gene bank. Field-based conservation presents major drawbacks, which limit its efficacy and sustainability (Withers and Engels, 1990). Of the alternatives, in vitro conservation methods are of highest priority. Various in vitro conservation methods have been developed and used. The encapsulation protocol was developed at CIAT, where over 6000 cassava entries from Latin America, Asia, and Nigeria are kept (http://www.ciat.cgiar.org/biotechnology/crops_cassava.htm). Briefly, this procedure entails the following steps: encapsulation of explants in alginate beads, followed by growth in a sucrose medium for several days, followed by partial desiccation, and then rapid freezing. This protocol ensures high survival rates and growth recovery, and involves no callus formation (Engelmann, 1997). Moreover, it allows cassava to be conserved for 30 years or more with no maintenance other than periodic monitoring (www.ciat.cgiar.org). In this project, we seek support for long-term cryopreservation of the cassava germplasm in the sub-region using the encapsulation-dehydration protocol developed by CIAT.

This conservation initiative will also help support the variety improvement program in the region. As a regional leader for cassava research, NaCRRI will continue to develop varieties that can be disseminated to neighboring countries; these plants need to be virus-free. Even plants obtained by meristem tip culture or thermotherapy may not be virus-free. Therefore, such plants have to be tested (indexed) for the presence of cassava viruses before they are shared with other countries. The project will establish a facility for virus indexing by symptomatology, by biological indexing using indicator plants, and by highly sensitive, rapid and precise serological and nucleic acid-based assays. This will be a significant contribution to the access and exchange of germplasm among breeders in the region.

C. Enhancing regional and global research connections

Strengthening regional research capacity through visiting scientist schemes

The primary purpose of the visiting scientist program is to transfer knowledge and skills from the guest researchers to our scientists and students in the PhD program. Therefore, we propose to:

a. Establish good research facilities for full exploitation of the expertise of visiting scientists

b. Determine the key thematic areas and prioritize them based on the most needed expertise

c. Develop selection criteria for the visit, including, but not limited to, the track record of the applicant and the relevance of the applicant’s experience to our situation. Clearly articulate the purpose, objective, activities, results, deliverables, and timeframe, and the methods for assessing the effectiveness, efficiency, and impact of the visit on our research and PhD program. The emphasis will be on passing knowledge and skills to local scientists and ensuring that such knowledge is maintained after the visiting scientist has gone back to the home institution. For the PhD program, we will tap the visitors' expertise for provision of key foundational courses.

d. Stimulate NARO to develop a functional visiting scientist policy and mechanisms for quality control and assurance

D. Strengthening NaCRRI communications and leadership capabilities

i. Establishing broader-bandwidth internet connectivity and an e-library service

We propose to link stakeholders in the cassava sub-sector via internet connectivity. The connectivity will allow open, immediate, interactive discussions among producers, extension service providers, researchers and interested commercial groups on issues pertinent to the sub-sector. We shall tap the experiences of other groups in establishing the internet list server system. This proposal is committed to anticipating and facilitating new communication opportunities that will advance the cassava sector. In addition, relevant books and online journals will be identified and subscribed to in support of R&D in the cassava sector.

ii. Enhancing leadership research capabilities

This scheme will be based on the experience of AWARD, with modification. The design will include peer mentoring, in which people at the same level from different institutions act as mentors and mentees to each other while providing regular feedback to a senior mentor assigned to the pair. Managerial and leadership skills will be developed. Visits to demonstrated centers of excellence in cassava research (e.g., Vietnam, Embrapa, CIAT, IITA) will be arranged.

Relationship between this project and the Cassava Regional Centre of Excellence (CRCoE)

The East African Agricultural Productivity Programme (EAAPP), which ultimately resulted in the establishment of Regional Centres of Excellence (RCoE), was conceived to operationalize the Comprehensive Africa Agriculture Development Programme (CAADP) of the African Union. It took advantage of existing capacity in the region: Uganda (cassava), Tanzania (rice), Kenya (dairy), and Ethiopia (wheat). The RCoE concept rests on the rationale of combining the scarce human and financial capital scattered across the sub-region to solve problems common to the region; this approach is expected to help create economies of scale, avoid wasteful replication of efforts and maximize spillovers across the sub-region. The CRCoE, based at NaCRRI, Uganda, is involved in the implementation of four components. The first component is the establishment of the CRCoE, which will entail infrastructure improvement (civil works for rehabilitation of key facilities and procurement of laboratory and farm equipment) and human resource capacity building. The second component is technology generation, training and dissemination, which will support research activities developed within the CRCoE and the RCoEs. The third component is improved availability of planting materials, seeds and livestock germplasm. This component will support multiplication of planting materials, seeds and breeds, strengthen the enabling environment for regional exchange and trade in seeds and breeds, and improve the capacity of seed and breed producers and traders. The fourth component is project coordination and management, which will finance management and coordination of the project at the national and regional levels.

The expected outputs of this project shall include:

1. Human capacity in selected complementary fields of agricultural research (i.e., breeding, pest and disease management) developed to achieve set goals in cassava research for development;

2. Physical capacity developed as follows: existing facilities upgraded, and nutrient profiling, high-throughput DNA extraction, and genetic conservation and indexing facilities completed;

3. Increased access to global cassava germplasm and information, and its sharing for the benefit of the region;

4. Enhanced national, regional and global agricultural research capacity through strategic research connections to develop and deploy improved technologies of cassava in partnership with key actors; and

5. Strengthened information management, communication and leadership capabilities at NaCRRI to promote data sharing and development of a new cadre of research leaders

Objective 6. Biotechnology/Biosafety Education and Awareness

Name                        Institution                  Email
Barbara Mugwanya Zawedde    NaCRRI, Namulonge, Uganda    [email protected]

The goal of this objective is the dissemination of factually accurate information about biotechnology and the establishment of a multi-stakeholder forum to openly discuss the priorities, benefits and risks of modern biotechnology in the national interest of Uganda. The objective will also provide a biotech resource hub for biotechnology policy development and analysis in areas that the technology will impact.

The important stakeholders in modern biotechnology are technology developers, scientists, teachers, farmers, non-governmental and civil society organizations, policy makers and administrators, the media, judiciary, consumer groups, the medical community and public at large. Thus, this objective will develop “modules” explaining the prospects of biotechnology as a crucial component of development, targeted at these groups of stakeholders.

Strategies for creating biotechnology awareness in different stakeholder groups:

1) For policy makers and administrators, periodic (once every four months) policy seminars and trainings will be held on topical and emerging issues in biotechnology. They will be supplied with the latest fact sheets and with authentic and reliable publications to keep them abreast of global and national issues surrounding biotechnology. Trainings will also be organized on policy development and analysis.

2) For the scientific community, a seminar or workshop will be conducted every four months to sensitize them to the social, economic, ethical and commercial issues surrounding biotechnology. Interactive sessions with social scientists, legal experts and scholars, and industry people will be organized to improve awareness about development and commercialization of biotechnology. Most importantly, scientific workshops, seminars and symposia will be organized to keep scientists abreast of the latest developments in the field of molecular biology and biotechnology. Scientific conferences will also be organized on either a semi-annual or annual basis.

3) The center will organize workshops to inform the media fraternity about the latest developments in the field of biotechnology and about the priorities of the Government. In order for them to better understand and appreciate biotechnology, media will be supplied with fact sheets and given periodic briefings on topical and emerging issues in biotechnology, as well as visits to laboratories, fields and industries. Regular feature articles, news briefings, and press releases will be issued by the center for broadcast to the general public.

4) The medical and health community will be regularly consulted for development of printed materials and videos about health-related issues surrounding biotechnology. The center will organize annual exhibitions and workshops to create general awareness about biotechnology so that they can in turn communicate the risks and benefits of biotechnology to the general public through their own awareness and outreach programs.

5) Farmers and growers will be the foremost stakeholders for the center. They will be regularly hosted at the center and other facilities to demonstrate how biotechnology works for them. Village-level and farm-level meetings will be organized to gather their opinions on various issues and problems that confront them in their agricultural operations and to explain how biotechnology might help them. There will be quarterly interactive sessions with them to gather input into biotechnology policy and decision-making.

6) General consumers will be the second most important group of stakeholders targeted under this initiative. They will be regularly polled to solicit their views and opinions for policy and program development, and there will be regular interactive sessions among the key stakeholder groups: farmers, growers and consumers. A regular channel of communication will be established so that these groups can provide feedback from the general public to policy makers and scientists for the development of appropriate biotechnology.

7) We will organize regular public lectures by local and visiting scientists and biotechnology experts to educate the public about current and emerging issues of biotechnology dealing with both scientific and societal aspects. We will also publish brochures, pamphlets and pocket guides to provide basic facts on biotechnology. In addition, we will organize annual biotechnology exhibitions, touring exhibits and videos and film shows to inform the public about all aspects of biotechnology. We will partner with local television stations to broadcast films and videos about biotechnology. For completeness of these interventions, we will provide a platform for interaction with the anti-biotechnology groups at scheduled times to have an opportunity to listen to their concerns.

The resource information hub will be a repository of worldwide information on biotechnology and will disseminate this information via an actively updated web site, electronic newsletters and bulletins, and printed materials in the form of brochures, pocket guides, reports, reviews and books. The hub will also bring out white papers on topical issues from time to time to inform all the stakeholders. We will produce multimedia presentation materials on biotechnology matters and topics. It will be a one-stop resource center for anybody in the country to gather authentic information on biotechnology. Efforts will also be made for the hub to serve the East African region, as none currently exists in the region. In addition, the hub will offer timely and topical courses to educate different stakeholders on biotechnology affairs and create general awareness.

The hub will serve as a center for periodic public debates and discourses that need critical stakeholder input into policy development. The centre will also host researchers who can carry out research into risk assessment, needs assessment, holistic technology assessment, socio-economic impact analysis, and issues in bioethics.

The center will establish national and international collaborative linkages to foster a climate for high-level intellectual discourse on the impacts of biotechnology on society and the nation's economy. Furthermore, the center will host national and international workshops, symposia, seminars, conferences and meetings in overall support of the entire biotechnology capacity building project. It will be established by NARO under the guidance of an experienced consulting firm to be identified later. The center will be staffed by three professional people, including a PhD-level scientist.

Capacity development in information and communication technology, and biopolicy.

A PhD graduate student will be recruited to undertake training in (i) information packaging and communication (ii) information retrieval, database management and privacy codes (iii) website design, launching and operation (iv) e-conferencing (v) regulatory biopolicy and (vi) basic bioinformatics. In addition, one technician will be trained in general electronics and equipment mechanics at advanced diploma level for routine maintenance and advice on laboratory resources.

A successful PhD candidate will have a master’s level education in any of the following areas: sociology or economics or law or biological or agricultural sciences and any combination thereof.

The candidate will work under the joint supervision of a biotechnologist and the Head of the Center to develop protocols for carrying out attitudinal surveys of public opinion and perception on various issues surrounding agricultural biotechnology, explore ways and means of effective communication, and collaborate with leading international schools working on biotechnology policy research (Newell-McGloughlin, 2004). The student’s work is expected to serve as an authoritative guide for biotechnology policy implementation and analysis for Uganda. The policy areas that the student is expected to work on include gathering empirical evidence for setting agricultural needs, identifying options for problem solving, exploring the most cost-effective technological interventions, and exploring the policy and institutional framework for public-private sector participation in biotechnology development for Uganda. The student is also expected to assist in the operation of the Center.

The student will take at least six courses in the areas of biotechnology affairs, either in Uganda or outside Uganda, and carry out all research within the country. The student is expected to attend national and international workshops, symposia and conferences in the subject areas relevant to the dissertation topic, and to graduate within a maximum of four years.

Newell-McGloughlin, M. 2004. Risk assessment and public policy issues. In: Handbook of Plant Biotechnology. Edited by Paul Christou and Harry Klee. John Wiley and Sons Ltd

Objective 7. Enhancement of Genomic Selection through Cassava Genomics

Name              Institution                     Email
Simon Prochnik    Joint Genome Institute (DOE)    [email protected]
Daniel Rokhsar    Univ. California, Berkeley      [email protected]

Cornell  University  /  Next  Generation  Cassava  Breeding  /  Appendix  D:  PhD  Student  Project  Proposals  

Appendix D PhD Student Project Proposals

Student 1
Registered at Cornell
Leveraging whole genome sequence to improve genomic prediction

Genomic selection currently uses polymorphisms without reference to their possible genetic function or their genetic context. Because polymorphisms obtained from GBS data are sequence-referenced, however, they are easily combined with the whole genome sequence (WGS) data that BMGF has invested in. This combination provides the context of each GBS polymorphism, first in terms of the identities of neighboring SNPs and second in terms of the putative functions of these SNPs. The student at Cornell will collaborate with partners at the Joint Genome Institute and the University of California, Berkeley to develop methods to leverage this information and test its value. The resources available to cassava make it an exciting test case. Research in Objective 7 will use these resources to test two approaches that may improve genomic predictions using the WGS.

Student 2
Registered at Cornell, fieldwork carried out at IITA
Assessment of the feasibility of a pan-African training population to predict cassava performance

This student would specialize in computational biology and bioinformatics as they relate to the application of genomic selection in Africa. The specific research objectives would be developed in collaboration between Cornell scientists and partners in Africa. This project would combine molecular biology, statistics/computational biology and field plant breeding. The extent of the relatedness of cassava populations used in East and West Africa is unknown. This variable will affect the extent to which training populations developed in Nigeria can accurately predict performance in Uganda for traits that are important in both regions. The student's emphasis may also include applied selection theory for selecting on multiple traits, such as yield, dry matter content, disease resistance and nutritional quality traits, leading to the development of selection index options for application of genomic selection in cassava.

Student 3
Registered at WACCI, fieldwork carried out at IITA
Gains from genomic selection using a one-year breeding cycle

This student would focus on the application of accelerated breeding methods in cassava through use of genomic selection. The IITA training population and results from two or three cycles of genomic selection on an annual cycle will be evaluated in multiple environments in Nigeria and perhaps also Ghana. This will provide proof of concept for genomic selection of cassava in Africa. The student will apply genomic selection tools developed in the project and will integrate the three objectives of genomic selection application, database development and fast cassava technologies. This project will test cutting-edge technologies but will also be strongly grounded in applied field breeding, with the intention of developing potential improved cassava varieties in a short period of time.

Student 4
Registered at Cornell, fieldwork carried out at NRCRI
Exploiting allelic variation in South American M. esculenta to improve African cassava

Because cassava was domesticated in South America, there is greater genetic diversity there than in Africa. To benefit from this diversity, we must go back through the bottleneck that occurred when cassava was brought to Africa. The student will make crosses between African and South American varieties to explore the allelic variability there. The research will identify alleles that are not present in Africa but that improve performance there for agronomic traits and disease resistances. The student will implement GBS to demonstrate the usefulness of the tool for pyramiding desirable alleles in new populations as a pre-breeding strategy.

Student 5
Registered at WACCI, fieldwork carried out at NRCRI
Genomic selection of cassava populations for early productivity and tolerance to biotic stresses
The student will develop progenies from a training population (TP) using the 2-year cycle and will work on traits that affect early productivity and tolerance to diseases and pests. The TP will also include materials from CIAT and will therefore be distinct from the training populations of IITA and NaCRRI. The student will implement GBS and use it to demonstrate the usefulness of GS through both cross-validation and the response to one actual breeding cycle.

Student 6
Registered at Cornell, fieldwork carried out at NaCRRI
Utilization of cassava wild relatives for genetic improvement in Uganda
This thesis research will focus on introgression of alleles from cassava wild relatives and/or M. esculenta exotic germplasm sourced from Latin America into locally adapted but virus-sensitive cassava genotypes from Uganda. Segregation of alleles in the derived inter-specific and/or intra-specific hybrids will be monitored to identify loci undergoing segregation distortion. Families of exotic parents will be phenotyped for agronomic performance and virus resistance traits to identify favorable alleles derived from cassava wild relatives. This work will involve both genotyping, to be conducted at Cornell University, and virus phenotyping, to be undertaken at NaCRRI, Uganda.

Student 7
Registered at WACCI, fieldwork carried out at NaCRRI
Genomic predictions in cassava populations developed by the Uganda breeding program
Uganda has just established a cassava training population that contains ~230 cassava clones. These clones were drawn from small full-sib families (2-5 individuals per cross combination) that had been evaluated at the seedling stage for virus resistance and other key agronomic traits. This germplasm set will be used to generate a genomic prediction model, which will have to be validated using the same set of clones. This work will constitute the student's PhD study.
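As an illustration only, the sketch below shows one common way that cross-validation within a single set of clones can be set up for a genomic prediction model. It is not the project's analysis pipeline: the ridge-regression model, the penalty value, the number of folds, and the simulated marker and phenotype data (including the helper name `simulate`) are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Estimate marker effects by ridge regression on centered data."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - y_mean)
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta

def ridge_predict(model, X):
    """Predict phenotypes for marker matrix X from a fitted model."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def cross_validated_accuracy(X, y, lam=100.0, k=5, seed=0):
    """Correlation between observed values and k-fold held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training clones only, then predict the held-out clones
        model = ridge_fit(X[train_mask], y[train_mask], lam)
        preds[test_idx] = ridge_predict(model, X[test_idx])
    return np.corrcoef(preds, y)[0, 1]

def simulate(n_clones=230, n_markers=200, h2=0.5, seed=1):
    """Hypothetical data: random 0/1/2 marker scores and an additive trait."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_clones, n_markers)).astype(float)
    effects = rng.normal(0.0, 1.0, n_markers)
    g = X @ effects
    # Scale the noise so that the simulated heritability is roughly h2
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(0.0, noise_sd, n_clones)
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    cv_acc = cross_validated_accuracy(X, y)
    full_model = ridge_fit(X, y, 100.0)
    in_sample = np.corrcoef(ridge_predict(full_model, X), y)[0, 1]
    print(f"cross-validated accuracy: {cv_acc:.2f}")
    print(f"in-sample accuracy:       {in_sample:.2f}")
```

In this sketch each clone is predicted only by a model that did not see its phenotype, so the cross-validated correlation is a more conservative estimate than the in-sample fit obtained when the model is trained and evaluated on exactly the same records.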



Cornell University / Next Generation Cassava Breeding / Appendix E: Description of MSc Training Plan

Appendix E Description of MSc Training Plan (February 2014-July 2017)

Phase 1: Practical Attachment to the Cassava Center of Excellence (February-July 2014)
Purpose:

1. To ensure that each student becomes knowledgeable in the botanical and agronomic characteristics of cassava.
2. To initiate crossing between exotic (Latin American) germplasm and germplasm from each student's home country.

Phase 2: MSc coursework (August 2014–June 2015, 2 semesters)
Coursework includes plant breeding, genetics, statistics, quantitative genetics, molecular techniques for plant breeding, tissue culture, biosafety and biopolicy, social research methods, agricultural business and marketing, program and project planning, and English for Scientific Communication. Students will also establish seedlings from their crosses and begin data collection. Thesis research proposals will be prepared during the 1st semester and presented (defended) for departmental approval in about January 2015.

Phase 3: Research Period (July 2015–July 2017)
The research activity will contribute to the Next Generation Cassava Breeding Project, Objective 4 (Pre-breeding). Students will evaluate genotypes produced by crossing exotic germplasm from South America with germplasm from their home country. The students will evaluate these materials at NaCRRI and at an additional site in their home country. NaCRRI will be the primary site, where close supervision and all necessary support will be provided to ensure that quality data are obtained. The home country site will provide evaluation of the material under local conditions, an opportunity for farmer participatory assessment of promising genotypes, and an opportunity for initial increase of materials that show potential for release and distribution. The research period includes preparation and submission of the students' theses and of scientific publications in international refereed journals. A departmental mid-program review of the students' research progress will be conducted in about January 2016.

 
