Big Data, CIAT, April 2014
BIG DATA ANALYSIS: Is it a solution for understanding big problems?
Rice program (Agrobiodiversity) & Big Data expert group (DAPA)
Computational models are tailored to the analysis of the data, rather than the data being fitted to a particular methodology as researchers have done for over a century
Applying the principles of Big Data to research in agriculture
• Big Data refers to things one can do at a large scale, but not at a smaller one, to extract new insights
• Sometimes informing is better than explaining: looking for patterns or associations
• Approaching “N=All”
• Adding value to secondary databases
Big Data (Foreign Affairs magazine / McKinsey's High Tech)… Cukier and Mayer-Schönberger (2013)
How?
• Using ICTs to collect data (Android apps), analyze it (traditional and machine learning techniques) and share it (in ways that facilitate decision making at different levels and for different users)
• Analytical approaches tailored to the data, rather than fitting the data to a particular methodology as researchers have done for over a century
• Development of tools as part of a close dialogue with end-users
How?
Climate + Soil + Crop management (including varieties) = Productivity/ha
?% + ?% + ?% = 100% of the variation to explain
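One way to make the question marks precise, assuming the shares are read as a sequential decomposition of the coefficient of determination R² (the slides do not state how the shares are computed):

```latex
R^2_{C,S,M}
  = \underbrace{R^2_{C}}_{\text{climate}}
  + \underbrace{\left(R^2_{C,S} - R^2_{C}\right)}_{\text{soil, given climate}}
  + \underbrace{\left(R^2_{C,S,M} - R^2_{C,S}\right)}_{\text{management, given climate and soil}},
\qquad
R^2_{X} = 1 - \frac{\sum_i \left(y_i - \hat{y}_{i,X}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Here y_i is the productivity of cropping event i and ŷ_{i,X} is the fitted value from a model using the factor set X (C = climate, S = soil, M = crop management including varieties). The remainder 1 - R²_{C,S,M} is unexplained variation, so the three shares reach 100% only if the model explains everything; when the factors are correlated, each share also depends on the order in which the factors enter.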
Maximizing productivity in agricultural systems: working with secondary databases
• To identify the combinations of factors that lead to high and low productivity (empirical approaches: machine learning)
• Within the framework of the “Convenio MADR-CIAT” climate change project: adaptation strategy
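A minimal sketch of the empirical idea, assuming each cropping event is a dictionary of numeric factors: label events as high or low productivity around the median yield, then search for the single factor threshold that best separates the two groups. This "decision stump" is the one-split building block of decision-tree learners; it is not the method used in the presentation, and the field names are hypothetical.

```python
# Hypothetical sketch: find the single factor threshold that best separates
# high- from low-productivity cropping events (a "decision stump").
# Field names such as "rain" and "tmin" are illustrative assumptions.
from statistics import median


def label_high_low(yields):
    """Label each event 1 (high) if its yield is above the median, else 0 (low)."""
    cut = median(yields)
    return [1 if y > cut else 0 for y in yields]


def best_stump(rows, labels, factors):
    """Return (accuracy, factor, threshold, direction) of the best single split.

    direction ">" means: predict high productivity when factor > threshold;
    "<=" means the opposite rule. Accuracy is measured on the same events
    used to choose the split (in-sample).
    """
    best = (0.0, None, None, None)
    n = len(rows)
    for f in factors:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            pred_gt = [1 if r[f] > t else 0 for r in rows]
            acc_gt = sum(p == y for p, y in zip(pred_gt, labels)) / n
            # The opposite rule predicts the complement, so its accuracy is 1 - acc_gt.
            for acc, direction in ((acc_gt, ">"), (1 - acc_gt, "<=")):
                if acc > best[0]:
                    best = (acc, f, t, direction)
    return best
```

A real analysis would grow deeper trees (or use other learners) and check accuracy on events held out from fitting; the stump only shows how a "combination of factors" search starts.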
Figure: Trends in Rice Production, Harvested Area and Yield in Colombia, 1990-2012 (series: area, production, yield; left axis: thousands of tons or hectares; right axis: yield in t/ha)
The problem: in Colombia, farm-level yields have dropped significantly since 2009
Source: USDA-PSD
And what are the causes of this yield reduction?
Similar problems are seen in Central America, Ecuador, Peru and Venezuela, where yield reductions are causing heavy losses to rice farmers.
No single factor is involved: drought, high minimum temperatures, low light, high humidity, bacteria, mites, fungi, lack of adaptation, etc.
Low yields are caused by Burkholderia glumae!
Misdiagnosis, wrong treatments and excessive pesticide applications are causing other problems (Hoja Blanca)
Not eco-efficient
And to make matters worse, farmers want a “magical cure”
Reducing water-deficit stress: water harvesting
Better agronomy: key points, crop rotation and regulations
Improved cultivars: increasing yield potential, protecting yield, adding value
Trait Discovery → Gene Discovery & Marker Applications → Germplasm Enhancement → Elite Breeding (Inbreds & Hybrids)
Is there something missing here?
How can we manage this problem?
AMTEC: Massive Adoption of Technology
OBJECTIVES
• To jointly transfer the technology available for crop management
• To increase productivity and reduce production costs with the least environmental impact, in a context of social responsibility
• To pursue competitiveness and profitability for rice farmers in Colombia
TECHNOLOGY TRANSFER
• Field days
• Planning and good management practices
• Visits to research centers
• Demonstration trials
• Cost reduction
AMTEC results from 2012 and 2013 (source: Fedearroz)
Agronomy helps a lot!
Gene discovery
Emerging pathogen: Burkholderia glumae, which causes grain sterility
Sources of tolerance identified
Tolerant genotype showing 60% less damage than susceptible genotypes
Molecular markers are being developed to speed up the transfer of this trait into elite germplasm
Figure: susceptible vs. tolerant genotypes (field evaluation)
Breeding pipeline: Trait Discovery → Gene Discovery & Marker Applications → Germplasm Enhancement → Elite Breeding
• Low light tolerance; nitrogen use efficiency; water use efficiency; high yield potential; panicle blight tolerance
• Recombinant populations; CSSL; NAM; iBridges; software
• Introgressed lines; RS population; training population for GS
• Yield potential; grain quality; lodging resistance
• QTL mapping; QTL validation; functional marker identification
• MABC; recurrent selection; genomic selection
• Inbreds (FLAR); CIRAD & hybrids (HIAAL); MET
• Trait value characterization; screening methods; donor identification; population development; sequencing; gene validation
TECHNOLOGY TRANSFER (25 agronomists)
RESEARCH: BREEDING AND AGRONOMY (45 researchers)
• Breeding (conventional: 7)
• Agronomy (physiology 3, phytopathology 1, soils 2, water 2, crop management 26, biotech 3, weeds 1)
ECONOMICS (7 officials)
• Updated socio-economic studies
Our strategic partner for Rice Research in Colombia
National Survey
• Purpose: keep the crop sector updated
• N = 738 cropping events
Harvesting records
• Purpose: technical research (crop management, soils, breeding, biotechnology, physiology)
• N = 3193 cropping events
“Data is no longer regarded as static, whose usefulness is finished once the purpose for which it was collected is achieved”
Information on: planting and harvesting dates, productivity, grain moisture, variety, cropping system
Zones: Caribbean, Andean (Tolima), Plains (Llanos)
Databases: plenty of information
Adding value to secondary databases: the case of information on rice cropping events in Colombia
Planting date experiments (field trials)
• Purpose: technical research on the best sowing date
• N = 272 cropping events
Adding value to secondary databases… but first, merging the databases: a challenging task!!!
Climate
• About 27 weather stations
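Part of what makes the merge challenging is linking each cropping event to climate data. A minimal sketch, assuming both events and stations carry latitude/longitude (the slides do not say how the databases were actually joined): attach each event to its nearest station by great-circle (haversine) distance, and flag events with no station within a cutoff. The field names and the 50 km cutoff are hypothetical.

```python
# Hypothetical sketch: attach each cropping event to its nearest weather
# station. Field names ("lat", "lon", "id") and max_km are assumptions.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def attach_nearest_station(events, stations, max_km=50.0):
    """Return copies of events with "station_id" and "station_km" added.

    Events farther than max_km from every station get station_id None, so
    they can be flagged instead of silently matched to distant climate data.
    """
    merged = []
    for ev in events:
        best_id, best_km = None, float("inf")
        for st in stations:
            d = haversine_km(ev["lat"], ev["lon"], st["lat"], st["lon"])
            if d < best_km:
                best_id, best_km = st["id"], d
        row = dict(ev)
        row["station_id"] = best_id if best_km <= max_km else None
        row["station_km"] = best_km
        merged.append(row)
    return merged
```

Nearest-station matching ignores elevation, which matters in the Andean zone; interpolating between stations is a common alternative.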
Letting the data speak
“Before Big Data, our analysis was usually limited to testing a small number of hypotheses that we defined well before we even collected the data. When we let the data speak, we can make connections that we had never thought existed”
Cukier and Mayer-Schönberger (2013)
Figure: a rice cropping event spans 120 days from sowing to harvest, shown against the climate series for all variables over crop time
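A minimal sketch of turning a daily climate series into per-event variables, assuming daily minimum/maximum temperature and rainfall are available for the matched station. The stage boundaries below are illustrative assumptions, not the phenological stages used in the original analysis.

```python
# Hypothetical sketch: summarise daily weather by phenological stage for one
# 120-day rice cropping event. Stage boundaries are illustrative assumptions.
from datetime import date, timedelta
from statistics import mean

# (stage name, first day after sowing, last day after sowing), inclusive
STAGES = [
    ("vegetative", 0, 59),
    ("reproductive", 60, 89),
    ("ripening", 90, 119),
]


def stage_summaries(sowing, daily):
    """daily maps date -> {"tmin": degC, "tmax": degC, "rain": mm}.

    Returns {stage: {"tmin_mean", "tmax_mean", "rain_total", "days"}}, where
    "days" is the number of days with observations, so data gaps stay visible.
    """
    out = {}
    for name, first, last in STAGES:
        days = [sowing + timedelta(days=d) for d in range(first, last + 1)]
        obs = [daily[d] for d in days if d in daily]
        out[name] = {
            "tmin_mean": mean(o["tmin"] for o in obs) if obs else None,
            "tmax_mean": mean(o["tmax"] for o in obs) if obs else None,
            "rain_total": sum(o["rain"] for o in obs),
            "days": len(obs),
        }
    return out
```

Each cropping event then becomes one row of stage-level climate variables that can be related to its yield.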
Hypothesis: yield variation is associated with climate
Multivariate analysis for Saldaña (research station, Andean zone): cropping events, 2007 to 2012
• FEDEARROZ 733: 27% of productivity variation explained
• Lagunas: 47% of productivity variation explained
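A minimal sketch of what "% of productivity variation explained" can mean, assuming an ordinary least-squares fit of yield on climate variables (the slides do not name the multivariate method): the percentage is the fit's R².

```python
# Hypothetical sketch: share of yield variation "explained" by climate
# variables, measured as the R^2 of an ordinary least-squares fit.
# OLS is an assumption; the slides do not name the multivariate method.


def ols_r2(X, y):
    """Fit y ~ b0 + X b by least squares and return R^2.

    X is a list of rows (one per cropping event), y a list of yields.
    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination with partial pivoting; adequate for a few predictors.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    b = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(bi * ai for bi, ai in zip(b, a)) for a in A]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

For example, an R² of 0.27 on FEDEARROZ 733 events would read as "27% of productivity variation explained". In-sample R² grows with every added predictor, so a held-out check is advisable before reading much into the difference between varieties.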
Varieties perform differently under identical climatic conditions
Letting the data speak
Figure panels: FEDEARROZ 733 and Cimarrón Barinas (N = 189 and N = 63)
Letting the data speak
Climate and analysis based on phenological stages in Saldaña (research station, Andean zone), 2007-2012 (N = about 800 cropping events, irrigated rice)
• The crop sector can suggest the best planting date to farmers
• Applying the same approach at other stations (environments) gives new insights for future breeding
• Adaptation strategy for climate change
Climate accounts for 30% to 40% of production variability in irrigated rice
Letting the data speak
Climate and analysis based on phenological stages in the Colombian Plains zone, 2007-2012 (N = about 500 cropping events, upland rice)
• Rainfall is a critical driving factor for upland rice during grain filling and panicle initiation
• Machine learning (multilayer perceptron, MLP)
Again, climate accounts for 30% to 40% of production variability in upland rice
Letting the data speak
Climate and analysis based on phenological stages in the Plains zone, Colombia, 2007-2012 (N = about 200 cropping events, upland rice, variety F174)
• Temperature is a critical driving factor for variety F174 (upland rice) during grain filling
• Machine learning (MLP)
This time, climate explained more than 40% of production variability in upland rice (variety F174)!!!
Case study: working with secondary databases. Seasonal forecast, niñ@s & Big Data: rice in Colombia (Pompeya, Llanos)
What is likely to happen in March-April-May 2014?
We generated 24 clusters based on more than 500 cropping events
• Seasonal forecast + best technologies (data) + Big Data analysis = better adaptive responses to climate change (CC) and climate variability (CV)
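The slides do not say which method produced the 24 clusters. As one hypothetical illustration, plain k-means (Lloyd's algorithm) groups cropping events by similarity of their feature vectors, for example standardised stage-level climate variables; with real data the call would be something like `kmeans(points, 24)`.

```python
# Hypothetical sketch: group cropping events into k clusters with plain
# k-means (Lloyd's algorithm). k-means is an assumption, not the method
# confirmed by the presentation.
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Return (centroids, labels) for a list of equal-length numeric tuples.

    Features should be standardised first; otherwise variables with large
    units (e.g. rainfall in mm) dominate the distances.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each event goes to its closest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

k-means results depend on the random start, so in practice several seeds are run and the lowest within-cluster sum of squares is kept.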
Cluster 7
Rice variety | Productivity (kg/ha) | Cropping events
F174 | 4,564 | 31
FORTALEZA | 3,543 | 17
F2000 | 4,977 | 8
LAGUNAS | 5,052 | 6
MOCARI | 4,604 | 6
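The cluster 7 figures above can be condensed into one cluster-level number. Weighting each variety by its number of cropping events is our assumption; the slide itself reports only per-variety values. Against that mean, LAGUNAS and F2000 sit above it and FORTALEZA well below it.

```python
# Event-weighted mean productivity of cluster 7, from the per-variety values
# reported for that cluster. The weighting scheme is an assumption.
cluster_7 = {
    # variety: (productivity in kg/ha, number of cropping events)
    "F174": (4564, 31),
    "FORTALEZA": (3543, 17),
    "F2000": (4977, 8),
    "LAGUNAS": (5052, 6),
    "MOCARI": (4604, 6),
}


def weighted_mean_productivity(table):
    """Mean kg/ha across varieties, weighted by number of cropping events."""
    total_events = sum(n for _, n in table.values())
    return sum(kg * n for kg, n in table.values()) / total_events


mean_kg_ha = weighted_mean_productivity(cluster_7)
n_events = sum(n for _, n in cluster_7.values())
print(f"{mean_kg_ha:.0f} kg/ha over {n_events} events")
# prints "4404 kg/ha over 68 events"
```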
What can we do with these results?
FLAR and CIAT rice breeders
• Better understanding of yield and its formation under changing, complex and extremely variable conditions
• New breeding objectives such as low light tolerance, pattern of biomass accumulation, etc.
• Better definition of environments
FEDEARROZ
• Reduce pesticide applications, since it has been shown that other factors lie behind the yield variation
• Establish planting dates and new cropping systems based on crop rotation
• Establish a dynamic crop management system based on short-term prediction, to manage the risk associated with changing conditions
CGIAR
• Expand this experience to other crops and areas
• Understand the importance of FARMERS' ORGANIZATIONS for achieving impact
• Interesting concept for CCAFS, GRiSP, MAIZE and others
• The analytical approach showed that 30-45% of the variation in rice productivity can be associated with climate
• Internal cooperation between research areas within CIAT, combined with external cooperation with FEDEARROZ, is powerful. Also, multidisciplinary work is key!!!
• As long as the information is available, the approach can be applied to other regions and crops
• CCAFS is keen to integrate this work; CN selected: CSMS (CIAT-FLAR-IRRI)
• Start collaborations with the yield gap taskforce
• Encourage other partners in LAC to collect information and join this idea (e.g., FLAR's strategy), and to add value to information that has already been collected
Concluding remarks and perspectives
Modern information technology, Big Data, site-specific management/agriculture, digital soil mapping, Terra-i and bioinformatics are already here…
A new “Ageekulture” can be regarded as complementary to CIAT’s traditional research in fulfilling the center’s mission
THANK YOU!!!
• Patricia Guzman - FEDEARROZ
• Nestor Gutierrez - FEDEARROZ
• Jose Levis - FEDEARROZ
• Gabriel Garces - FEDEARROZ
• Andy Jarvis (climate change expert)
• Edgar Torres (rice breeder)
• Daniel Jiménez (agronomist)
• Camila Rebolledo (plant physiologist)
• Sylvain Delerce (agronomist, math background)
• Hugo Dorado (statistician)
• Armando Muñoz (biologist)
• Victor Patiño (statistician)
• Juan Felipe Rodriguez (the computer science component)
MADR, FEDEARROZ, CCAFS, GRiSP