Guilherme J. M. Rosa Department of Animal Sciences
Big Data, Graphical Modeling, and Causal Inference in
Livestock Production
• Currently 7.2 billion people in the world.
• Expected increase to about 9 billion by 2050, mostly in developing countries.
• World food production will need to increase by 60 percent and food production in the developing world will need to double.
• Productivity, profitability, product quality, environmental footprint (land, water and energy use, greenhouse gas emissions, etc.)
Feeding the World
Genotype x Environment
Example
Ribeiro S, Eler JP, Pedrosa VB, Rosa GJM, Ferraz JBS and Balieiro JCC. Genotype x environment interaction for weaning weight in Nellore cattle using reaction norm analysis. Livestock Science 176: 40–46, 2015.
Genotype x Environment
• Nucleus herd vs. commercial settings environment
• Environmental diversity within countries/macro-regions
• Globalization of breeding
• Increasing importance of South America, Africa and Southeastern Asia
• Global poverty and ecological footprint
Precision Livestock Production
• Animal-level data
  - Production indexes, well-being monitoring
  - Pattern recognition (e.g. early detection of health issues)
  - Predictive analytics (e.g. prediction of animal future performance)
• Farm-level data
  - Efficiency of management protocols and product administration
  - Genetics-environment interaction
  - Informed decision-making
Different Sources of Data and Information
Sensors: Prediction of behavior in lactating dairy cows
João Dórea
Animal-level Data High-Throughput, Real-Time Phenotyping
- 3 behaviors/activities: resting, eating and ruminating
- Hidden Markov model
- The data were analyzed separately for each accelerometer axis: X, Y, and Z
- When the probability of a state at time t was greater than 50%, the observation at time t was classified into that state
- The total time in each state (predicted values) was compared to the observed values
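The classification rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Gaussian emission model, the parameter values, the one-minute sampling interval, and the function names `state_posteriors` and `minutes_per_state` are all assumptions made for the example.

```python
import numpy as np

STATES = ["resting", "eating", "ruminating"]

def gaussian_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def state_posteriors(signal, start, trans, means, sds):
    """Scaled forward-backward pass; returns a (T, K) array of P(state_t = k | signal)."""
    signal = np.asarray(signal, dtype=float)
    means, sds = np.asarray(means, dtype=float), np.asarray(sds, dtype=float)
    T, K = len(signal), len(start)
    emit = gaussian_pdf(signal[:, None], means[None, :], sds[None, :])
    alpha, beta, scale = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
    alpha[0] = start * emit[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[t + 1] * beta[t + 1]) / scale[t + 1]
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

def minutes_per_state(post, minutes_per_sample=1.0, threshold=0.5):
    """Assign each time step to a state only if its posterior exceeds the threshold."""
    best = post.argmax(axis=1)
    confident = post.max(axis=1) > threshold
    return {name: float(np.sum(confident & (best == k)) * minutes_per_sample)
            for k, name in enumerate(STATES)}

# Hypothetical single-axis signal: 10 samples in each of the 3 states.
signal = np.repeat([0.0, 1.0, 2.0], 10)
start = np.full(3, 1 / 3)
trans = np.full((3, 3), 0.05) + 0.85 * np.eye(3)   # "sticky" transitions, rows sum to 1
post = state_posteriors(signal, start, trans, means=[0.0, 1.0, 2.0], sds=[0.2, 0.2, 0.2])
totals = minutes_per_state(post)
print(totals)
```

In practice the transition and emission parameters would be estimated from the accelerometer data (for example with the Baum-Welch algorithm) rather than fixed by hand, and the same procedure would be repeated for each of the X, Y, and Z axes.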
Accelerometer to predict feeding behavior
X-axis: 3-state probabilities
Feeding Behavior
Axis    Measure       Eating time   Ruminating time   Resting time
X-axis  pred, min     45            47                42
        obs, min      45            51                40
        Accuracy, %   100           92                95
Y-axis  pred, min     35            76                25
        obs, min      40            45                51
        Accuracy, %   89            32                48
Z-axis  pred, min     54            72                10
        obs, min      45            51                40
        Accuracy, %   80            58                24
Computer Vision: Tilapia filet quality
• Data from more than 3000 fish
• Dorsal and lateral pictures
• Carcass weight and yield
Image Processing
• Image recognition and segmentation
Original image
Output segmentation
Pig weight and leg/back score
• Data from 700 pigs
• Weight across different ages
• Leg and back scores
Arthur Fernandes
Prediction: Linear model
The use of artificial neural network to estimate feed intake in lactating cows through mid-infrared spectra of milk samples
Milk Mid-infrared Spectra
[Figure: milk sample, mid-infrared (MIR) spectroscopy, dry matter intake]
João Dórea
Objective: use of Fourier transform MIR of milk samples to estimate dry matter intake (DMI) in lactating Holstein cows
- MIR recorded for 599 milk samples from 189 lactating cows
- Individual DMI recorded with electronic feeding gates
- One-hidden-layer ANN model compared with partial least squares (PLS) regression
- Cross-validation used to assess predictive ability (predictive mean squared error, PMSE)
Results: ANN PMSE decreased as the number of neurons increased until 15 (PMSE = 4.13, 3.61, 3.30 and 3.38 kg²/d² for 5, 10, 15 and 20 neurons, respectively); the PLS model (7 factors) resulted in a higher PMSE of 4.41 kg²/d²
Gianola D, Okut H, Weigel KA and Rosa GJM. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12:87, 2011.
Pérez-Rodríguez P, Gianola D, Weigel KA, Rosa GJM and Crossa J. An R package for fitting Bayesian regularized neural networks with applications in animal breeding. Journal of Animal Science 91: 3522-3531, 2013.
High-Throughput Genotyping
Predicting complex quantitative traits with Bayesian neural networks
Mixed Models
• Used extensively in animal breeding, with multiple traits and huge numbers of records and animals in the pedigree
• However, environmental effects are coalesced into contemporary groups
• As such, the individual effects of specific factors are not investigated: collinearity is not an issue, but there is no insight into direct, indirect and total effects, etc.
$$
y = X\beta + Zu + e, \qquad
\begin{cases}
G = G_0 \otimes A \\
\Sigma = R \otimes I
\end{cases}
$$
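The Kronecker structure G = G0 ⊗ A and Σ = R ⊗ I can be made concrete with a toy example. The sketch below assumes the usual multi-trait convention, which the slide does not spell out: G is the covariance of the random genetic effects u, Σ is the residual covariance, G0 and R are trait-by-trait covariance matrices, and A is the pedigree relationship matrix. All numeric values are made up.

```python
import numpy as np

# Hypothetical 2-trait genetic and residual covariance matrices.
G0 = np.array([[1.0, 0.3],
               [0.3, 2.0]])
R = np.array([[4.0, 1.0],
              [1.0, 5.0]])

# Hypothetical relationship matrix for 3 animals: sire, dam, and their offspring.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
I = np.eye(3)

G = np.kron(G0, A)      # (2*3) x (2*3), blocks ordered trait by trait
Sigma = np.kron(R, I)   # residuals independent across animals, correlated across traits

print(G.shape, Sigma.shape)
print(G[0, 2])   # trait 1, sire vs offspring: G0[0, 0] * A[0, 2]
print(G[0, 3])   # sire, trait 1 vs trait 2:   G0[0, 1] * A[0, 0]
```

With millions of animals in the pedigree, these matrices are never formed explicitly; the Kronecker form is what lets software work with the small trait-level matrices and the sparse inverse of A instead.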
Farm-level Data Historic data across farms
• Confinamento Monte Alegre (CMA): http://www.cma.agr.br
• Feedlot capacity: 16,000 head
• Annual output: around 50,000 head
• TGC software: http://www.gestaoagropecuaria.com.br/produtos/tgc/
• 80 variables (input, output, economics, etc.)
Decision Tree
• Decision tree: decision support tool that uses a tree-like graph or model of decisions and their possible consequences
• Decision trees are commonly used in operations research, such as decision analysis, to help identify a strategy most likely to reach a goal
• Popular tool in machine learning
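As a rough illustration of how a regression tree could be fit to feedlot records, the sketch below uses scikit-learn on simulated data. The variables `days_on_feed` and `entry_weight_kg`, the income values, and the tree settings are hypothetical; they are not taken from the CMA/TGC data set.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Simulated feedlot records (hypothetical variables and effect sizes).
rng = np.random.default_rng(1)
n = 200
days_on_feed = rng.uniform(60, 140, n)
entry_weight = rng.uniform(300, 450, n)
net_income = (np.where(days_on_feed > 100, 80.0, 20.0)
              + 0.05 * (entry_weight - 375)
              + rng.normal(0, 2, n))          # US$/head

X = np.column_stack([days_on_feed, entry_weight])
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=10, random_state=0)
tree.fit(X, net_income)

# Text rendering of the fitted splits and leaf means.
print(export_text(tree, feature_names=["days_on_feed", "entry_weight_kg"]))

root_feature = tree.tree_.feature[0]      # index of the variable used at the root
root_threshold = tree.tree_.threshold[0]  # split point at the root
```

Each split is chosen to reduce the within-node variance of the outcome the most, so the variables near the root are those that best separate high-income from low-income lots in the data at hand.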
Decision Tree: US$/head – Income (feed cost)
Decision Tree: US$/head – Net Income
[Figure: data schema linking company, farm, owner, and outcome tables. Company and farm fields: ID, owner, owner code, CPF/CNPJ, Insc. estadual, size, region, city, state, state ID, Lat/Long. Outcome fields: iAge, iFat, month, year, slaughter weight, quality, amount typified. Technology fields: sales team, technician, year, pNutrition, tNutrition, wNutrition, season.]
Beef Production and Quality
Vera Cardoso
[Figure: Amount of carcasses typified per region]
Total: 23,056,869 carcasses (≅25% of Brazilian production, 2014-2016)
[Figure: Number of carcasses by age (x-axis: years, 0 to 8)]
[Figure: Carcasses slaughtered by month, years 2014-2016]
Carcasses by year: 2014: 2,229,702; 2015: 1,924,149; 2016: 1,808,955
[Figure: Number of carcasses by quality (Desirable, Acceptable, Undesirable)]
[Figure: Carcasses slaughtered by weight (weight in @)]
[Figure: Number of carcasses by iFat (fat index, 1 to 5)]
Preliminary results
• Data from two sources: JBS S.A. (81,053 farms) and DSM Produtos Nutricionais (22,223 farms). After merging, the final dataset comprised information from 7,248 farms and 1,571,023 carcasses slaughtered in the years 2014-2016.
• Outcome variables: body weight at slaughter (BWS), carcass fat index (FI), age at slaughter (AS)
• Covariates: farm, AS, season, animal category (steer, bull, cull bull, heifer and cow), frequent technical consulting (FTC), regional sales team (RST), type of feedlot premix (no feedlot premix – NFP, finishing grazing cattle – FGC, feedlot without additives – FWA, and feedlot with additives – FA)
Ferreira et al. Big data analysis of beef production and quality: an example with the Brazilian cattle industry. ASAS meeting, Baltimore, MD, July 8-12, 2017 (to appear)
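The merging and summarizing steps described above might look roughly like the following pandas sketch. The column names (`farm_id`, `bws_kg`, `as_months`, `premix`, `ftc`), the join key, and every value are hypothetical; the slides do not describe how the two sources were actually matched.

```python
import pandas as pd

# Hypothetical slaughter records (packer side), one row per carcass.
carcasses = pd.DataFrame({
    "farm_id": [1, 1, 2, 2, 3, 4],
    "bws_kg": [520.0, 540.0, 480.0, 500.0, 510.0, 495.0],
    "as_months": [30, 28, 40, 38, 36, 42],
})

# Hypothetical nutrition records (supplier side), one row per farm.
farms = pd.DataFrame({
    "farm_id": [1, 2, 5],
    "premix": ["FA", "FWA", "NFP"],
    "ftc": [True, False, False],
})

# Inner join keeps only farms present in both sources, which is why a merged
# data set can be much smaller than either source.
merged = carcasses.merge(farms, on="farm_id", how="inner", validate="many_to_one")

# Mean outcomes per combination of premix type and technical consulting.
summary = merged.groupby(["premix", "ftc"])[["bws_kg", "as_months"]].mean()
print(summary)
```

Raw group means like these ignore farm, season, and animal category, so they are only a first look; the covariates listed above would need to enter a model before any comparison is interpreted.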
• Results:
– Use of FA premix decreased AS, and increased BWS and FI in comparison to NFP and FWA
– Adopting FTC increased BWS and FI, and reduced AS
– Bulls presented greater BWS and lower AS, but lower FI in comparison with steers
– Differences in BWS were observed for different RST and seasons
– AS was reduced and BWS and FI increased in the rainy seasons of 2014-2016
– Combining FTC and FA was capable of increasing BWS by 27.4 kg and reducing AS by approximately 10 months in comparison with FWA and non-FTC, suggesting that this approach might be favorable for production
Preliminary results
Location of Iowa Select Finishing Farms throughout Iowa
Pig Production
Tiago Fragoso
Pig Production
• The data set contains 503 farms divided into 3 production types:
– Finishing [428 farms]
– Nursery [25 farms]
– Gilt Development Unit (GDU) [50 farms]
Traits
• Data set contains 140+ traits divided into 3 groups:
– Performance: days on feed (DOF), average daily gain (ADG), mortality, pigs produced, total weight produced, feed conversion (FC), profit
– Carcass traits: backfat, loin depth, % very light, % light, % target, % heavy, % very heavy, carcass weight (CW), live weight (LW), % yield (CW/LW)
– Utilization: fill days, location days, % utilization, number of turns, number of loads
Example – Mortality (%)
• Reference values:
– Excellent: E < 2.5%, Regular: 2.5% < R < 5%, Bad: B > 5%
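The reference values above translate directly into a small classification helper. This is a sketch only: the function name `classify_mortality` is made up, and because the slide does not say how the boundary values of exactly 2.5% and 5% are handled, they are assigned to "Regular" here as an assumption.

```python
def classify_mortality(mortality_pct: float) -> str:
    """Map a farm's mortality (%) to the reference categories.

    Boundary values (exactly 2.5 or 5.0) are treated as "Regular";
    this is an assumption, since the source leaves them undefined.
    """
    if mortality_pct < 0:
        raise ValueError("mortality cannot be negative")
    if mortality_pct < 2.5:
        return "Excellent"
    if mortality_pct <= 5.0:
        return "Regular"
    return "Bad"

# Hypothetical farm-level mortality values.
farm_mortality = {"farm_A": 1.8, "farm_B": 3.4, "farm_C": 7.2}
categories = {farm: classify_mortality(pct) for farm, pct in farm_mortality.items()}
print(categories)
```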
Type of Farm
Feed Mill and Sire Line
Precipitation
Temperature
Soil
Geographical location, soil and weather condition