
Page 1: Big Data, Graphical Modeling, and Causal Inference in

Guilherme J. M. Rosa Department of Animal Sciences

Big Data, Graphical Modeling, and Causal Inference in

Livestock Production

Page 2: Big Data, Graphical Modeling, and Causal Inference in

Feeding the World

•  Currently 7.2 billion people in the world.

•  Expected increase to about 9 billion by 2050, mostly in developing countries.

•  World food production will need to increase by 60 percent and food production in the developing world will need to double.

•  Productivity, profitability, product quality, environmental footprint (land, water and energy use, greenhouse gas emissions, etc.)

Page 3: Big Data, Graphical Modeling, and Causal Inference in

Genotype x Environment

Page 4: Big Data, Graphical Modeling, and Causal Inference in

Example

Ribeiro S, Eler JP, Pedrosa VB, Rosa GJM, Ferraz JBS and Balieiro JCC. Genotype x environment interaction for weaning weight in Nellore cattle using reaction norm analysis. Livestock Science 176: 40–46, 2015.

Page 5: Big Data, Graphical Modeling, and Causal Inference in

Genotype x Environment

•  Nucleus herd vs. commercial settings environment

•  Environmental diversity within countries/macro-regions

•  Globalization of breeding

•  Increasing importance of South America, Africa and Southeastern Asia

•  Global poverty and ecological footprint

Page 6: Big Data, Graphical Modeling, and Causal Inference in

Precision Livestock Production

Page 7: Big Data, Graphical Modeling, and Causal Inference in

Different Sources of Data and Information

•  Animal-level data
-  Production indexes, well-being monitoring
-  Pattern recognition (e.g. early detection of health issues)
-  Predictive analytics (e.g. prediction of animal future performance)

•  Farm-level data
-  Efficiency of management protocols and product administration
-  Genetics-environment interaction
-  Informed decision-making

Page 8: Big Data, Graphical Modeling, and Causal Inference in

Animal-level Data: High-Throughput, Real-Time Phenotyping

Sensors: Prediction of behavior in lactating dairy cows

João Dórea

Page 9: Big Data, Graphical Modeling, and Causal Inference in

Accelerometer to predict feeding behavior

-  3 behaviors/activities: resting, eating and ruminating
-  Hidden Markov model
-  The data were analyzed for each axis: X, Y, and Z
-  When the probability of a state at time t was greater than 50%, the observation was classified into that state (one of the 3 possible states)
-  The total time of each state (predicted values) was compared to the observed values
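The classification rule on this slide can be sketched with a small forward-backward pass over a 3-state hidden Markov model. This is only an illustration, not the authors' implementation: the Gaussian emission means and standard deviations, the transition matrix, the sampling interval and the toy readings are all hypothetical. Only the three states and the 50% threshold come from the slide.

```python
from collections import Counter

import numpy as np

STATES = ["resting", "eating", "ruminating"]


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def state_posteriors(obs, start, trans, means, sds):
    """Forward-backward: P(state at time t | all observations), one row per t."""
    obs = np.asarray(obs, dtype=float)
    n, k = len(obs), len(start)
    emis = gaussian_pdf(obs[:, None], means[None, :], sds[None, :])  # shape (n, k)
    alpha = np.zeros((n, k))
    beta = np.ones((n, k))
    alpha[0] = start * emis[0]
    alpha[0] /= alpha[0].sum()  # rescale each step to avoid underflow
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


def classify(post, threshold=0.5):
    """Assign a state only when its posterior probability exceeds the threshold."""
    labels = []
    for row in post:
        best = int(np.argmax(row))
        labels.append(STATES[best] if row[best] > threshold else None)
    return labels


def time_per_state(labels, minutes_per_sample):
    """Total predicted time in each state, to compare against observed time."""
    counts = Counter(label for label in labels if label is not None)
    return {state: counts.get(state, 0) * minutes_per_sample for state in STATES}


# Hypothetical parameters for one accelerometer axis (assumptions, not from the slides).
start = np.array([1 / 3, 1 / 3, 1 / 3])
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
means = np.array([0.0, 1.0, 2.0])
sds = np.array([0.3, 0.3, 0.3])
readings = [0.05, -0.10, 0.02, 1.10, 0.90, 1.05, 2.10, 1.90]  # toy data

post = state_posteriors(readings, start, trans, means, sds)
labels = classify(post)
totals = time_per_state(labels, minutes_per_sample=1.0)
print(labels)
print(totals)
```

Running the same routine separately on the X, Y and Z readings would give one set of predicted state durations per axis, which is the quantity the slide compares with the observed times.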

Page 10: Big Data, Graphical Modeling, and Causal Inference in

X-axis: 3-state probabilities

Page 11: Big Data, Graphical Modeling, and Causal Inference in

Feeding Behavior

                 Eating time   Ruminating time   Resting time
X-axis
  pred, min           45              47              42
  obs, min            45              51              40
  Accuracy, %        100              92              95
Y-axis
  pred, min           35              76              25
  obs, min            40              45              51
  Accuracy, %         89              32              48
Z-axis
  pred, min           54              72              10
  obs, min            45              51              40
  Accuracy, %         80              58              24

Page 12: Big Data, Graphical Modeling, and Causal Inference in

Computer Vision: Tilapia filet quality

•  Data from more than 3000 fish

•  Dorsal and lateral pictures

•  Carcass weight and yield

Page 13: Big Data, Graphical Modeling, and Causal Inference in

Image Processing

•  Image recognition and segmentation

[Figure panels: original image; output segmentation]

Page 14: Big Data, Graphical Modeling, and Causal Inference in

Pig weight and leg/back score

•  Data from 700 pigs

•  Weight across different ages

•  Leg and back scores

Arthur Fernandes

Page 15: Big Data, Graphical Modeling, and Causal Inference in

Prediction: Linear model

Page 16: Big Data, Graphical Modeling, and Causal Inference in

Milk Mid-infrared Spectra

The use of an artificial neural network to estimate feed intake in lactating cows through mid-infrared spectra of milk samples

[Diagram: milk sample, mid-infrared (MIR) spectroscopy, dry matter intake]

João Dórea

Page 17: Big Data, Graphical Modeling, and Causal Inference in

Objective: use of Fourier transform MIR of milk samples to estimate dry matter intake (DMI) in lactating Holstein cows

-  MIR recorded for 599 milk samples from 189 lactating cows

-  Individual DMI recorded with electronic feeding gates

-  One-hidden-layer ANN model compared with partial least squares (PLS) regression

-  Cross-validation used to assess predictive ability (predictive mean squared error, PMSE)

Results: ANN PMSE decreased as the number of neurons increased up to 15 (PMSE = 4.13, 3.61, 3.30 and 3.38 kg²/d² for 5, 10, 15 and 20 neurons, respectively); the PLS model (7 factors) gave a higher PMSE of 4.41 kg²/d²
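As a rough sketch of how a cross-validated PMSE might be computed and used to compare two predictors, the snippet below runs k-fold cross-validation on synthetic data. The models are placeholders (ordinary least squares against a mean-only baseline), not the ANN or PLS models from the slide. The number of folds, the data and every function name are assumptions made for illustration.

```python
import numpy as np


def kfold_pmse(X, y, fit, predict, n_folds=5, seed=0):
    """Mean squared prediction error over held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    squared_errors = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        model = fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - predict(model, X[test_idx])
        squared_errors.append(residuals ** 2)
    return float(np.mean(np.concatenate(squared_errors)))


def fit_ols(X, y):
    """Least-squares fit with an intercept column."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def fit_mean(X, y):
    """Baseline that ignores the predictors."""
    return float(np.mean(y))


def predict_mean(mean, X):
    return np.full(len(X), mean)


# Synthetic stand-in for spectra (X) and intake in kg/d (y); not real data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 20.0 + X @ np.array([1.5, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=1.0, size=200)

pmse_ols = kfold_pmse(X, y, fit_ols, predict_ols)
pmse_mean = kfold_pmse(X, y, fit_mean, predict_mean)
print(f"OLS PMSE: {pmse_ols:.2f} kg²/d², mean-only PMSE: {pmse_mean:.2f} kg²/d²")
```

Because both models are scored on the same held-out folds, their PMSE values are directly comparable, which is the kind of comparison reported for the ANN and PLS models.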

Page 18: Big Data, Graphical Modeling, and Causal Inference in

High-Throughput Genotyping

Predicting complex quantitative traits with Bayesian neural networks

Gianola D, Okut H, Weigel KA and Rosa GJM. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12: 87, 2011.

Pérez-Rodríguez P, Gianola D, Weigel KA, Rosa GJM and Crossa J. An R package for fitting Bayesian regularized neural networks with applications in animal breeding. Journal of Animal Science 91: 3522–3531, 2013.

Page 20: Big Data, Graphical Modeling, and Causal Inference in

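Farm-level Data: Historic data across farms

Mixed Models

•  Used extensively in animal breeding, with multiple traits and huge numbers of records and animals in the pedigree

•  However, environmental effects are coalesced into contemporary groups

•  As such, individual effects of specific factors are not investigated, there are no issues with collinearity, and there is no insight into indirect, direct and total effects, etc.

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad \begin{cases} \mathbf{G} = \mathbf{G}_0 \otimes \mathbf{A} \\ \boldsymbol{\Sigma} = \mathbf{R} \otimes \mathbf{I} \end{cases}$$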
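The Kronecker structure of the covariance matrices on this slide (G = G0 ⊗ A for the random effects u, Σ = R ⊗ I for the residuals e) can be illustrated numerically. The relationship matrix A, the two-trait matrices G0 and R, and the three-animal pedigree are hypothetical values chosen only to show the layout.

```python
import numpy as np

# Hypothetical additive relationship matrix for 3 animals:
# two unrelated parents and their offspring.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

# Hypothetical genetic (G0) and residual (R) covariance matrices for 2 traits.
G0 = np.array([[4.0, 1.0],
               [1.0, 2.0]])
R = np.array([[6.0, 0.5],
              [0.5, 3.0]])

# Covariance of the random effects u, ordered as animals within trait.
G = np.kron(G0, A)

# Residual covariance: residuals are independent across animals,
# but correlated across traits within an animal.
Sigma = np.kron(R, np.eye(A.shape[0]))

print(G.shape, Sigma.shape)
print(G[0, 2])      # trait 1, parent 1 vs offspring
print(G[0, 5])      # trait 1 in parent 1 vs trait 2 in offspring
print(Sigma[0, 3])  # trait 1 vs trait 2 residual, same animal
```

With this ordering, block (i, j) of G is G0[i, j] times A, so genetic covariances between traits are scaled by the relationship between the two animals.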

Page 21: Big Data, Graphical Modeling, and Causal Inference in

•  Confinamento Monte Alegre (CMA): http://www.cma.agr.br

•  Feedlot capacity 16,000 head

•  Annual output around 50,000 head

•  TGC software: http://www.gestaoagropecuaria.com.br/produtos/tgc/

•  80 variables (input, output, economics, etc.)

Page 22: Big Data, Graphical Modeling, and Causal Inference in

Decision Tree

•  Decision tree: decision support tool that uses a tree-like graph or model of decisions and their possible consequences

•  Decision trees are commonly used in operations research, such as decision analysis, to help identify a strategy most likely to reach a goal

•  Popular tool in machine learning
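In the machine learning sense, a regression tree is typically grown by repeatedly choosing the split that most reduces the within-node squared error of the outcome. The sketch below finds one such split on a single variable. The variable names and numbers (feed cost and net income per head) are hypothetical and are not taken from the feedlot data.

```python
import numpy as np


def best_split(x, y):
    """Return (threshold, sse) for the binary split on x that minimizes
    the total within-node sum of squared errors of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold = (xs[i - 1] + xs[i]) / 2.0
            best_sse = sse
    return best_threshold, best_sse


# Hypothetical per-lot records (illustrative values only).
feed_cost = [100, 110, 120, 130, 200, 210, 220, 230]
net_income = [50, 52, 48, 51, 20, 22, 19, 21]

threshold, sse = best_split(feed_cost, net_income)
print(threshold, sse)
```

A full tree would apply the same search recursively to each resulting node and across all candidate variables, stopping when nodes become too small or the error reduction too slight.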

Page 23: Big Data, Graphical Modeling, and Causal Inference in

Decision Tree: US$/head – Income (feed cost)

Page 24: Big Data, Graphical Modeling, and Causal Inference in

Decision Tree: US$/head – Net Income

Page 25: Big Data, Graphical Modeling, and Causal Inference in

Beef Production and Quality

Vera Cardoso

[Figure: database schema. Recoverable tables and fields: Company (Owner, CPF/CNPJ); Farm (Region, ID, Owner, Insc. estadual, Size, City, State, Lat/Long, CPF/CNPJ); Owner (ID, Code, CPF/CNPJ); State and City (ID); outcome (iAge, iFat, Month, Year, Slaughter Weight, Quality, Amount typified); farm traits (Technology, Sales team, Technician, Year, pNutrition, tNutrition, wNutrition, Season)]

Page 26: Big Data, Graphical Modeling, and Causal Inference in

[Figures: amount of carcasses typified per region (total: 23,056,869 carcasses, ≅ 25% of Brazilian production, 2014-2016); number of carcasses by age (years); carcasses slaughtered by month (2014-2016); carcasses by year (2014: 2,229,702; 2015: 1,924,149; 2016: 1,808,955); number of carcasses by quality (desirable, acceptable, undesirable); carcasses slaughtered by weight (in @); number of carcasses by iFat (fat index 1 to 5)]

Page 27: Big Data, Graphical Modeling, and Causal Inference in

Preliminary results

•  Data from two sources: JBS S.A. (81,053 farms) and DSM Produtos Nutricionais (22,223 farms). After merging, the final dataset comprised information from 7,248 farms and 1,571,023 carcasses slaughtered in the years 2014-2016.

•  Outcome variables: body weight at slaughter (BWS), carcass fat index (FI), age at slaughter (AS)

•  Covariates: farm, AS, season, animal category (steer, bull, cull bull, heifer and cow), frequent technical consulting (FTC), regional sales team (RST), type of feedlot premix (no feedlot premix – NFP, finishing grazing cattle – FGC, feedlot without additives – FWA, and feedlot with additives – FA)

Ferreira et al. Big data analysis of beef production and quality: an example with the Brazilian cattle industry. ASAS meeting, Baltimore, MD, July 8-12, 2017 (to appear)
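The slide does not say how the two sources were matched, so the following is only a sketch of one plausible approach: an inner join that keeps farms present in both datasets. The key name `farm_id` and all record fields and values are hypothetical.

```python
def inner_join(left, right, key):
    """Keep only rows whose key value appears in both lists of records."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Fields from the left row win if both sides share a field name.
            merged.append({**match, **row})
    return merged


# Hypothetical slaughterhouse records (one row per carcass).
carcasses = [
    {"farm_id": "F1", "weight_kg": 510, "fat_index": 3},
    {"farm_id": "F2", "weight_kg": 480, "fat_index": 2},
    {"farm_id": "F4", "weight_kg": 495, "fat_index": 3},
]

# Hypothetical nutrition-supplier records (one row per farm).
farms = [
    {"farm_id": "F1", "premix": "FA", "ftc": True},
    {"farm_id": "F2", "premix": "NFP", "ftc": False},
    {"farm_id": "F3", "premix": "FWA", "ftc": True},
]

merged = inner_join(carcasses, farms, key="farm_id")
for row in merged:
    print(row)
```

An inner join of this kind discards farms that appear in only one source, which would be consistent with the merged dataset covering far fewer farms than either source alone.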

Page 28: Big Data, Graphical Modeling, and Causal Inference in

Preliminary results

•  Results:
–  Use of FA premix decreased AS, and increased BWS and FI in comparison to NFP and FWA
–  Adopting FTC increased BWS and FI, and reduced AS
–  Bulls presented greater BWS and lower AS, but presented lower FI in comparison with steers
–  Differences in BWS were observed for different RST and seasons
–  AS was reduced and BWS and FI increased in the rainy seasons of 2014-2016
–  Combining FTC and FA was capable of increasing BWS by 27.4 kg and reducing AS by approximately 10 months in comparison with FWA and non-FTC, suggesting that this approach might be favorable for production

Page 29: Big Data, Graphical Modeling, and Causal Inference in

Pig Production

Tiago Fragoso

[Map: location of Iowa Select finishing farms throughout Iowa]

Page 30: Big Data, Graphical Modeling, and Causal Inference in

Pig Production

•  The data set contains 503 farms divided into 3 production types:
–  Finishing [428 farms]
–  Nursery [25 farms]
–  Gilt Development Unit (GDU) [50 farms]

Page 31: Big Data, Graphical Modeling, and Causal Inference in

Traits

•  Data set contains 140+ traits divided into 3 groups:
–  Performance: days on feed (DOF), average daily gain (ADG), mortality, pigs produced, total weight produced, feed conversion (FC), profit
–  Carcass traits: backfat, loin depth, % very light, % light, % target, % heavy, % very heavy, carcass weight (CW), live weight (LW), % yield (CW/LW)
–  Utilization: fill days, location days, % utilization, number of turns, number of loads

Page 32: Big Data, Graphical Modeling, and Causal Inference in

Example – Mortality (%)

•  Reference values:
–  Excellent: E < 2.5%, Regular: 2.5% < R < 5%, Bad: B > 5%
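The reference values can be written as a small classification function. The slide leaves the boundary values (exactly 2.5% and exactly 5%) unassigned; placing both in the "Regular" class is an assumption made here, as is the function name.

```python
def mortality_class(mortality_pct):
    """Map a farm's mortality (%) to the reference categories on the slide."""
    if mortality_pct < 0:
        raise ValueError("mortality cannot be negative")
    if mortality_pct < 2.5:
        return "Excellent"
    if mortality_pct <= 5.0:  # assumption: both boundary values count as Regular
        return "Regular"
    return "Bad"


# Hypothetical farm mortality rates, in percent.
for value in (1.8, 2.5, 4.2, 5.0, 6.3):
    print(value, mortality_class(value))
```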

Page 33: Big Data, Graphical Modeling, and Causal Inference in

Type of Farm

Page 34: Big Data, Graphical Modeling, and Causal Inference in

Feed Mill and Sire Line

Page 35: Big Data, Graphical Modeling, and Causal Inference in

Geographical location, soil and weather condition

[Maps: precipitation; temperature; soil]
