Overnourishment & Undernourishment
Data Analysis
Sevval Boylu, Selin Eyupoglu, Sena Necla Cetin
What causes malnutrition?
● Overnourishment and undernourishment are both major health issues, and both are considered malnutrition.
● Aim: To find the factors that trigger both problems.
● Period: 1999-2013
● Region: USA and Africa → the two extremes
We know that obesity is a critical health issue that causes a wide range of
diseases, whereas hunger and undernutrition are detrimental to human development
in African countries. Thus, a correlation between the two could suggest a set of
efficient, mediating solutions.
Undernourishment Data
Year    Prevalence (%)
1991    27.6
1992    27.3
1993    27.2
1994    27.1
1995    26.8
1996    26.6
1997    26.4
1998    26.2
1999    25.9
2000    25.7
2001    25.4
2002    25.1
2003    24.7
2004    24.2
2005    23.5
2006    22.7
2007    22.3
2008    21.9
2009    21.6
2010    21.1
2011    20.7
2012    20.2
2013    20.0
2014    19.8
2015    19.8
Undernourishment Graphs
Overnourishment Data
Period    Overweight (%)    Obese (%)    Extremely Obese (%)
1999      34.0              30.5         4.7
2001      35.1              30.5         5.1
2003      34.1              32.2         4.8
2005      32.6              34.3         5.9
2007      34.3              33.7         5.7
2009      33.0              35.7         6.3
2011      33.6              34.9         6.4
2013      32.5              37.7         7.7
Overnourishment Data (obesity + extreme obesity)
Period Obesity
1999 35.2
2001 35.6
2003 37.0
2005 40.2
2007 39.4
2009 42.0
2011 41.3
2013 45.4
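The combined obesity column above is the sum of the Obese and Extremely Obese columns of the Overnourishment Data table. A minimal Python sketch of that arithmetic, using the values from the slides (the variable names are illustrative, not the authors' code):

```python
# Values copied from the Overnourishment Data table:
# period -> (obese, extremely obese), both in percent.
overnourishment = {
    1999: (30.5, 4.7),
    2001: (30.5, 5.1),
    2003: (32.2, 4.8),
    2005: (34.3, 5.9),
    2007: (33.7, 5.7),
    2009: (35.7, 6.3),
    2011: (34.9, 6.4),
    2013: (37.7, 7.7),
}

# Combined obesity = obese + extremely obese, rounded to one decimal
# place like the figures on the slide.
combined_obesity = {
    period: round(obese + extreme, 1)
    for period, (obese, extreme) in overnourishment.items()
}

for period, value in combined_obesity.items():
    print(period, value)
```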
Overnourishment Graphs (obesity + extreme obesity)
The Correlation Matrix (before filling in missing data)
Filling in Missing Data
● Filled in missing data points between 1999 and 2013 using the mean of the
previous and next data points
● Eliminated data before and after these years
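The fill-in rule described above can be sketched in plain Python. This is a literal reading of "mean of the previous and next data points", not necessarily the exact procedure the authors ran; the function name `fill_with_neighbor_means` is hypothetical.

```python
import math


def fill_with_neighbor_means(series):
    """Replace each NaN that sits between two known values with their mean.

    `series` maps year -> value (NaN for missing). Gaps at the edges, or
    gaps whose neighbors are also missing, are left as NaN.
    """
    years = sorted(series)
    filled = dict(series)
    for i, year in enumerate(years):
        if not math.isnan(series[year]):
            continue  # already known, nothing to fill
        if i == 0 or i == len(years) - 1:
            continue  # no neighbor on one side
        prev_val = series[years[i - 1]]
        next_val = series[years[i + 1]]
        if not (math.isnan(prev_val) or math.isnan(next_val)):
            filled[year] = (prev_val + next_val) / 2
    return filled


# Combined obesity values from the slides (odd years only);
# the even years in between start out as missing.
obesity = {year: float("nan") for year in range(1999, 2014)}
obesity.update({1999: 35.2, 2001: 35.6, 2003: 37.0, 2005: 40.2,
                2007: 39.4, 2009: 42.0, 2011: 41.3, 2013: 45.4})

filled = fill_with_neighbor_means(obesity)
for year in sorted(filled):
    print(year, round(filled[year], 2))
```

Restricting the dictionary to 1999-2013 before filling corresponds to the second bullet: years outside that range have no surrounding survey values to average.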
Filling in Missing Obesity Data Using Means
Year    Obesity After    Obesity Before
1991    NaN              NaN
1992    NaN              NaN
1993    NaN              NaN
1994    NaN              NaN
1995    NaN              NaN
1996    NaN              NaN
1997    NaN              NaN
1998    NaN              NaN
2000    35.40            35.40
2001    35.60            NaN
2002    36.10            35.40
2003    37.00            NaN
2004    37.90            35.85
2005    40.20            NaN
2006    38.20            36.30
2007    39.40            NaN
2009    42.00            NaN
2010    40.35            38.60
2011    41.30            NaN
2012    43.70            38.05
2013    45.40            NaN
2014    NaN              NaN
2015    NaN              NaN
Spearman’s and Pearson’s Correlation
● Temperature (Celsius)
● GDP_USA (US dollars)
● Agriculture_Africa (percentage of agriculture in the African economy)
● Africa_GDP (US dollars)
● Our null hypothesis: ‘There is at least one pair of variables whose correlation is statistically
insignificant.’
● Our goal was to eliminate one of the potential factors. However, all pairwise p-values were below 0.05,
so we consider all pairs statistically significant.
Spearman’s Correlation (1999-2013 only)
● Strong negative correlation, which makes sense.
● This does not suggest dependence between the two variables: the USA is almost always developing and undernourishment in Africa is mostly decreasing, so there is no sensible one-to-one relationship between the two.
● Temperature is poorly correlated with the other variables, which also makes sense.
● Temperature is not a good indicator of Obesity in the USA or Undernourishment in Africa, but we still kept it as a variable in our model because its correlations were around -0.5 or 0.5.
Pearson’s Correlation
Pearson vs. Spearman
Pearson’s correlation evaluates the linear relationship between two continuous variables, where the rate of change is constant, whereas Spearman’s correlation evaluates monotonic relationships. Our data was therefore evaluated better with Spearman’s correlation.
We did not know in advance what the relationships look like or what kind of function they resemble. Thus we chose Spearman’s because it is more comprehensive.
Moreover, in real-life situations we don’t expect a linear relationship between the variables.
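To illustrate the difference, here is a small self-contained sketch that computes both coefficients on a monotonic but non-linear relationship. It uses only the Python standard library; the helper names (`pearson`, `ranks`, `spearman`) and the toy data are illustrative assumptions, not the code or data used in the project.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: y grows monotonically with x, but not linearly (y = x^3).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print("pearson :", round(pearson(x, y), 3))
print("spearman:", round(spearman(x, y), 3))
```

Because Spearman works on ranks, any strictly increasing relationship scores a perfect 1, while Pearson drops below 1 as soon as the relationship bends away from a straight line.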
Obesity Data Prediction
Undernourishment Prediction
Decision Tree Regressor Model
● Specifically used Regressor functions because our data is numerical.
● 0.909 accuracy score for predicting Undernourishment percentage
● 0.86 accuracy score for Obesity in USA
● Mean ≈ 0.88
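The slides do not state how the accuracy score was computed. If it came from the `score` method of a scikit-learn regressor, it would be the coefficient of determination R² rather than a classification accuracy. Under that assumption, a minimal pure-Python sketch of R² looks like this (the sample numbers are made up for illustration and are not project data):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical true and predicted percentages, for illustration only.
y_true = [20.0, 21.0, 22.0, 23.0]
y_pred = [20.5, 21.0, 21.5, 23.0]

print(round(r2_score(y_true, y_pred), 3))
```

A score of 1 means the predictions match the held-out values exactly, and a score of 0 means the model does no better than predicting the mean.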
Decision Tree Regressor Model
● We also tried adding Undernourishment as a predictor to see whether all
columns together make a good model and give a proper estimate of Obesity values.
We reached the highest accuracy score with this model (0.86). This does not
suggest that Obesity and Undernourishment percentages are strongly
dependent.
● Our conclusion: With the Decision Tree for Regression model, Obesity and
Undernourishment values work well together. This may also suggest that the
model could predict future values better.
Training Our Model with Random Forest
We found an accuracy score of 0.77, which is not as good as the Decision Tree’s. However, it can still be considered successful at estimation.
Logistic Regression
Lastly, we tried Logistic Regression as a machine learning technique and found F1 scores of 0.86 and 0.8.
Although high precision and low recall is preferable, our model gives the opposite. In fact, all of the recall values are 1.
The F1 scores suggest that our model predicts with an average score of about 0.82.
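Since F1 is the harmonic mean of precision and recall, a recall of 1 pins down the precision implied by each reported F1 score. The sketch below assumes the standard F1 definition and that each reported score belongs to a class with recall 1, as stated above; the function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)


# Reported F1 scores, with recall fixed at 1 as on the slide.
for f1 in (0.86, 0.8):
    precision = precision_from_f1(f1, 1.0)
    print(f"F1={f1}: implied precision = {precision:.3f}")
```

The implied precisions are lower than the F1 scores themselves, which is consistent with the observation that the model favors recall over precision.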
Conclusion
In the early stages of our project, our aim was to suggest a single factor that possibly causes both problems. However, giving single predictors to the models didn’t yield efficient results (i.e. high accuracy scores). Including all the predictors that we suggested raised the accuracy score. Thus we conclude that our model with 4 predictors and 2 targets yields good results.
Comparing the scores of several machine learning techniques, we think Decision Tree
for Regression is the best model for our data because it has the highest accuracy score (0.88).
Failures
● We failed to build a proper decision tree with nodes and leaves.
● We spent too much time trying to fill in the missing obesity data (1991-1998);
we ended up dropping these NaN values.