Introduction
Policy makers have long been interested in the relationship between the environment and human health. In some cases, such as water contamination, this relationship has been well studied. For other environmental factors, such as forest cover, health effects are not as easily quantifiable (we do not know an LD-50 level for forest cover). The purpose of this project was to determine what effects, if any, various environmental factors have on the health of Indonesian children. Do forest cover, water area, rainfall, and erosion affect the number of illnesses in children, after accounting for family, housing, and village characteristics?
Data Collection Methods
• Data were collected by Professor Subhrendu Pattanayak during his doctoral research in Indonesia.
• Observations were obtained through surveys of randomly selected households in several villages.
• Villages were selected to represent a variety of environmental characteristics.
• Data were only collected from households with a total family size less than eight.
• A GIS was used to measure village area, forest cover, and water area. Erosion and sedimentation rates were also derived using a GIS.
Variables Selected for Analysis
Dependent
• Annual number of illnesses per child
Independent
• Adult education (years)
• Total family size
• Annual number of illnesses per adult
• Size of farm (hectares)
• Condition of the floor: 1=stilts, 2=dirt, 3=cement
• Condition of the roof: 1=straw, 2=wood, 3=zinc
• Income from non-farm sources: 0=none, 1= > zero
• Annual expenditures per family member (rupiahs)
• Village density (people/hectare)
• Primary forest cover (hectares)
• Secondary forest cover (hectares)
• Annual rainfall in watershed (mm)
• Water area (hectares)
• Annual erosion and sedimentation rate in watershed (tons)
Summary Statistics
• Focused on subset of data with > 0 illnesses per child

Numeric Variables (n=357)
Variable                                   Min       Mean        Median      Max          STD
Illness per child                          0.25      1.08        1           4            0.53
Family size                                1.9       4.23        3.9         7.6          1.42
Adult education (years)                    2         6.42        5           21           3.5
Expenditures per family member (rupiahs)   4666.67   221391.43   161538.46   2101694.92   224346.31
Village density (people/hectare)           0.12      1.67        0.92        11.35        2.06
Primary forest (ha)                        0.22      23.64       16.46       156.54       24.02
Secondary forest (ha)                      0.05      99.14       83.13       526.96       90.17
Water area (ha)                            1.46      322.24      261.9       2030.44      316.71
Rain (mm)                                  6.38      972         678.9       6470         897.1
Erosion/sedimentation (tons)               0.01      1.86        0.94        18.55        2.59

Indicator Variables (Number of Observations per Code)
Code   Floor   Roof   Non-farm Income
1      110     44     151
2      182     29     206
3      62      279
Exploratory Analysis
Log transformed all numeric variables except family size.
[Figure: histograms before and after log transformation. Panels: Annual illnesses per child (x-axis: Number of Illnesses); Log(Annual illnesses per child) (x-axis: Number of Illnesses); Secondary Forest (x-axis: Number of Hectares (ha)); Log(Secondary Forest) (x-axis: Log(Number of Hectares)); Rainfall (mm) (x-axis: mm of rain); Log(Rainfall) (x-axis: log(mm of rain)).]
Model Selection
• We considered plausible interaction effects and quadratic terms.
• Subset model selection (leaps) was used to determine which interaction effects were significant with all main effects included.
• Then, we used manual stepwise selection (backward) to analyze the significance of the main effects.
  – Farm size (p=0.4749), erosion rate (p=0.3694), and floor condition (p=0.5069) were not significant and were not involved in a potentially important interaction effect.

Model -- All Main Effects, plus:            BIC       Posterior Probability
Roof*Log(Rain)                              -637.66   0.1583
Log(Village Density)^2                      -637.01   0.1143
Log(Adult Education)*Log(Expenditures)      -636.13   0.0736
Log(Adult Education)*Family size            -635.92   0.0662
Roof*Log(Rain) + Log(Village Density)^2     -635.75   0.0608
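The posterior probabilities in this table are consistent with the usual BIC approximation, in which each model's weight is proportional to exp(-BIC/2). The sketch below is an illustration rather than the original analysis code, and the function name `bic_weights` is hypothetical. It normalizes over only the five models listed, so its values are larger than the tabulated ones (which were presumably normalized over every model considered), but the ratios between models should agree.

```python
import math

def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Uses weight_i proportional to exp(-BIC_i / 2), normalized over the
    models supplied. Subtracting the minimum BIC first avoids overflow.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

# BIC values copied from the table (all main effects, plus the listed terms).
models = {
    "Roof*Log(Rain)": -637.66,
    "Log(Village Density)^2": -637.01,
    "Log(Adult Education)*Log(Expenditures)": -636.13,
    "Log(Adult Education)*Family size": -635.92,
    "Roof*Log(Rain) + Log(Village Density)^2": -635.75,
}

weights = bic_weights(list(models.values()))
for name, w in zip(models, weights):
    print(f"{name:42s} {w:.3f}")

# Ratio of the two best models; the table gives 0.1583 / 0.1143, about 1.38.
ratio = weights[0] / weights[1]
print(f"weight ratio of top two models: {ratio:.2f}")
```

Because only ratios are comparable here, the check to make is that the printed ratio is close to the ratio of the tabulated probabilities.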
Model Selection
• Subset model selection (leaps) was also used to determine which interaction effects were significant with the remaining main effects (minus farm size, erosion, and floor condition).
• Analyzed the importance of each interaction effect using manual stepwise regression.
• Tests for influential observations using Cook’s distance indicated that there were no influential points. All of the observations displayed Cook’s Distance < 0.08.
• Finally, we considered additional transformations. Transforming family size (the final untransformed variable) did not significantly improve our model.

Model -- Significant Main Effects, plus:    BIC       Posterior Probability
Roof*Log(Rain)                              -656.93   0.1422
Log(Village Density)^2                      -656.80   0.1330
Roof*Log(Rain) + Log(Village Density)^2     -656.06   0.0918
Log(Adult Education)*Log(Adult Illness)     -654.95   0.0526
Roof*Log(Adult Illness)                     -654.85   0.0501
Fitted Model
• If the variable is highlighted in red, then an increase in this variable is associated with an increase in the annual number of illnesses per child.
• If the variable is highlighted in blue, then an increase in this variable is associated with a decrease in the annual number of illnesses per child.
• R-squared = 0.2943
• R-squared adjusted = 0.2688
• This model is homoskedastic, but has a heavy tailed distribution (QQ Normal plot not shown).

Intercept: 3.02 (0.90)

Beta   Variable                 Value (SE)      P-Value
1      Log(Family Size)         -0.04 (0.02)    0.0511
2      Log(Adult Illness)       0.35 (0.05)     0.0000
3      Roof                     -0.45 (0.21)    0.0347
4      Nonfarm Income           0.05 (0.02)     0.0113
5      Log(Expenditures)        -0.04 (0.02)    0.0877
6      Log(Village Density)     -0.06 (0.03)    0.0240
7      Log(Adult Education)     0.12 (0.06)     0.0410
8      Log(Rain)                -0.39 (0.13)    0.0028
9      Log(Primary Forest)      0.13 (0.06)     0.0210
10     Log(Second Forest)       0.10 (0.03)     0.0017
11     Log(Water Area)          -0.15 (0.06)    0.0094
12     Log(Rain) * Roof         0.07 (0.03)     0.0322
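As a consistency check on the two R-squared values reported for the fitted model, adjusted R-squared can be recomputed from R-squared, the number of predictors, and the residual degrees of freedom. The sketch assumes 12 predictors (the twelve coefficients listed) and 332 residual degrees of freedom. The latter is an assumption borrowed from the t(0.975, 332) quantile quoted for the prediction interval elsewhere in this document; it is not stated for this model directly.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    resid_df = n_obs - n_predictors - 1
    return 1.0 - (1.0 - r2) * (n_obs - 1) / resid_df

n_predictors = 12   # coefficients 1-12 in the fitted model
resid_df = 332      # ASSUMPTION: taken from the t(0.975, 332) quantile
n_obs = resid_df + n_predictors + 1   # implied number of complete observations

adj = adjusted_r_squared(0.2943, n_obs, n_predictors)
print(f"n = {n_obs}, adjusted R^2 = {adj:.4f}")
```

Under these assumptions the result agrees with the reported adjusted R-squared to four decimal places, which suggests the model was fitted to somewhat fewer than the 357 observations in the summary table (some households presumably had missing values).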
Question 1: How do the environmental factors considered affect child health as a group?
Hypotheses:
• H0: The model without env. factors is adequate.
• HA: The full model (with env. factors) is a significant improvement.
Statistical Technique:
• We used an Extra Sum of Squares F-Test to test the joint significance of the environmental factors (log(rainfall), log(primary forest), log(secondary forest), log(water area), and log(rain)*roof).
Results:
• The addition of environmental factors significantly improved our model (F=3.24, p=0.01, ESS F-test).
Conclusions and Limitations:
• There was sufficient evidence to reject the null hypothesis and conclude that inclusion of environmental factors does significantly improve our understanding of the annual number of illnesses in Indonesian children.
• This is a heavy-tailed distribution, even after transformations, and this may have undue influence on the results. Likewise, there may be lurking variables (e.g., watershed area) which were not accounted for in the data.
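The extra-sum-of-squares F statistic used for Question 1 compares the residual sum of squares (RSS) of the reduced model with that of the full model. The residual sums of squares are not reported here, so the numbers in the sketch below are hypothetical placeholders chosen only to show the arithmetic; the 332 residual degrees of freedom are likewise an assumption, taken from the t quantile quoted for the prediction interval.

```python
def extra_ss_f(rss_reduced, rss_full, n_extra_terms, resid_df_full):
    """Extra-sum-of-squares F statistic.

    F = [(RSS_reduced - RSS_full) / q] / [RSS_full / df_full],
    where q is the number of terms dropped from the full model.
    """
    if rss_full <= 0 or n_extra_terms <= 0 or resid_df_full <= 0:
        raise ValueError("inputs must be positive")
    numerator = (rss_reduced - rss_full) / n_extra_terms
    denominator = rss_full / resid_df_full
    return numerator / denominator

# HYPOTHETICAL residual sums of squares, for illustration only.
rss_without_env = 52.0   # reduced model: no environmental terms
rss_with_env = 50.0      # full model: adds the 5 environmental terms

f_stat = extra_ss_f(rss_without_env, rss_with_env,
                    n_extra_terms=5, resid_df_full=332)
print(f"F = {f_stat:.3f} on 5 and 332 degrees of freedom")
```

The p-value is the upper-tail probability of this statistic under an F distribution with (5, 332) degrees of freedom; with SciPy installed, `scipy.stats.f.sf(f_stat, 5, 332)` would return it.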
Question 2: How does forest cover affect child health?
Hypotheses:
• H0: Primary forest cover is not a significant explanatory variable (β9 = 0).
• HA: Primary forest cover is significant (β9 ≠ 0).
and
• H0: Secondary forest cover is not a significant explanatory variable (β10 = 0).
• HA: Secondary forest cover is significant (β10 ≠ 0).
Statistical Technique:
• Two-sided t-tests were used to test the significance of each coefficient with α = 0.05, after accounting for the other variables in the model.
• We calculated 95% family-wise confidence intervals using Bonferroni techniques in order to simultaneously estimate the coefficients associated with environmental factors.
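A Bonferroni family-wise interval for one of k coefficients replaces the usual 1 - α/2 quantile with the 1 - α/(2k) quantile. The sketch below assumes a family of k = 5 environmental terms and, to stay within the Python standard library, uses a normal quantile as a large-sample stand-in for the t quantile. It also uses the rounded estimates and standard errors from the fitted-model table, so it will not reproduce the reported intervals exactly.

```python
from statistics import NormalDist

def bonferroni_interval(estimate, std_error, family_size, alpha=0.05):
    """Family-wise (1 - alpha) interval for one of `family_size` coefficients.

    ASSUMPTION: a normal quantile approximates the t quantile, so the
    interval is very slightly narrower than the exact one.
    """
    quantile = NormalDist().inv_cdf(1.0 - alpha / (2.0 * family_size))
    half_width = quantile * std_error
    return estimate - half_width, estimate + half_width

# Rounded values from the fitted-model table; family of 5 environmental terms.
coefficients = [
    ("Log(Primary Forest)", 0.13, 0.06),
    ("Log(Second Forest)", 0.10, 0.03),
]
for name, est, se in coefficients:
    lower, upper = bonferroni_interval(est, se, family_size=5)
    verdict = "includes zero" if lower < 0.0 < upper else "excludes zero"
    print(f"{name}: ({lower:.3f}, {upper:.3f}) {verdict}")
```

The multiplier grows from about 1.96 for a single 95% interval to about 2.58 for a family of five, which is why a coefficient can pass its individual t-test and still have a family-wise interval that covers zero.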
Question 2: Results and Conclusions
Results:
• According to the t-tests, there is sufficient evidence to reject the null hypotheses and conclude that both primary and secondary forest are significant explanatory variables in this model (primary: p=0.0210; secondary: p=0.0017).
• However, under the more conservative approach of family-wise confidence intervals, primary forest does not appear to be significant (95% CI: -0.01551, 0.2819; includes zero).
• Secondary forest does appear to be significant, even with the conservative family-wise confidence interval (95% CI: 0.01824, 0.1846).
Conclusions:
• An increase in secondary forest cover is associated with an increase in the median annual number of illnesses per child in Indonesia. Doubling the amount of secondary forest cover is associated with a 7% (95% CI: 1%, 14%) increase in the median number of illnesses per child per year.
• We fail to reject the null hypothesis for primary forest cover under the Bonferroni confidence interval. However, our results are highly suggestive of an association between primary forest cover and child illness.
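The "doubling" statements follow from the log-log form of the model: when both the response and a predictor are log-transformed, a coefficient b means that doubling the predictor multiplies the median response by 2^b. The sketch below applies this to the family-wise interval reported for secondary forest. Using the interval midpoint as the point estimate is an assumption made for illustration, and the function name is hypothetical.

```python
def percent_change_for_doubling(coef):
    """Percent change in the median response when a log-transformed
    predictor doubles, for a model with a log-transformed response."""
    return (2.0 ** coef - 1.0) * 100.0

# Family-wise interval for Log(Secondary Forest) as reported: (0.01824, 0.1846).
lower, upper = 0.01824, 0.1846
midpoint = (lower + upper) / 2.0   # ASSUMPTION: interval centre as the estimate

for label, b in [("lower", lower), ("estimate", midpoint), ("upper", upper)]:
    print(f"{label:8s} b = {b:.4f} -> {percent_change_for_doubling(b):+.1f}%")
```

Rounded to whole percentages, the three values correspond to the 7% (1%, 14%) figures quoted in the conclusions. The same function applied to a negative coefficient gives a percent decrease.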
Question 3: How does the amount of rainfall affect child health?
Hypotheses:
• H0: Rainfall is not a significant explanatory variable (β8 = 0).
• HA: Rainfall is significant (β8 ≠ 0).
and
• H0: The interaction between rainfall and roof type is not significant (β12 = 0).
• HA: The interaction effect is significant (β12 ≠ 0).
Statistical Technique:
• Two-sided t-tests with α = 0.05.
• 95% family-wise confidence interval (Bonferroni) for family of environmental variables.
• Set up a dummy variable for roof to assess the degree of interaction between rainfall and each roof type.
Question 3: Results
Results:
• According to the t-tests, there is sufficient evidence to reject the null hypotheses and conclude that both rainfall and the interaction between rainfall and roof type are significant (p=0.0028; p=0.0322, respectively).
• Under the more conservative approach of family-wise confidence intervals, the interaction effect does not appear to be significant (95% CI: -0.0140, 0.1518; includes zero). On the other hand, the rainfall variable is significant (95% CI: -0.7326, -0.0548).
• T-tests analyzing the significance of the dummy variables for both the roof variable and the interaction between roof and rain indicate that roofs 1 and 2 do not significantly differ from roof 3. Therefore, a reduced model without the dummy variables is a more appropriate model. The coefficients, t-values and p-values for the dummy variables are as follows:

Term                 Coefficient   t-value   p-value
Roof 1               0.175         0.3265    0.7442
Roof 2               -0.34         -1.807    0.0717
Log(Rain)*Roof 1     -0.0267       -0.342    0.7325
Log(Rain)*Roof 2     0.0514        1.8694    0.0625
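The dummy coding behind these tests can be sketched as follows. The sketch assumes roof 3 (zinc) is the reference category, which the comparison of roofs 1 and 2 against roof 3 suggests, and that the log transformation is the natural log. The function name and dictionary keys are hypothetical.

```python
import math

def roof_dummy_terms(roof, rain_mm):
    """Dummy and interaction terms for one household.

    ASSUMPTION: roof type 3 (zinc) is the reference category, so it is
    represented by both dummies being zero.
    """
    if roof not in (1, 2, 3):
        raise ValueError("roof must be 1 (straw), 2 (wood), or 3 (zinc)")
    log_rain = math.log(rain_mm)          # natural log, assumed
    roof1 = 1.0 if roof == 1 else 0.0     # straw indicator
    roof2 = 1.0 if roof == 2 else 0.0     # wood indicator
    return {
        "roof1": roof1,
        "roof2": roof2,
        "log_rain": log_rain,
        "log_rain_x_roof1": log_rain * roof1,
        "log_rain_x_roof2": log_rain * roof2,
    }

# One example household per roof type at 800 mm of annual rain.
for roof in (1, 2, 3):
    print(roof, roof_dummy_terms(roof, 800.0))
```

With this coding, each dummy coefficient measures the difference in intercept from zinc roofs, and each interaction coefficient measures the difference in the rainfall slope from zinc roofs.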
Question 3: Conclusions
Conclusions:
• An increase in rainfall is associated with a decrease in the median number of illnesses per child per year. In fact, a doubling of annual rainfall is associated with a 23.9% (95% CI: 4%, 40%) decrease in the median number of illnesses per child per year. This is the strongest multiplicative change in median number of illnesses among the environmental factors.
• The negative results of the dummy variable tests are somewhat surprising. It was assumed by those collecting the data that the quality of roof increased from roof 1 (straw) to roof 3 (zinc). Interestingly, the coefficients of the dummy variables suggest that roof 2 (wood) is actually associated with a lower rate of child illness than roof 3. Unfortunately, the lack of significance of the coefficients prevents us from definitively answering this question.
• On the other hand, the fact that the family-wise (Bonferroni) confidence interval indicated that the interaction effect was not significant makes the lack of significance among the dummy variables less surprising.
General Conclusions
• Modeling Indonesian children’s health is an extremely complicated prospect. With all of the variables we have included, our model explains just 29% of the variation in annual number of illnesses (or 27% with adjusted R-squared).
• Our analysis indicates that environmental factors are important when attempting to explain child health but the predictive power of such explanations is very low.
• Despite a lack of predictive power, however, the model does exhibit several interesting associations. For example, we expected the coefficient associated with water area to be positive because of a suspected increased number of insects; that coefficient turned out to be negative and significantly less than 0.
• We began this project in hopes of finding a human health argument for conservation of primary forest. On the contrary, the positive coefficient of the primary forest variable (significant in its individual t-test, though not under the Bonferroni interval) creates a disincentive for conservation. Before promoting deforestation for health reasons, however, we must again consider the uncertainty inherent in the model.
Recommendations and Further Research
• The observational nature of the data prevent any inference of cause and effect relationships. Thus, we may only discuss associations between variables.
• We were highly suspicious of observations claiming to have had no illnesses among children for a year and focused only on families with counted illnesses. Future surveys identifying the type of illness in question would be helpful in building a more descriptive model.
• Future studies should consider focusing analyses on a specific type of illness to increase the predictive power of the model.
• Few policy recommendations can be drawn from this particular model. More research is needed into the environmental factors affecting the amount of disease among children. Increasing the predictive power of the model will be key to increasing the utility of the model as a policy tool.
Acknowledgments: We would like to thank Professor Subhrendu Pattanayak for supplying the data.
How does the rate of erosion and sedimentation in the watershed affect child health?
• The erosion/sedimentation variable was not significant in any of the models considered.
• It was also not significant in the final model (F=0.3142, p=0.5755, ESS F-Test).
• Across the top 20 models, the total posterior probability that erosion and sedimentation rate affects the annual number of illnesses in children is only 2%.
• Limitations: The erosion/sedimentation variable is correlated with rain (correlation coefficient = -0.75). This collinearity inflates the standard error of the erosion coefficient, giving a smaller t-statistic and a wider confidence interval, and so reduces our ability to assess the significance of this variable.
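The size of this limitation can be gauged roughly with a variance inflation factor. The sketch below makes the simplifying assumption that rain is the only predictor correlated with erosion, in which case the factor is 1/(1 - r^2). The actual model contains many more predictors, so the true inflation could be larger.

```python
import math

def variance_inflation(r):
    """VIF for a predictor whose only collinearity is a pairwise
    correlation r with one other predictor: 1 / (1 - r^2)."""
    if abs(r) >= 1.0:
        raise ValueError("correlation must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

r_erosion_rain = -0.75           # correlation reported in the text
vif = variance_inflation(r_erosion_rain)
se_inflation = math.sqrt(vif)    # factor by which the standard error grows

print(f"VIF = {vif:.2f}, standard error inflated by a factor of {se_inflation:.2f}")
```

Under this two-predictor simplification the standard error of the erosion coefficient is roughly one and a half times what it would be if erosion and rain were uncorrelated.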
Does water area affect child health?
Hypotheses:
• H0: Water area is not a significant explanatory variable (β11 = 0).
• HA: Water area is significant (β11 ≠ 0).
Statistical Technique:
• Two-sided t-tests with α = 0.05.
• 95% family-wise confidence interval (Bonferroni) for family of environmental variables.
Results:
• According to the t-test, there is sufficient evidence to reject the null hypothesis and conclude water area is a significant explanatory variable in this model (p=0.0094).
• The family-wise confidence interval supports the conclusion that water area is significant (95% CI: -0.2907, -0.0011).
Conclusions:
• An increase in water area is associated with a decrease in the annual number of illnesses per child. Doubling the water area is associated with a 10% decrease in the median number of illnesses per child.
Possible Interactions
Each X marks a candidate interaction between the row variable and another variable in the list (matrix column positions not shown).

Variable                          Code name          Candidate interactions
illness per adult                 log.illadult
adult education                   aduledu            X
family size                       fmlsz              X
farm size                         log.farmsz
floor condition**                 floor              X X
roof condition**                  roof               X X X
income from nonfarm sources**     nonfarm.binary     X
expenditures per family member    log.exppermem      X X
village density                   log.villdensity    X
primary forest cover              log.primeforest
secondary forest cover            log.secondforest
water area                        log.waterarea      X
rain                              log.rain           X X
erosion                           log.erosion
** Indicator Variables

Possible Quadratic Terms
• Village Density
Roof Type   Est. Mean for Intercept   95% CI for Intercept   Est. Mean for Slope                95% CI for Slope
            (# child illnesses)       (# child illnesses)    (# child illnesses/ mm of rain)    (# child illnesses/ mm of rain)
1: Straw    7.90                      (1.61, 38.83)          -0.78                              (-0.97, -0.62)
2: Wood     4.72                      (1.43, 15.52)          -0.84                              (-0.99, -0.71)
3: Zinc     6.63                      (2.17, 20.27)          -0.80                              (-0.94, -0.69)

Prediction at future value of 800 mm of rain
• Between 0.33 and 1.4 annual illnesses per child (so approximately 1 illness).
• Methods: Centered log(rain) at log(800); used t(0.975, 332).
[Figure: Matrix of Variables 1. Scatterplot matrix of floor, roof, expperfammember, log.exppermem, villdensity, log.villdensity, illperchild, and log.illchild.]
[Figure: Matrix of Variables 2. Scatterplot matrix of illperchild, log.illchild, fmlsz, aduledu, log.illadult, illperadult, farmsz, log.farmsz, income.nonfarm, and log.income.nonfarm.]
[Figure: Matrix of Variables 3. Scatterplot matrix of illperchild, log.illchild, log.primeforest, primeforest, secondforest, log.secondforest, totalforest, and log.totalforest.]
[Figure: Matrix of Variables 4. Scatterplot matrix of illperchild, log.illchild, waterarea, log.waterarea, rain, log.rain, erosion, and log.erosion.]
QQ Normal Plot and Residuals Plot for the Final Model
[Figure: two panels. QQ normal plot: Residuals against Quantiles of Standard Normal. Residuals plot: Residuals against Fitted values (fmlsz + log.illadult + roof + nonfarm.binary + ...). Observations 35, 152, and 189 are labeled in both panels.]