Exploring the Effects of Exercise on Academic Success
Spencer Nelson, Brandon Yu, Kandace Mok, Katherine Delk
Introduction
In this study, we use survey sampling techniques on UCLA students to investigate the relationship between the amount of time spent exercising and academic performance. We use the variables GPA, gender, major type, time spent studying in hours per week, and the number of days per week spent exercising. Ultimately, we would like to see whether a student’s GPA is positively affected by devoting time to exercise. Our goal is to use this project to encourage students to maintain a healthy lifestyle, and potentially demonstrate the link between exercising and success in the classroom.
Several studies suggest that exercise has benefits beyond an individual’s physical health. One such study, conducted by researchers at Purdue University [1], indicated that exercise leads to reduced stress levels and that this, in turn, makes students more alert (thus allowing them to study more). An article in the New York Times [2] also noted that the more committed a student is to studying, the more likely they are to be committed to exercising as well; we want to see if commitment to exercise promotes a strong work ethic in the classroom. Additional research has reported that children who exercise often are more attentive, have better time management, and have superior memory and problem-solving skills, which lead to higher scores on tests. We would like to test whether a similar relationship exists within our sample of college students, and see how strong that relationship is. Additionally, we want to see whether certain groupings, such as major type, have an influence on academic achievement.
Let’s quickly define a few terms. By “exercise,” we mean activity requiring physical effort, carried out especially to improve health or fitness; examples include weight-lifting, yoga, and sports. We recorded this variable as the number of days spent exercising per week. Next, we define “academic achievement” strictly as GPA.
We hypothesize that we will observe evidence that increased frequency of exercise has a positive effect on a college student’s GPA; in addition, we hypothesize that this trend will still hold when we subset our data by categories such as major type. We also believe that GPA will be strongly dependent on other factors, such as hours spent studying per week.
In total, we collected 151 responses via surveys conducted through a Google form sent out to peers.
Data Analysis
Please See Appendix for Enlarged Graphics for Data Analysis and Modeling Sections
First, we would like to get a quick glimpse of our data in preparation for our modeling. For example, let’s take a look at the distributions of our variables of interest. We would like to investigate whether any relationships are apparent between certain categories. Some interesting questions we would like to answer include: are high levels of exercise tied to high GPAs; are high frequencies of studying tied to high GPAs; are there differences in the distribution of GPA as you move across different majors?
[Figure 1: Frequency barplots of the categorical variables; bar labels give counts]
(1a) Major Type Distribution: Humanities 25, Quantitative 64, Science 62
(1b) Exercise Level Distribution: Low (<= 2 days/wk) 80, Medium (3,4 days/wk) 42, High (>= 5 days/wk) 29
(1c) GPA Level Distribution: Low (< 3.19) 25, Medium (3.2-3.59) 53, High (> 3.59) 73
(1d) Study Frequency Distribution: Low (<10 hr/wk) 33, Medium (10-20 hr/wk) 53, High (>20 hr/wk) 65
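The level cutoffs shown in Fig. 1b and Fig. 1d can be sketched as two small helper functions. This is only an illustration of the binning, not the code used in the study; the function names are hypothetical, and the treatment of values exactly at 10 and 20 hours is an assumption based on the axis labels.

```python
def exercise_level(days_per_week: int) -> str:
    """Map days exercised per week to the level labels of Fig. 1b."""
    if days_per_week <= 2:
        return "Low"      # <= 2 days/wk
    if days_per_week <= 4:
        return "Medium"   # 3 or 4 days/wk
    return "High"         # >= 5 days/wk


def study_level(hours_per_week: float) -> str:
    """Map hours studied per week to the level labels of Fig. 1d."""
    if hours_per_week < 10:
        return "Low"      # < 10 hr/wk
    if hours_per_week <= 20:
        return "Medium"   # 10-20 hr/wk (endpoints assumed inclusive)
    return "High"         # > 20 hr/wk


# Hypothetical response, for illustration only (not taken from the survey).
print(exercise_level(3), study_level(25))
```

Binning the raw responses this way is what makes a chi-squared test of independence applicable, since that test works on counts in categories rather than on the numeric values themselves.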
*A note on how “major types” were divided in Fig.1a
We define Humanities as creative-thinking majors, including writing, political science, and linguistics. Quantitative majors include statistics, mathematics, economics, and engineering. Science majors relate to subjects tied to the life sciences, including biology, chemistry, and psychology.
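[Figure 2a: GPA and Exercise Levels. Barplot of GPA level counts within the Low, Med, and High exercise groups.]
[Figure 2b: GPA and Gender. Barplot of GPA level counts for Female and Male students.]
[Figure 2c: GPA and Major Type. Barplot of GPA level counts for Humanities, Quantitative, and Science majors.]
Legend: Low GPA (<3.19), Med GPA (3.2-3.59), High GPA (>3.6)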
(2a)/(2b)/(2c) Here we have created barplots to observe how GPA varies across different categories in preparation for our chi-squared tests of independence. Notice how in Fig. 2a, the distribution of GPA does not seem to vary very much across different levels of exercise. Similarly, in Fig. 2b, we can see that the distributions of GPA among males and females are not radically different. However, in Fig. 2c, the distribution of GPA seems to change as you move across different major types. In particular, among Humanities majors, the medium GPA level makes up the largest share, while for Science majors, high GPAs are the most prevalent.
[Figure 3a: Major Type and Frequency of Exercise. Days Spent Exercising per Week (0 to 7) for Humanities, Quantitative, and Science majors.]
[Figure 3b: Major Type and Hours Studied. Hours Spent Studying per Week for Humanities, Quantitative, and Science majors.]
[Figure 3c: Study Hours v. Exercise Days. Hours Spent Studying per Week against Days Spent Exercising per Week (0 to 7).]
(3a)/(3b) We would like to investigate whether there are particular differences among our major types that may account for the varying distributions of GPA. In Fig. 3a, we notice that the distribution of the number of days per week spent exercising is quite similar across our three major types; in fact, Quantitative majors and Science majors appear to have identical distributions. By contrast, in Fig. 3b, we can see that the distribution of the number of hours per week spent studying varies much more; in particular, Humanities majors seem to spend less time studying, whereas Quantitative majors have the highest median hours spent studying per week. We will investigate the independence of Major Type and Study Levels in the next section.
Modeling
Now, we would like to perform chi-squared tests to assess whether our categories of interest are independent. We start with GPA Levels and Exercise Levels. We noted in our Data Analysis section that, upon visual inspection, there did not seem to be much variation in the distribution of GPA Levels across different Exercise Levels (see Fig. 2a). Thus, we suspect that GPA Levels and Exercise Levels are independent of one another; in other words, that knowing a student’s Exercise Level does not help us predict his or her GPA Level. More formally, we construct our hypotheses as follows.
Ho : GPA Levels and Exercise Levels are independent of one another.
Ha : GPA Levels and Exercise Levels are NOT independent of one another.
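For reference, the standard Pearson chi-squared statistic for a table with r rows and c columns can be written as below. The symbols O, E, R, and C (observed count, expected count, row total, column total) are notation introduced here for illustration; with three GPA Levels and three Exercise Levels, the degrees of freedom work out to 4, matching the df shown on the density graphs.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
\mathrm{df} = (r-1)(c-1) = (3-1)(3-1) = 4
```

Here n = 151 is the total number of responses, and E is the count we would expect in each cell if the two variables were independent.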
Running a chi-squared test of independence yields the following results:
[Chi-Square Density Graph: df = 4, χ2 = 1.5105, p = 0.8248]
[Standard Normal plot of the standardized residuals for Low Ex (<= 2), Med Ex, and High Ex (>= 5), with the rejection region marked]
At a significance level of α = 0.05, we fail to reject the null hypothesis; our data provide no convincing evidence that knowing a student’s Exercise Level helps us predict his or her GPA Level, which is consistent with these two variables being independent. Notice how our standardized residuals, which can be thought of as z-values under a standard normal curve, stay between our rejection regions.
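The reported p-value can be checked by hand, because the upper tail of a chi-squared distribution with 4 degrees of freedom has a simple closed form. The sketch below is a minimal standard-library check and not the code used to produce the graphs; the function name is hypothetical.

```python
import math


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with 4 df.

    For df = 4 the tail probability has the closed form
    exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


ALPHA = 0.05
statistic = 1.5105  # chi-squared statistic reported for GPA Levels v. Exercise Levels

p_value = chi2_sf_df4(statistic)
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

With the statistic above, the function returns a value that rounds to the reported p = 0.8248, far above α = 0.05.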
GPA v. Major Test
Observing Fig. 2c from our Data Analysis section, we notice that we do NOT have similar GPA distributions across our different major types. In particular, medium GPA seems to make up a larger proportion of the observations among Humanities students than among students studying Quantitative and Science topics. We would like to test whether this implies the two variables are not independent. We set up our hypotheses similarly:
Ho : GPA Levels and Major Types are independent of one another.
Ha : GPA Levels and Major Types are NOT independent of one another.
Running a chi-squared test of independence yields the following results:
[Chi-Square Density Graph: χ2 = 10.589, p = 0.03159]
[Standard Normal plot of the standardized residuals for Humanities, Quantitative, and Science, with the rejection region marked; labeled residuals: -2.344, 1.995, 2.856]
Because our p-value is under the significance level of α = 0.05, we reject the null hypothesis; there is convincing evidence that GPA Levels and Major Type are NOT independent. The standardized residuals let us pinpoint which categories drive this result. Notice how we have a highly negative residual of -2.34 for Science students in our Medium GPA category and a highly positive residual of 1.99 for Science students in our High GPA category. This indicates that we observed more Science students in the High GPA category than we would expect under independence, which was counterbalanced by fewer Science students than expected in the Medium GPA category. Similarly, for Humanities students, our stray residual of 2.86 shows that we observed more Humanities students in the Medium GPA category than expected, which was counterbalanced by fewer than expected in the Low GPA and High GPA categories.
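To make the sign of a residual concrete, the sketch below computes the expected count for Humanities students in the Medium GPA category from the totals in Fig. 1a and Fig. 1c, and then an adjusted standardized residual. Two things here are assumptions rather than facts from our write-up: the adjusted-residual formula (the text does not state which residual type was plotted) and the observed count of 15, which is not reported and was chosen only because it is consistent with the reported residual of 2.86.

```python
import math

N = 151  # total survey responses


def expected_count(row_total: int, col_total: int, n: int = N) -> float:
    """Expected cell count under independence: row total * column total / n."""
    return row_total * col_total / n


def adjusted_residual(observed: float, row_total: int, col_total: int,
                      n: int = N) -> float:
    """Adjusted standardized residual for one cell (assumed residual type)."""
    expected = expected_count(row_total, col_total, n)
    variance = expected * (1 - row_total / n) * (1 - col_total / n)
    return (observed - expected) / math.sqrt(variance)


HUMANITIES_TOTAL = 25   # Fig. 1a
MEDIUM_GPA_TOTAL = 53   # Fig. 1c
ASSUMED_OBSERVED = 15   # ASSUMPTION: cell count not reported in the text

expected = expected_count(HUMANITIES_TOTAL, MEDIUM_GPA_TOTAL)
z = adjusted_residual(ASSUMED_OBSERVED, HUMANITIES_TOTAL, MEDIUM_GPA_TOTAL)
print(f"expected = {expected:.2f}, residual = {z:.3f}")
```

A positive residual means the observed count exceeds the expected count, so the cell holds more students than independence would predict; a negative residual means the opposite.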
Different Habits among Students of Different Majors?
Given our result that GPA and Major are not independent, we would like to investigate whether there are certain habitual differences among students of different majors. In particular, we would like to test whether a student’s major is independent of two factors: his or her level of exercise and how much he or she studies per week. Let’s investigate exercise as our first variable of interest. Again, we set up the hypotheses:
Ho : Major Types and Exercise Levels are independent of one another.
Ha : Major Types and Exercise Levels are NOT independent of one another.
Running a chi-squared test of independence yields the following results:
[Chi-Square Density Graph: df = 4, χ2 = 6.559, p-value = 0.195]
[Standard Normal plot of the standardized residuals for Humanities, Quantitative, and Science, with the rejection region marked]
At a p-value of 0.195, we fail to reject the null hypothesis; we have no convincing evidence that Major Types and Exercise Levels depend on one another. This is consistent with our first chi-squared test, which found no evidence that a student’s GPA and his or her exercise level are dependent. Notice again how our standardized residuals stay outside of the critical regions of our standard normal graph. If not exercise level, we suspect that there must be another factor influencing the differences in GPA among different major types. We will now focus our attention on whether a student’s major is independent of how much he or she studies per week.
Major Type v. Study Levels
Here, we investigate whether the amount a student studies varies with his or her major. Again, we set up our hypotheses similarly:
Ho : Major Types and Study Levels are independent of one another.
Ha : Major Types and Study Levels are NOT independent of one another.
Running a chi-squared test of independence produces the following results:
[Chi-Square Density Graph: df = 4, χ2 = 13.123, p = 0.01069]
[Standard Normal plot of the standardized residuals for Humanities, Quantitative, and Science, with the rejection region marked; labeled residuals: -2.548, -1.987, 2.478, 2.93]
At a p-value of 0.01, we reject the null hypothesis; there is convincing evidence that Major Types and Study Levels are NOT independent. Thus, we have evidence that there are, indeed, differences in study habits among students of different majors. We can use our residual summary to pinpoint which categories contribute to the test’s statistical significance. In particular, notice that in our Quantitative category, our residual of 2.48 indicates that we observed more students who study frequently than expected under independence, and this was counterbalanced by fewer Quantitative students than expected with low frequencies of studying, as indicated by the negative residual of -1.99. The opposite trend occurred among Humanities students, where we observed fewer students with high frequencies of studying than expected, as indicated by the negative residual of -2.55; this was counterbalanced by more Humanities students than expected with low levels of studying, as indicated by the highly positive residual of 2.93.
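The residual screening described above can be sketched as a comparison of each reported residual against a two-sided standard normal cutoff. The cutoff of roughly 1.96 at α = 0.05 is an assumption about how the rejection region was drawn, and the sketch makes no adjustment for checking several cells at once; the dictionary keys are labels chosen here for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05
# Two-sided critical value of the standard normal distribution (about 1.96).
critical = NormalDist().inv_cdf(1 - ALPHA / 2)

# Standardized residuals as reported for Major Type v. Study Levels.
reported = {
    ("Humanities", "Low study"): 2.93,
    ("Humanities", "High study"): -2.548,
    ("Quantitative", "Low study"): -1.987,
    ("Quantitative", "High study"): 2.478,
}


def direction(z: float) -> str:
    """Describe the observed count relative to the count expected under independence."""
    return "more than expected" if z > 0 else "fewer than expected"


# Keep only the cells whose residual falls in the rejection region.
flagged = {cell: direction(z) for cell, z in reported.items() if abs(z) > critical}

for (major, level), note in flagged.items():
    print(f"{major}, {level}: {note}")
```

All four reported residuals exceed the cutoff in absolute value, so each of these cells is flagged, with the sign indicating whether the cell holds more or fewer students than independence would predict.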
Conclusion
Overall, our findings in this study do not support our initial hypothesis that we would observe differing distributions of academic performance among students who exercised at different weekly frequencies. Rather, it appears that GPA distribution varies when we categorize students by their field of study. Furthermore, upon dividing students by major type, we find that differences in the number of weekly hours dedicated to studying are a likely explanation for this non-uniform GPA distribution.
Let us revisit some of the statistical results which led to the aforementioned conclusions. After running a chi-squared test of independence between students’ GPA levels (low, medium, high) and their weekly exercise frequency, we obtained a p-value of 0.82, which gives no evidence against the independence of the two categories. In other words, we do not expect to see substantially different GPA distributions among students who exercise at different rates. A similar analysis between GPA levels and Major Types yielded a low p-value of 0.03, an indication that we should expect the distribution of GPAs to change as we move across different major types. Our residual analysis supported this; for Science students, a highly positive residual of 1.99 showed that we observed more students in the High GPA category than expected under independence, and this was offset by fewer Science students than expected in the Medium GPA category, indicated by a highly negative residual of -2.34. Similarly, a residual of 2.86 indicated that we observed more Humanities students in the Medium GPA category than expected, which was offset by fewer Humanities students than expected in the High GPA category.
We were interested in investigating potential reasons why GPA distribution varied across Major Types, so we ran two separate chi-squared tests of independence: Major Type v. Exercise Levels and Major Type v. Study Frequency. Unsurprisingly, our test between Major Type and Exercise Levels yielded a p-value of 0.195, giving no evidence that knowing a student’s major tells us anything about his or her frequency of exercise. This is consistent with our first chi-squared test, which found no evidence that GPA Levels and Exercise Levels are dependent. However, running a chi-squared test between Major Type and Study Frequency yielded a p-value of 0.01, indicating that the distribution of students’ study frequency should be expected to differ among majors. Indeed, residual analysis showed that we observed more Humanities students with Low study frequency than expected, and fewer Humanities students with High study frequency than expected. This aligns with the fact that the Humanities lacked a high proportion of its students in the High GPA category, which suggests that hours spent studying and GPA are related.
Let’s discuss the real-world implications of our results. We found no evidence of a dependency between GPA Level and Exercise Level. This result does make sense; one would not expect exercise alone to be a contributor to a high GPA. Some make the claim that students who exercise more have higher GPAs because students who exercise more also tend to be more active academically. For our particular sample, however, as seen in Fig. 3c, whether a student exercises zero days per week or seven days per week, the distribution of hours spent studying seems fairly uniform. We then conclude that, in our sample, exercise alone has little effect on GPA, and that higher GPAs are more likely a byproduct of longer hours dedicated to studying; studies which claim a relationship exists between exercise and GPA may derive their results from samples containing students who BOTH study highly frequently AND exercise highly frequently.
Appendix
References
[1] A study by Purdue University students investigating the effects of exercise on academic success
http://www.purdue.edu/newsroom/releases/2013/Q2/college-students-working-out-at-campus-gyms-get-better-grades.html
[2] A study by the New York Times investigating the positive effects of exercise on cognitive abilities and mental health.
http://well.blogs.nytimes.com/2010/06/03/vigorous-exercise-linked-with-better-grades/
Below are enlarged graphics from our Data Analysis and Modeling Sections
[Figures 1a-1d: Major Type Distribution; Exercise Level Distribution; GPA Level Distribution; Study Frequency Distribution]
[Figures 2a-2c: GPA and Exercise Levels; GPA and Gender; GPA and Major Type]
[Figures 3a-3c: Major Type and Frequency of Exercise; Major Type and Hours Studied; Study Hours v. Exercise Days]
[Chi-Square Density Graph: df = 4, χ2 = 1.5105, p = 0.8248, with Standard Normal residual plot for Low Ex (<= 2), Med Ex, and High Ex (>= 5)]
[Chi-Square Density Graph: χ2 = 10.589, p = 0.03159, with Standard Normal residual plot for Humanities, Quantitative, and Science; labeled residuals: -2.344, 1.995, 2.856]
[Chi-Square Density Graph: df = 4, χ2 = 6.559, p-value = 0.195, with Standard Normal residual plot for Humanities, Quantitative, and Science]
[Chi-Square Density Graph: df = 4, χ2 = 13.123, p = 0.01069, with Standard Normal residual plot for Humanities, Quantitative, and Science; labeled residuals: -2.548, -1.987, 2.478, 2.93]