Pedagogical Approach to Ability Group Sectioning: A Mathematical Investigation
Chris Collins, Jim Pleuss, Dusty Turner
This paper was completed and submitted in partial fulfillment of the Master Teacher Program, a 2-year faculty
professional development program conducted by the Center for Faculty Excellence, United States Military Academy,
West Point, NY, 2018.
_____________________________________________________________________________________
Abstract: In the continual pursuit of classroom learning effectiveness, researchers and
educators aim to develop strategies that improve student performance and learning. One such
strategy is to create academically homogeneous environments where students are grouped
into classes based on their preconceived academic ability. The research team hypothesizes that
students placed in homogeneous ability groups, regardless of ability level placement, will
perform better than they would in normal heterogeneous classes. During AY 18-01, students
were placed in either ability-grouped or randomly grouped sections of MA206, with ability sections
formed from mathematical projections based on each student’s historical mathematical performance.
The analysis uses data from major graded events, as well as qualitative data from end-of-course
surveys that gauge student and faculty feedback. A quantitative analysis of these results indicates
that there is no significant difference in student performance between ability-grouped and randomly
grouped sections.
Contents
1. Introduction
2. Background
3. Literature Review
4. Pedagogical Approach
5. Hypothesis
6. Methodology
6.1. Methodology Design
6.2. Data
6.3. Model Building
6.3.1. Linear Regression
6.3.2. Regularization
6.3.3. Random Forest
6.3.4. Ensemble
6.4. Survey Methodology
7. Results
7.1. Model Results
7.2. Multiple Pairwise Comparisons
7.3. Testing Effect of Sectioning By Ability
7.4. Student Survey Results
7.5. Faculty Results
8. Discussion
9. Conclusion
References
1. Introduction
In the continual pursuit of classroom learning effectiveness, researchers and educators aim to
develop strategies that improve student performance and learning. One such strategy is to
create academically homogeneous environments where students are grouped into classes
based on their preconceived academic ability. Such studies face inherent challenges if the
goal is statistically robust findings that extend beyond the anecdotal. Developing
effective control and treatment groups while maintaining a sufficiently large sample size is difficult
without creating uncomfortably large classroom environments. Isolating specific variables to
test for effectiveness is also difficult given the innumerable other influencing factors, or “noise”
variables. Finally, the administrative support required to run a large-scale experiment of this
nature can be difficult to obtain. Nonetheless, the literature is rich with pedagogical studies
exploring the academically homogeneous classroom.
This research expands and further explores the pedagogical discussion of grouping
undergraduate students by academic ability. Existing research predominantly focuses on the
benefits for elementary and middle school students, with few studies on undergraduate
education. Therefore, within the Department of Mathematical Sciences at the United States
Military Academy (USMA), our research team explores the impact of an academically
homogeneous classroom environment on student performance, student perception of learning,
and instructor practices. Based on current research in the literature and past studies, along with
consultation with senior faculty within the Mathematics Department, we developed a study
of 580 students (predominantly sophomores¹) taking Introduction to Probability and
Statistics (MA206) during the Fall Semester of Academic Year 2018 (AY 18-01) and created two
sets of classroom environments (i.e. student distributions). The control group, or random group
as we will call it in this paper, contained sections, or classes, to which students were randomly
assigned. The treatment group, or ability group, was divided into sections based on the students’
projected performance.
This paper begins by discussing the background of the Research Team’s study, including the
unique circumstances and academic environment at USMA, along with a review of the current
state of the literature in this field. We then explain the methodology of the Research
Team’s study and the model we used to determine academic ability. The results of our study
follow, focusing on student performance on course-wide major graded events as well as
student and faculty survey results. We conclude with a discussion of the results,
contributions of this research to the literature, and some pedagogical recommendations and
observations for the field.
2. Background
The mathematics curriculum at USMA requires all students to take a minimum of
three math courses in order to satisfy graduation requirements and as part of the program
accreditation for attainment of a Bachelor of Science. Students can take additional math
courses as part of their major. However, they are required to take the following three core
courses: 1.) MA103: Mathematical Modeling; 2.) MA104: Introduction to Calculus; and 3.)
MA206: Introduction to Probability and Statistics. A student who selects a STEM
(Science, Technology, Engineering, and Mathematics) major is required to take
an additional mathematics course, MA205: Multivariate Calculus. Within each of the
core math courses, students have an opportunity to be placed in advanced sections based on
past performance and/or as assessed through placement exams proctored before the start of
the academic year. These advanced courses mirror their standard counterparts and are given their
own course nomenclature. The intent of these courses is to give high-ability students the
opportunity to compete with other like-ability students and to be challenged by
advanced instruction, since they would likely go unchallenged in mixed-ability classes.
USMA is a unique institution and offers an opportunity for a specific case study, as all
students accepted into the academy have gone through an extensive and
rigorous academic, physical, leadership, and background vetting process. All
graduates earn a Bachelor of Science upon graduation. The median score on the math portion
of the ACT for the graduating Class of 2020 (the majority of the research sample population) is
29,² which supports the academic pedigree of USMA students.³ It is therefore assumed that
the majority of students have a strong mathematical background and foundation as well
as the capacity to learn. In conjunction with small class sizes (~15-16) and the nation’s #1 most
accessible university professors,⁴ the conditions are set for students to succeed in class.
Unfortunately, for a myriad of reasons, not all students succeed academically while they are
Cadets at USMA. It is the duty of their instructors and professors to determine
where the gap or problem exists and to find ways to address these deficiencies.
1 Referred to as Yearlings at USMA.
USMA has a deep and rich legacy, and its academic approach, or pedagogy, is
part of it. The Thayer System, named after Superintendent Sylvanus Thayer, is
geared towards placing the learning responsibility onto the student. As such, it is the student’s
responsibility to read the assigned material daily in preparation for the upcoming class
and to come prepared to participate in discussion and build upon their knowledge base
through application and lecture. This, coupled with the aforementioned small class sizes, lends
itself to an environment that promotes academic rigor while also providing opportunity for
more individual attention from the instructors. USMA spends a lot of energy and resources to
ensure that the conditions are met for students to learn in these intimate classroom
environments. It does not overtly dictate the composition of the classes themselves, however.
This heterogeneous, random composition is therefore a key determinant of the type of classroom
environment that each class provides.
Each major department at USMA has the autonomy to section students in their classes as they
prefer, though they must adhere to the Dean’s Policies.⁵ Within the Department of Math a
student can be selected to populate advanced courses, in line with graduation requirements,
through the demonstration of past academic achievement. However, within the Department of
Math there is not a concerted effort to section classes within the common core curriculum by
ability.
Research has both supported and failed to support the advantages of ability
sectioning, or academically homogeneous grouping. This research has principally been done at the
elementary and high school levels, where school administrators have the benefit of more regimented
class schedules and common curriculums. There is comparatively little research on the
impact of such academically homogeneous classrooms at the undergraduate level. The inherent
institutional structure of USMA creates an adequate environment for effective experiments of
this nature. The Research Team’s results support some of the existing literature
and offer novel discoveries and conclusions not yet explored.
2 Class profile of the Class of 2020, Office of the Dean, USMA.
3 According to the ACT National Profile Report of the Graduating (HS) Class of 2016, the national average on the math portion of the ACT was 20.6. The median was not reported, but it can be extrapolated from the Quartile Score report that the national median for the Class of 2016 was 19, based on a sample size of 2M.
4 Princeton Review, 2018.
5 Per Dean’s Policy and Operating Memorandum 02-10, dated 16 August 2004.
3. Literature Review
For the sake of clarification, especially in the academic realm, there is a distinction between
the definitions of ability groups and of “tracked students”.⁶ From his research, William Viar
[2008] identifies “tracking” as the placement of students into directed curriculum (math,
history, English, science, etc.) based on past achievement and potential for success. These
directed curriculums, or tracks, stream students based on their performance as well as their
parents’ wishes. In the case of mathematics, a high school student would be tracked as having
the capacity and academic path that would require them to meet college math prerequisites.
They would therefore take math classes all the way through their senior year, ending with Calculus. A
high school student who was not tracked for higher education, possibly indicating a preference
for vocational training, may complete only the minimum high school graduation requirements and
end at two or three years of high school math (Algebra II or Pre-Calculus). Conversely, ability
groups are most visibly organized, according to Viar, within elementary schools and principally
with reading groups. Students are placed into ability groups where the instruction is
given based on the capability of the students. In his research on grouping in high school
classes,⁷ Dr. Robert Slavin⁸ identifies five different types of ability grouping: grouping students
as a class by ability for all subjects, primarily heterogeneous grouping, a mix of ability and
heterogeneous grouping, non-graded instructional grouping, and in-class grouping. Though
these five types of grouping differ, they come back to the principal difference between ability
grouping and tracking, which is simply capability vs. capacity.
The majority of the literature on ability groups focuses on grouping within elementary school
classrooms with a preponderance of the authors concluding that they did not support the
practice of sectioning by ability, except in very specific cases which will be highlighted later in
this section. These negative reviews and research of ability grouping could be due to the
composition of these education levels where other developmental indicators may not be
considered. Nonetheless, their observations and notes are valuable as important lessons can
still be derived from them. Since this study tests the hypothesis of a potential gain from
ability grouping, it is beneficial to first highlight the arguments against this
methodology in order to gain perspective and better shape the methodology for this research.
Tom Loveless is an education researcher and former professor who is the most referenced author in
cases against ability grouping and tracking. In his seminal work, “Making Sense of the Tracking
and Ability Grouping Debate”,⁹ he concludes that tracking, and to a lesser extent ability
grouping, “fosters race and class segregation”, potentially harms students’ self-esteem, and
leads to a self-fulfilling prophecy for most students. He also raises the ethical objections
to the practice of sectioning by ability, as it may set and maintain conditions of inequality
amongst demographically and socio-economically different student populations. Mr. Loveless
concludes that there is no definite net gain from ability grouping or tracking. The
disadvantages are further discussed in Colonel Gary Tidwell’s¹⁰ research, which cites developmental
hazards such as labeling, psychological damage for low-ability performers, and difficulty in
shedding the labels and moving to higher ability groups. These disadvantages are difficult to
quantify and are described anecdotally, without consistent metrics. Setting aside
the ethical or moral implications, the Research Team focuses on the application of
grouping by ability and its effects on the student.
6 Viar, 2008.
7 High School is also referred to as secondary school, whereas elementary school is referred to as primary school. This research focuses above high school, on what is considered undergraduate or post-secondary education.
8 Slavin, 1990.
9 Loveless, 1998.
Within the “Notes” of Mr. Loveless’ research he cites groups and institutional entities that
“condemned” tracking. These notes were used in the foreword of his research, but a deeper
review reveals that a large portion (~50%) of surveyed educators wanted more ability
grouping in K-12 classrooms.¹¹ Additionally, only 40% of those surveyed said they believed that
heterogeneously mixed classrooms would improve a student’s education. Among those surveyed
who taught in middle school, the share that preferred ability grouping rose to
66%, indicating that they saw the benefit of pairing students with similar abilities.
The case for sectioning by ability revolves primarily around the
benefits to the students, which should be expected as most teachers show traits of selfless
service and expertise in their lesson preparation. The majority of the advantages of
ability grouping are therefore, and should be, focused on the benefits to the students. These advantages
relate to the comfort level of the students, as students of like ability tend to be more
comfortable around each other. In these environments everyone should have similar capacity,
promoting healthy conversation within the classroom. The thought is that a question for one is
actually a question for all. The added benefit for the teacher is that they can teach at
one level, or pace, which removes the varying types and levels of questions from the normal flow of
class. In this way a response to one is actually a response to many.
With regard to the special
cases where ability grouping was found to be beneficial, both Loveless and Slavin agree that
these were a result of the institution itself. Institutions that cement a culture focused on
effort and that hold students accountable were most strongly correlated with strong performance within
ability groups. These institutions were identified as Catholic schools, where even low-track
classrooms were supported by good teaching, small classes, student participation,
and parent involvement.
In comparison to studies that focus on elementary and middle school ability grouping, there is
far less research on undergraduate education. However, a peer from the
Department of Mathematics at USMA conducted his own research on ability group sectioning in
2007.¹² LTC Randall Hickman grouped classes by ability in a USMA core math class, MA205
(Multivariate Calculus). He created these homogeneous “sections” and then collected
performance data on how students did on the final, comprehensive test for MA205,
named the Term End Exam (TEE). His quantitative analysis of the TEE scores indicates that
students performed better on the TEE than in previous years. He further determines that
the high-performing students performed as expected and that the largest increase was within the
lower ability groups/sections, where they were “pulled up” by their peers. His results are
mixed, however: the ability grouping was more beneficial to the lower ability groups
than to the higher ability groups. These results are interesting and form the basis for a deeper
investigation into the placement and tracking of students within these ability groups, as they
provide a case for ability grouping at lower ability levels and against it at higher ability levels.
10 Tidwell, 2007.
11 Loveless, 1998.
12 Hickman, 2007.
4. Pedagogical Approach
In his research, Colonel Gary Tidwell¹³ identifies the following skills as necessary for
instructors who teach ability sections, regardless of the level of the grouping:
1. Instructors must create the conditions in the classroom where all students are challenged.
2. Instructors must be able to provide opportunities to cover new lesson material faster.
3. Instructors must be knowledgeable of the material so that they can be flexible for all students.
The Research Team’s pedagogical approach for this study investigates grouping students by
common mathematical ability. Different levels and depths of preparation are expected for
sections with different compositions of academic ability. It is assumed that
the pedagogical approach outlined above is achievable due to the small class sizes of MA206 as
well as instructor educational experience (e.g. the material is well known). It is also
important to recognize the instructors’ experience and comfort with the material
(MA206 teachers are not first-semester instructors and thus have teaching experience). This
assumption is important, as all of these instructors have experience teaching
heterogeneous classes within the core math program, in line with the teaching hierarchy
within the department. This allows instructors the flexibility to focus on their own teaching
approach.
5. Hypothesis
The research team hypothesizes that students placed in homogeneous ability groups,
regardless of ability level placement, will perform better than they would be projected to
perform in normal heterogeneous classes. We hypothesize that this is because students in
homogeneous ability classrooms should participate more, since everyone is theoretically
of equal standing and the class moves at a more uniform pace. The team further hypothesizes
that those identified as “lower performers” will benefit more than those
identified as “high performers”. Muir would support this, as he introduced the idea of the
“self-fulfilling prophecy”, meaning that the high performers will earn the grade that they want.
Regardless, the Research Team is initially interested in the short-term benefits as assessed
by performance on course-wide graded events. There is also a strong interest in the
perceived long-term benefits of this methodology. While these are difficult to quantify, we
remain dedicated to the USMA and Department of Mathematics focus on life-long learning and
anticipate growth in this area as a result of this research.
13 Tidwell, 2007.
6. Methodology
Students were “tracked” in Loveless’ research by assigning them to English and mathematics
courses based on a variety of metrics, including the student’s previous grades, teacher
recommendations, and placement tests.¹⁴ Similarly, the participants in the
Research Team’s study were the students who took MA206 in AY 18-01. This population
tends to include many STEM majors or those who were on the advanced math track, based on
general Academy scheduling.
The Research Team’s methodology differs from LTC Hickman’s in the selection of students
for the study. We use a mathematical model to predict the score of every student taking
MA206 in AY 18-01 and base the structure of our experiment on this predicted score. This gives
us a more robust research foundation and provides a better framework for analysis of the
Research Team’s results.
6.1. Methodology Design
With the ultimate goal of determining whether students learn better in classrooms with peers of
similar ability, we must first properly determine their ability. Based on the two previous semesters of
performance in MA206, we develop a model to predict the performance of students
in AY 18-01. We discuss the development of this model in detail in later
subsections.
We use this predictive model to project the course average for each student. In order to
isolate the impact of ability grouping on student learning, we design the experiment by
splitting the class hours between the two groups.
Hour      Day   Time       Ability / Random   Students
B (2nd)   1     840-935    Ability            145
C (3rd)   1     950-1045   Random             135
H (2nd)   2     840-935    Random             157
I (3rd)   2     950-1045   Ability            143
Table 1: Experimental Design Technique to Control for Class Day/Time
14 Loveless, 1998.
All students take MA206 during 2nd or 3rd hours on Days 1 or 2 for AY 18-01. As noted in
Table 1, we designate two class hours (C and H) to randomly section students and two class
hours (B and I) to section students by their predicted ability. We split these sections across
day and time slot in order to control for any effect day or time might have had on student
performance.
Furthermore, we take steps to control for the impact a teacher has on student learning.
Subject to constraints on available teachers, each instructor teaches the same predicted ability
level in both Ability hours, and most instructors also teach two Random
sections. Some instructors are not available to teach all four hours, which adds the
possibility of confounding variables to our results.
The teacher breakdown is shown in Table 2, where each instructor has a unique ID number
and the projected ability of a section decreases as the corresponding number increases. For
example, Ability 1 is the Ability section consisting of students with the best
projected scores. Conversely, Ability 9 is the section containing students with the worst
projected scores.
Instructor ID B Hour I Hour C Hour H Hour
1 Ability 5 Ability 5 Random Random
2 Ability 6 Random
3 Ability 8 Ability 8 Random Random
4 Ability 1 Random Random
5 Ability 1
6 Ability 6
7 Random
8 Ability 3 Ability 3 Random Random
9 Ability 7 Ability 7 Random Random
10 Ability 2 Ability 2 Random Random
11 Ability 4 Ability 4 Random Random
12 Random
13 Ability 9 Ability 9 Random Random
Table 2: Instructor Breakdown Sections
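The text does not spell out how predicted scores were converted into the nine Ability sections of an hour. One plausible reading, sketched here in Python (the authors worked in R), is to rank students by predicted course average and cut the ranking into contiguous, near-equal sections. The function name, the equal-size cuts, and the student ids and scores below are all hypothetical.

```python
def section_by_ability(predicted, n_sections):
    """Rank students by predicted score and cut the ranking into contiguous sections.

    predicted: dict mapping student id -> predicted course average.
    Returns a dict mapping section number (1 = highest predicted scores)
    to the list of student ids placed in that section.
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    base, extra = divmod(len(ranked), n_sections)
    sections, start = {}, 0
    for s in range(1, n_sections + 1):
        # Assumption: any remainder is spread over the first few sections.
        size = base + (1 if s <= extra else 0)
        sections[s] = ranked[start:start + size]
        start += size
    return sections


# Hypothetical predicted averages for 145 students (the B-hour enrollment in Table 1).
scores = {f"cadet_{i:03d}": 4.3 - 0.02 * i for i in range(145)}
sections = section_by_ability(scores, n_sections=9)
sizes = [len(sections[s]) for s in sorted(sections)]
```

With 145 students and nine sections, this yields sections of 16 or 17 students, close to the class sizes of about 15-16 mentioned in the Background.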
In order to further limit the impact of the instructor on the final grade, we analyze only
course-wide graded events, which include three mid-term tests and one final exam. The
Research Team’s prediction model uses overall course average, rather than only course-wide
graded events, as the primary response variable because this was the only grade data available for
the previous semesters.
6.2. Data
In order to predict student performance, the Research Team uses data on 1023 students
from the academic semesters AY 17-01 and AY 17-02. Students at USMA take one of several
main tracks through the core math program, as shown in Table 3. The team removed four
students who took very non-standard tracks from this
analysis.
Core Math Track                  Courses               Number of Cadets
Standard Non STEM                MA103, MA104          454
Standard STEM                    MA103, MA104, MA205   320
STEM w/ MA103 Validation         MA104                 23
Remedial to STEM w/ Validation   MA100, MA104, MA205   10
Advanced Non STEM                MA153                 8
Advanced STEM                    MA153, MA255          197
Advanced to Standard             MA153, MA205          7
Table 3: Track through the Core Mathematics Program at USMA
The team uses these common tracks in order to develop the set of variables in Table 4. These
variables represent the factors for possible inclusion in the Research Team’s predictive model.
Independent Variable             Description
Modeling Grade                   Final Grade of MA103 or MA153
Single Variable Calculus Grade   Final Grade of MA104 or MA153
Multi-variable Calculus Grade    Final Grade of MA205 or MA255
Core Math Average                Average of Core Math GPA
CQPA                             Cumulative Academic Grade Point Average
CCPS                             Cumulative Overall Grade (Academic/Military/Physical)
Rock                             Whether or not a Student took an Introductory Math Course
Table 4: Variables Used for Possible Inclusion in the Predictive Model
Unfortunately, upon synthesizing the data under the framework of Table 4, the team ends
up with observations containing missing data. 2.26% of the Single Variable Calculus Grade values
and 0.1% of the Modeling Course Grade values are missing due to Cadets validating these courses.
One option in dealing with this missing data is to ignore the affected observations. This is
not ideal, as doing so eliminates certain tracks by which students move through their math
courses. Because of this, we decided to impute the data. A typical assumption when
imputing data is that the data is missing at random.¹⁵ This is not true in our case, but we
believe nonetheless that imputing the data will enhance the Research Team’s
model instead of biasing it.
To impute the missing data, we use a K-Nearest-Neighbor (KNN) imputation method with k = 5. In
summary, after centering and scaling the data, the values from the five nearest observations in
Euclidean distance are averaged and that number is assigned to the missing data point.
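The imputation step can be illustrated with a minimal Python sketch using only the standard library; the authors worked in R, so this is not their implementation. It makes three assumptions that the text does not confirm: neighbours are drawn only from complete rows, distances use only the columns observed in the incomplete row, and the neighbour average is taken on the original grade scale. The function name and the sample grades are hypothetical.

```python
import math
import statistics


def knn_impute(rows, k=5):
    """Fill None entries with the mean of the k nearest complete rows.

    rows: list of equal-length lists of floats, with None for missing values.
    Distances are Euclidean, computed on centered and scaled columns and
    restricted to the columns observed in the incomplete row.
    """
    n_cols = len(rows[0])
    # Column means and standard deviations from the observed values only.
    means, sds = [], []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(statistics.mean(observed))
        sds.append(statistics.stdev(observed) or 1.0)  # guard against zero spread

    def scaled(value, j):
        return (value - means[j]) / sds[j]

    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        seen = [j for j in range(n_cols) if row[j] is not None]

        def distance(other):
            return math.sqrt(sum((scaled(row[j], j) - scaled(other[j], j)) ** 2 for j in seen))

        neighbours = sorted(complete, key=distance)[:k]
        filled.append([
            row[j] if row[j] is not None else statistics.mean(nb[j] for nb in neighbours)
            for j in range(n_cols)
        ])
    return filled


# Hypothetical grades: [modeling, single variable calculus, core math average].
data = [
    [3.7, 3.3, 3.5],
    [3.0, 2.7, 2.9],
    [2.0, 2.3, 2.2],
    [4.0, 3.7, 3.9],
    [3.3, 3.0, 3.2],
    [2.7, 2.7, 2.7],
    [3.7, None, 3.6],  # a validated course leaves this grade missing
]
completed = knn_impute(data, k=3)
```

The toy example uses k = 3 only because it has six complete rows; the study used k = 5.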
6.3. Model Building
To begin the construction of the Research Team’s model, we separate our data into a
training and testing set consisting of 75% and 25% of the available observations,
respectively. From there we create three different types of models: a Random Forest, a
LASSO, and a Linear Regression Model, each developed through 10-Fold Cross Validation
using the CARET¹⁶ (Classification And REgression Training) package in R. We ensemble our
models (combine them all in a linear combination) in order to limit the errors made by any
particular model.¹⁷
Once we arrive at our ensemble model, we use the entire dataset to build our final
ensemble model. The results we present below are the final model created from the entire
dataset.
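The data split and 10-fold cross-validation described in this subsection can be sketched in Python with the standard library. This is an illustration of the general technique, not the resampling the authors' R package performed. It assumes the larger 75% share is the training set, and the function names, seed, and fold assignment rule are hypothetical.

```python
import random


def train_test_split(n, train_fraction=0.75, seed=0):
    """Shuffle the indices 0..n-1 and split them into training and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Yield (fit, validate) index lists; each index is validated exactly once."""
    for fold in range(k):
        validate = indices[fold::k]
        held_out = set(validate)
        fit = [i for i in indices if i not in held_out]
        yield fit, validate


# 1023 students minus the four removed for non-standard tracks.
train, test = train_test_split(1019)
folds = list(k_folds(train, k=10))
```

Each candidate model would be fit on the `fit` indices and scored on the `validate` indices of every fold, with the held-out `test` indices reserved for the final comparison.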
Table 5 below provides a summary of the models. RMSE is the
root mean squared error: the square root of the average squared error across the
observations. Lower numbers indicate less overall error between the actual course grades and the
predicted ones. R2 is the coefficient of determination and reveals the proportion of variation in
course score accounted for by the model. The closer the number is to one, the more
variation the model takes into account. MAE is the mean absolute error: the average absolute
difference between the actual and predicted course grades.
Model               RMSE    R2      MAE
Linear Regression   .2631   .9308   .1873
Regularization      .2626   .9310   .1901
Random Forest       .3148   .9010   .2314
Ensemble Model      .2663   .9291   .1943
Table 5: Model Summary
15 Heitjan and Basu 2012.
16 Kuhn et al., 2017.
17 Dietterich, 2000.
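The three columns of Table 5 follow standard definitions, which the short Python sketch below computes for paired actual and predicted scores. The function name and the sample grades are hypothetical, and R2 is computed here as one minus the ratio of residual to total sum of squares; some software reports the squared correlation instead, so values need not match the authors' output exactly.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, R^2, and MAE for paired lists of actual and predicted scores."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)    # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical course grades on a 4.0 scale.
metrics = regression_metrics([3.0, 2.0, 4.0, 3.0], [2.5, 2.0, 3.5, 3.0])
```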
6.3.1. Linear Regression
To arrive at the best linear regression model, we used the stepAIC function from the
MASS¹⁸ library in R. After 10-Fold Cross-Validation, the coefficients and associated p-values
for the best model are shown in Table 6.
Covariate                        Coefficient   p-Value
Intercept                        -87.712       ≤ .05
Core Math Average                1.861         ≤ .05
CQPA                             .065          ≤ .05
CCPS                             .0532         .0667
Took MA100                       -.2771        ≤ .05
Modeling                         -.5148        ≤ .05
Single Variable Calculus         -.6233        ≤ .05
Standard Non STEM                .9141         ≤ .05
Standard STEM                    .8738         ≤ .05
STEM w/ MA103 Validation         .8176         ≤ .05
Remedial to STEM w/ Validation   1.2673        ≤ .05
Advanced Non STEM                .9070         ≤ .05
Advanced STEM                    .8528         ≤ .05
Table 6: Resulting Coefficients from Linear Regression Model
The reader may notice that the p-value for CCPS is > .05. We acknowledge that the p-value
is larger than our significance level (α); however, we leave it in the model because this
model has the smallest error in predicting our test set.
The four modeling assumptions for linear regression (linearity, normality,
homoscedasticity, and independence) are satisfied.
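For readers unfamiliar with stepwise selection by AIC, the criterion being minimized has the standard form below. The symbols are not taken from the source: k is the number of estimated parameters, \hat{L} the maximized likelihood, n the number of observations, and RSS the residual sum of squares. The second expression is the usual simplification for a Gaussian linear model, valid up to an additive constant; the exact constant used by the authors' software is not stated in the text.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\text{and, for a Gaussian linear model,}\qquad
\mathrm{AIC} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const.}
```

A covariate such as CCPS can survive this kind of selection even with a p-value above .05, because AIC trades goodness of fit against model size rather than testing each coefficient separately.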
6.3.2. Regularization
To arrive at the best regularization model, we tuned the parameters α at 0, .55, and 1 and λ from
0 to 1 using the glmnet package through CARET. After 10-Fold Cross-Validation, the best
model as determined by RMSE, with α = 1 and λ = .001821, is as follows:
Predictor Variable Coefficient
Intercept .00468
Core Math Average 1.7842
CQPA .0725
CCPS .0537
Rock -.2301
Modeling -.48654
Single Variable Calculus -.5838
Standard Non STEM .0318
STEM w/ MA103 Validation -.0439
Remedial to STEM w/ Validation .2976
Advanced Non STEM .0189
Advanced STEM -.00039
Advanced to Standard -.7877
Table 7: Resulting Coefficients from Regularization Model
18 Venables and Ripley, 2002.
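To make the roles of α and λ concrete, the sketch below computes the elastic-net penalty in the form glmnet documents: λ scales the overall penalty, while α mixes the lasso (absolute value) and ridge (squared) terms. The coefficient vector is hypothetical and the function name is an assumption for illustration.

```python
def elastic_net_penalty(coefs, alpha, lam):
    """Penalty lam * (alpha * L1 + (1 - alpha) / 2 * L2^2); the intercept is excluded."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2_squared)

# Hypothetical coefficients, for illustration only.
coefs = [1.5, -0.5, 0.0]
lasso_penalty = elastic_net_penalty(coefs, alpha=1.0, lam=0.1)  # pure lasso
ridge_penalty = elastic_net_penalty(coefs, alpha=0.0, lam=0.1)  # pure ridge
print(lasso_penalty, ridge_penalty)
```

With α = 1, as selected here, the penalty is pure lasso, which can shrink coefficients to exactly zero; this is consistent with the near-zero coefficients in Table 7.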
6.3.3. Random Forest
We also ran a random forest model using the randomForest package in R, creating 500 trees
per forest. After 10-fold cross-validation, the best model (as determined by the smallest
RMSE) randomly sampled 13 variables as candidates at each split. Its predictors, ranked by
variable importance as measured by node purity, are shown below in Table 8.
Rank Predictor Variable
1 Core Math Average
2 CQPA
3 Single Variable Calculus
4 Modeling
5 Rock
6 CCPS
7 Advanced STEM
8 Standard Non STEM
9 Standard STEM
10 STEM w/ MA103 Validation
11 Rock
12 Advanced to Standard
13 Remedial to STEM w/ Validation
14 Advanced Non STEM
Table 8: Rank of Variable Importance from the Random Forest Model
6.3.4. Ensemble
While there are many complicated ways to ensemble multiple models, we shied away from
the more complicated methods of bagging, boosting, or stacking and simply averaged the
predictions of the three models. This helped maintain interpretability while giving equal
weight to each model. Instead of simply picking the one model with the lowest RMSE on the
test set, we chose this ensemble method so that no single model, if it made systematically
biased estimates, would dominate the predictions.
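The equal-weight averaging described above can be sketched in a few lines of Python. The prediction values and function name are hypothetical; only the averaging rule comes from the text.

```python
def ensemble_average(*model_predictions):
    """Average the predictions of several models, observation by observation."""
    n_models = len(model_predictions)
    return [sum(preds) / n_models for preds in zip(*model_predictions)]

# Hypothetical predictions from the three models for three students.
linear = [3.1, 2.4, 3.6]
regularized = [3.0, 2.5, 3.7]
forest = [3.2, 2.0, 3.5]

ensemble = ensemble_average(linear, regularized, forest)
print(ensemble)
```

If one model overestimates a student while the others do not, the average moves only a third of the way toward the outlying prediction, which is the damping effect the paragraph describes.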
6.4. Survey Methodology
The question of whether academically homogeneous classrooms are beneficial cannot be
answered through student test scores alone. The students’ perspective on this approach can
provide valuable insight into the qualitative impact of the Research Team’s study.
Additionally, faculty techniques and practices add useful experiential detail from a
pedagogical reference point. We use two surveys to gather this feedback.
We avoid leading questions on the surveys to limit bias. Students in the ability group
are not explicitly told that they are sectioned by ability during the semester, but it is clear
to many that there is some structure to the class, as shown in Section 7.4. We do not think
that a student being aware that they are sectioned by ability will bias their survey results,
but that is a possibility we recognize and a potential limitation to our results. We give the
student survey at the end of the semester, and the five questions related to the study are
mixed in with other course and department-level questions. The goal of this survey is to
answer five key questions of interest related to the study by comparing the results
between the Ability and Random groups.
1. Was our model effective at putting Cadets of like ability together?
2. Does grouping by ability help Cadets learn better/feel more comfortable with the
course material?
3. Does a homogeneous classroom make students more comfortable and willing to
engage in class?
4. Does a homogeneous classroom allow instructors to cater to the class in a way that
does not hold stronger students back or leave struggling students behind?
5. Would students prefer to learn in an academically homogeneous environment?
The actual questions/statements we pose are Likert Scale in nature and include:
1. My mathematical abilities were aligned with the rest of my section.
2. This course helped me understand and analyze complicated problems.
3. I was engaged and felt comfortable participating during class.
4. The pace of the lessons throughout the semester were…
5. Would sectioning by academic ability be advantageous or disadvantageous to
learning?
Statements 1-3 have Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree as their
possible answers, while statement 4 has the following possible responses: Much too Slow, A
little Slow, Just Right, A little Fast, and Much too Fast. Question 5’s responses are Mostly
Advantageous, Somewhat Advantageous, Neutral, Somewhat Disadvantageous, and Mostly
Disadvantageous.
Additionally, we create a separate survey for those instructors who teach both random and
ability sections. We value the perspective of the instructors because of the pedagogical impact
such a classroom design has. The Research Team’s main focus for the faculty survey is to see if
faculty members have to prepare differently for a random section versus an ability section and
if, in the end, they prefer one method of teaching over the other. Results of these questions
can be found in Section 7.5.
This section focuses on the methods we used to garner the data necessary for the Research
Team’s analysis. At the conclusion of the semester, we compile all the available data and
analyze the Research Team’s results. The next section of this paper focuses on the results of
this study and the Research Team’s interpretation.
7. Results
7.1. Model Results
We analyze the results from the Research Team’s study in several different ways to
determine if student learning improves when students are placed in classrooms with other
students of similar ability. First, we conduct multiple two-sample tests between students
of the same ability level, split by whether the students were in the ability or random
groups. Second, we create a linear model to predict their end-of-course average using their
experimental group while controlling for other confounding variables.
In summary, our model explains 61% of the variation in MA206 scores (R2). Figure 1 below
shows the general trend in predicted vs actual scores.
Figure 1: Predicted Vs Actual Scores by Grouping
7.2. Multiple Pairwise Comparisons
We do two different pairwise comparisons. First, we test whether the mean course average
of those sectioned by ability differs from that of those randomly assigned.
Random Ability
Cadets Assigned 272 283
Mean Grade .8104 .8060
Standard Deviation .0876 .0906
Table 9: Descriptive Statistics of Student Performance Across the Two Groups
The 95% confidence interval for the true mean difference between students randomly assigned
and those sectioned by ability is (-.0104, .0193). Because this interval contains zero, we find
no statistically significant difference in student performance based on how students are
sectioned.
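The interval above can be approximately reproduced from the summary statistics in Table 9. The sketch below assumes an unpooled (Welch-style) standard error and a normal critical value, which is reasonable for groups of this size; the paper does not state the exact procedure used, so small differences in the last digits are expected.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Confidence interval for mean_a - mean_b using an unpooled standard error."""
    diff = mean_a - mean_b
    std_err = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation (assumption)
    return diff - z * std_err, diff + z * std_err

# Summary statistics from Table 9: Random group first, Ability group second.
lower, upper = mean_difference_ci(0.8104, 0.0876, 272, 0.8060, 0.0906, 283)
print(round(lower, 4), round(upper, 4))
```

An interval that spans zero means the data are compatible with no difference between the groups; it does not by itself establish that the difference is exactly zero.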
We also look at pairwise comparisons across the nine ability levels. Table 10 below shows
the mean performance of the randomly assigned students and of the students grouped by
ability. The last column denotes whether the difference between the groups is
statistically significant.
There appears to be a significant difference between the Random and Ability groups at
Ability Level 8. However, after applying even the most liberal corrections for multiple
hypothesis testing, the finding is no longer significant. This finding is more likely due to
expected randomness in the data, as indicated by the rest of the ability groups, than to a
systematic difference in performance for students in Ability Level 8.
Ability Level Random Ability P-Value Significant
Ability 1 .903 .901 ≥ .05 No
Ability 2 .878 .872 ≥ .05 No
Ability 3 .869 .881 ≥ .05 No
Ability 4 .847 .842 ≥ .05 No
Ability 5 .836 .844 ≥ .05 No
Ability 6 .789 .771 ≥ .05 No
Ability 7 .762 .753 ≥ .05 No
Ability 8 .735 .706 < .05 Yes
Ability 9 .676 .688 ≥ .05 No
Table 10: Mean Exam Performance based on Ability Level
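To illustrate why one significant result among nine tests is unsurprising, the sketch below computes the chance of at least one false positive across nine independent tests at α = .05, the Bonferroni-adjusted threshold, and Holm's step-down procedure. The paper does not report exact p-values or name the correction it applied, so the p-values here are hypothetical placeholders and the choice of Bonferroni and Holm is an assumption.

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

def holm_rejections(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down procedure rejects."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (len(p_values) - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

fwer = family_wise_error_rate(9)
threshold = bonferroni_threshold(9)

# Hypothetical p-values for the nine ability levels (only level 8 is below .05).
hypothetical_p = [0.62, 0.41, 0.33, 0.71, 0.48, 0.09, 0.37, 0.03, 0.44]
rejections = holm_rejections(hypothetical_p)
print(round(fwer, 3), round(threshold, 4), rejections)
```

With nine tests there is roughly a 37% chance of at least one spurious "significant" result even when no real differences exist, so a single p-value just under .05 would not survive either correction.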
To further support the evidence that there is no measurable difference in student learning
when students are sectioned by ability, Figure 2 shows the performance level
of each group side by side based on their inherent student ability level. There is no
discernible difference between the two types of classrooms. One might even see evidence
to conclude that students in the ability sections actually do worse than their counterparts
in the randomized sections.
Figure 2: Exam Performance Based on Ability Level
7.3. Testing Effect of Sectioning By Ability
To determine the significance of sectioning by ability, we build a model that predicts
cadet score from predicted score (to account for ability) and Random/Ability grouped status.
The output from our linear model is shown in Table 11. The final column reveals that none
of the factors that represent a student’s position within our experiment carries
significance in predicting that student’s performance. The only significant predictor is
the student’s projected score. This corroborates all other findings in our experiment.
Covariate Coefficient P-Value Significant
Predicted Grade .9376 ≤ .05 Yes
Ability 1 Random .0403 ≥ .05 No
Ability 2 Random .0494 ≥ .05 No
Ability 3 Random .0530 ≥ .05 No
Ability 4 Random .0638 ≥ .05 No
Ability 5 Random .0421 ≥ .05 No
Ability 6 Random .0376 ≥ .05 No
Ability 7 Random .0081 ≥ .05 No
Ability 8 Random .0332 ≥ .05 No
Ability 9 Random .0082 ≥ .05 No
Ability 1 Ability .0122 ≥ .05 No
Ability 2 Ability .0172 ≥ .05 No
Ability 3 Ability .0515 ≥ .05 No
Ability 4 Ability .0371 ≥ .05 No
Ability 5 Ability .0671 ≥ .05 No
Ability 6 Ability .0185 ≥ .05 No
Ability 7 Ability .0274 ≥ .05 No
Ability 8 Ability .0132 ≥ .05 No
Ability 9 Ability .0270 ≥ .05 No
Table 11: Linear Regression Variable Output Results
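The structure of this model can be sketched with a simplified version that uses a single Ability/Random indicator instead of the eighteen level-by-group terms in Table 11. The data below are simulated with no true group effect, so all values are hypothetical; the sketch fits ordinary least squares through the normal equations and is not the study's code.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x

def ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Simulated (hypothetical) data: the group indicator has no true effect by construction.
rng = random.Random(0)
design, response = [], []
for i in range(400):
    predicted = rng.uniform(0.6, 0.95)
    in_ability_group = i % 2
    design.append([1.0, predicted, float(in_ability_group)])
    response.append(0.05 + 0.94 * predicted + rng.gauss(0, 0.02))

intercept, slope, group_effect = ols(design, response)
print(round(slope, 3), round(group_effect, 4))
```

When the grouping has no real effect, the coefficient on the indicator lands near zero while the coefficient on predicted grade carries the fit, which mirrors the pattern reported in Table 11.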
7.4. Student Survey Results
The results from our end-of-semester cadet survey provide us insight into the students’
viewpoints on this research. The survey does this directly, by asking cadets if they think
ability sectioning would be better for learning, and indirectly, through generic questions
analyzed based upon the respondent’s position in our experiment (Ability or Random section).
The five questions and their results are shown in Figure 3. The responses for each question
are divided between the Ability and Random groupings. Additionally, the p-value associated
with each question indicates whether or not the responses of the two groups are significantly
different. For our purposes, a p-value less than 0.05 indicates statistical significance. A
higher p-value indicates similarity between the responses of the two groups. A p-value of 1
indicates identical responses.
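The paper does not state which test produced these p-values. As one plausible illustration, the sketch below assumes a chi-square test of homogeneity on the five Likert response categories and compares the statistic against the standard critical value for 4 degrees of freedom at α = .05. The response counts are hypothetical.

```python
def chi_square_statistic(group_a_counts, group_b_counts):
    """Chi-square statistic for a 2 x k table of response counts."""
    total_a, total_b = sum(group_a_counts), sum(group_b_counts)
    grand_total = total_a + total_b
    statistic = 0.0
    for obs_a, obs_b in zip(group_a_counts, group_b_counts):
        column_total = obs_a + obs_b
        for observed, row_total in ((obs_a, total_a), (obs_b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

CRITICAL_VALUE_DF4 = 9.488  # chi-square critical value, 4 df, alpha = 0.05

# Hypothetical counts: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
ability_counts = [30, 90, 60, 15, 5]
random_counts = [15, 70, 75, 30, 10]

statistic = chi_square_statistic(ability_counts, random_counts)
print(round(statistic, 3), statistic > CRITICAL_VALUE_DF4)
```

Identical response distributions give a statistic of zero, which corresponds to the p-value of 1 described above; larger statistics correspond to smaller p-values.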
The only question that has a significant difference between the Ability and Random groups
references whether the Cadets feel that their abilities were aligned with the rest of their class.
We expect this to be the case and it indicates that students in the ability group generally
recognize the structure of the experiment. The rest of the questions still offer some insight
even if the difference between the groups is not statistically significant. In general, the
randomly sectioned students feel they could understand complex problems more and are more
comfortable engaging in class than those sectioned by ability. This is a surprising finding and
perhaps suggests that ability sectioning does not achieve the hypothesized outcome. The same
percentage of students in both groups feel the pace of the course was just right with a few
more students in the random group feeling rushed. Finally, sentiment regarding whether
sectioning by ability would be advantageous was consistent across the two groups: 59% of
students feel that sectioning by ability would be either somewhat or mostly advantageous.
While the results of the student survey are inconclusive statistically when contrasting the
Ability and Random groups, there are still some interesting insights to be gained from them. It
appears as though students like the idea of sectioning by ability but the questions that aim to
measure how effective the method truly is are not as convincing and perhaps even indicate a
negative effect.
Figure 3: Cadet Survey Results
7.5. Faculty Results
At the conclusion of the semester, we surveyed the MA206 faculty anonymously on their
inclinations with regard to ability group sectioning as well as their teaching approach during
the semester. Across the 29 sections there were only 13 instructors, of whom only nine taught
both ability and non-ability sections, making these nine the target population. Unfortunately,
with only nine data points the sample size is too small to provide analytical insights beyond
descriptive statistics. Furthermore, of the nine instructors only seven responded to the survey
(two of whom are part of this research group), further decreasing the sample size.
Nevertheless, insight can be drawn from their feedback, specifically regarding their teaching
approach and their perception of the effectiveness of ability group sectioning. Preliminary
findings, as shown in the figure below, indicate that the faculty is split on whether they
would prefer to teach in ability grouped sections again in the future. One faculty member aptly
summarizes the findings, stating that while they have no preference for teaching by ability
section in the future, there is a substantial difference in the amount of additional
instruction (AI) they provide in their lower performing ability-grouped sections compared to
their random sections. This disparity in additional instruction is noted in almost all of the
surveys and affects instructor workload beyond normal class preparation and grading.
Figure 4. Instructor Feedback.
Across all surveys, the faculty unanimously note that the teaching pace was different in their
Ability sections compared to their Random sections. This difference is also noted by those
who taught higher ability and lower ability sections. Due to this difference in pace, and to the
fact that instructors taught a mixture of ability and non-ability grouped sections, there is a
perceived difference in lesson preparation. Notably, instructors feel that they have to prepare
for two different classes. One instructor notes that the “randomized sections took less time in
lecture and less time in board work than in the low-ability sections. I could go further in detail
and explain more tangents.” Faculty also note that there is a difference in the level of
preparation between the higher ability sections and the lower and non-ability grouped sections,
adding to the evidence that preparation does differ between the varying groups.
Finally, with regard to the value of ability grouped sections, five of the seven instructors
feel that ability grouping is beneficial to the student. As in the literature review on the
merits and consequences of this approach, some instructors feel strongly while others are
rather noncommittal. Instructors also feel that the pace is consistent in ability sections and
keeps all students engaged, promoting peer pressure to keep up and fostering a learning
environment.
Other faculty are not as convinced that a true benefit exists, thinking the learning responsibility
falls squarely on the students. One instructor stated, “I believe in theory that there is some
benefit to sectioning by ability. However, in application I think this is more of a function of
student desire. A student will perform based on how much work they put in. My goal, however,
was to fan the flame within them to get them to want to perform.”
8. Discussion
The Research Team notes in Section 7 that students did not perform as hypothesized; namely,
the ability grouped sections did not perform better than the random group. The descriptive
statistics support this finding and actually indicate that the ability grouped sections
performed slightly below the randomly assigned sections. The rationale for this is mixed and
requires a look back at the Literature Review to help understand. Principally, the team
anticipated that students would enjoy being in the company of other like-minded students,
which should in turn provide an atmosphere that is conducive to learning and facilitates
discourse. Though the research does not indicate that this interaction failed to occur, the
anticipated higher grades did not materialize, or at least not to a degree significant enough
to report. Inspection of the student surveys is therefore necessary. The surveys indicate that
the class pace in the ability grouped sections was generally on the right side of “just
right”, or just a little too fast. This is comparable to the non-ability grouped sections and
does not provide as much insight as the Research Team desires. Because there was no noticeable
difference between the two types of sections, no comparison can be drawn, though anecdotally
the instructors could have adjusted the class pace throughout the course. This would support
the literature, specifically literature indicating that instructors adjust to the mean
achievement level of their classes.19 Any benefit of this adjustment is still left unrealized,
as the students indicated that the pace was “too fast” in all sections, though this may be
more a function of general student attitudes, specifically in response to a course,
Probability and Statistics, that is generally spread across two semesters at other
institutions.
However, it is still left unanswered why there was no marked improvement in the ability
grouped sections. This is particularly odd as both the students and the faculty recognize the
positive potential of sectioning by ability. This benefit should have been realized through
the competitive culture of USMA and through positive peer pressure, or the drive of students
to keep up with their peers. It should, of course, be acknowledged that there are additional
factors not being considered, such as the student’s course load; however, their effect would
be marginal when considering all students collectively. This raises the follow-on question of
whether the Research Team properly grouped the sections by equivalent ability. We assume so:
upon inspection of student grades within the sections, the grades show normal variability,
indicating that students were sectioned adequately. Inspection of the boxplots in Section 7,
which compare student averages on course-wide graded events for each of the sections,
indicates that there was very little variability between the sections. Therefore, the lack of
a net change cannot be explained by a surface-level inspection. This again merits follow-on
consideration. In retrospect, the faculty survey should perhaps have asked instructors about
their observations of students’ general participation and comfort within their sections.
Comparing those observations with the students’ responses could have shown whether
perceptions of the sections were aligned, thus indicating benefit.
19 Hopkins, 2006.
Finally, the results of this research do not mirror the findings of LTC Hickman’s research,
which was conducted in a similar fashion.20 Where his research indicated positive findings in
favor of ability group sectioning in Calculus courses, the same does not hold true for ability
sectioning in Probability and Statistics courses. The target populations differ (Plebes vs.
Yearlings), though this slight difference in populations should not have caused such vastly
different findings, as all other variables are the same. This raises the question of whether
other social behaviors are at play or whether the results are purely academic.
9. Conclusion
The ultimate question that the Research Team wants to answer is whether there is benefit to
grouping sections by ability and whether the Department of Math and USMA should continue to
group by ability in the future. The Research Team hypothesized that sectioning students by
mathematical ability would have a positive impact; however, based on the quantitative and
qualitative findings from this research, the hypothesis cannot be supported, as there is no
significant difference in academic performance between ability and randomly grouped sections.
In fact, the ability grouped sections performed slightly worse on average (80.60% vs.
81.04%), though the difference was not significant. Despite these quantitative findings,
students still indicated an overall positive sentiment towards the benefit of sectioning by
ability, without a noticeable difference in the pace of the classes. The faculty echoed this
sentiment but reported only a weak preference for teaching ability sections in the future,
citing length of preparation as the only discernible factor. Therefore, though the results of
this research are not able to substantiate the research hypothesis, the research can still
prove beneficial and can help address future class development at the United States Military
Academy by providing a basis for future research in the matter.
20 Hickman, 2007.
References:
Dietterich, T.G. “Ensemble Methods in Machine Learning.” In: Multiple Classifier Systems, MCS
2000. Lecture Notes in Computer Science, vol 1857. Springer, Berlin, Germany. 2000.
Heitjan, Daniel F. & Basu, Srabashi. “Distinguishing ‘Missing at Random’ and ‘Missing
Completely at Random’.” The American Statistician, 2012, 50:3, 207-213.
Hickman, Randal. “Ability Group Sectioning in an Undergraduate Calculus Curriculum.” West
Point, NY. 2007.
Hopkins, Gary, “Is Ability Grouping the Way to Go—or Should it Go Away,” Education World,
2006.
Kuhn, Max. “caret: Classification and Regression Training.” R Package version 6.0-78. Available
at https://CRAN.R-project.org/package=caret . 2017
Loveless, Tom. “Making Sense of the Tracking and Ability Group Debate.” 1998.
Office of the Dean, USMA. “Class profile of the Class of 2020.” Available at
https://www.usma.edu/oir/Class%20profiles/Class%20of%202020.pdf, 2016.
Princeton Review. “Most Accessible Professors”. Available at
https://www.princetonreview.com/college-rankings?rankings=most-accessible-professors,
2018.
Slavin, Robert E. “Achievement Effects of Ability Grouping in Secondary Schools: A Best-
Evidence Synthesis,” Review of Educational Research, fall of 1990, volume 60, 1990, pgs. 471-
499.
“The ACT Profile Report-National: Graduating Class 2016.” Available at
https://www.act.org/content/dam/act/unsecured/documents/P_99_999999_N_S_N00_ACT-
GCPR_National.pdf, 2016.
Tidwell, Gary. “Psychological Foundations of Teaching and Learning: Student Motivation by
Sectioning Students.” West Point, NY. 2007.
Venables, W. N. & Ripley, B. D. “Modern Applied Statistics with S.” Fourth Edn. Springer, New
York. 2002.
Viar, William. “Tracking and Ability Grouping” 2008.