
Pedagogical Approach to Ability Group Sectioning: A Mathematical Investigation

Chris Collins, Jim Pleuss, Dusty Turner

This paper was completed and submitted in partial fulfillment of the Master Teacher Program, a 2-year faculty professional development program conducted by the Center for Faculty Excellence, United States Military Academy, West Point, NY, 2018.

_____________________________________________________________________________________

Abstract: In the continual pursuit of classroom learning effectiveness, researchers and educators aim to develop strategies that improve student performance and learning. One such strategy is to create academically homogeneous environments in which students are grouped into classes based on their projected academic ability. The research team hypothesizes that students placed in homogeneous ability groups, regardless of ability level placement, will perform better than they would in normal heterogeneous classes. Students taking MA206 during AY 18-01 were placed in either ability-grouped or randomly grouped sections. Placement was based on mathematical projections of each student’s historical mathematical performance. The analysis uses data from major graded events, as well as qualitative data from end-of-course surveys that gauge student and faculty feedback. A quantitative analysis of these results indicates no significant difference in student performance between ability-grouped and randomly grouped sections.

Contents

1. Introduction
2. Background
3. Literature Review
4. Pedagogical Approach
5. Hypothesis
6. Methodology
6.1. Methodology Design
6.2. Data
6.3. Model Building
6.3.1. Linear Regression
6.3.2. Regularization
6.3.3. Random Forest
6.3.4. Ensemble
6.4. Survey Methodology
7. Results
7.1. Model Results
7.2. Multiple Pairwise Comparisons
7.3. Testing Effect of Sectioning By Ability
7.4. Student Survey Results
7.5. Faculty Results
8. Discussion
9. Conclusion
References

1. Introduction

In the continual pursuit of classroom learning effectiveness, researchers and educators aim to develop strategies that improve student performance and learning. One such strategy is to create academically homogeneous environments in which students are grouped into classes based on their projected academic ability. Such studies face many inherent challenges if the goal is statistically robust findings that extend beyond the anecdotal:

- Developing effective control and treatment groups while maintaining a sufficiently large sample size is difficult without creating uncomfortably large classes.
- Isolating specific variables to test for effectiveness is also difficult, given innumerable other influencing, or “noise,” variables.
- Finally, the administrative support required for a large-scale experiment of this nature can be difficult to obtain.

Nonetheless, the literature is rich with pedagogical studies exploring the academically homogeneous classroom.

This research extends the pedagogical discussion of grouping undergraduate students by academic ability. Existing research focuses predominantly on the benefits for elementary and middle school students, with few studies of undergraduate education. Therefore, within the Department of Mathematical Sciences at the United States Military Academy (USMA), our research team explores the impact of an academically homogeneous classroom environment on three outcomes: student performance, student perception of learning, and instructor practices.

The study design drew on the current literature and past studies, along with consultation with senior faculty in the Mathematics Department. The study covered 580 students (predominantly sophomores¹) taking Introduction to Probability and Statistics (MA206) during the Fall Semester of Academic Year 2018 (AY 18-01). These students were divided into two sets of classroom environments (i.e., student distributions). The control group, which we call the random group in this paper, consisted of sections, or classes, to which students were randomly assigned. The treatment group, or ability group, was divided into sections based on the students’ projected performance.

This paper begins by discussing the background of the Research Team’s study, including the unique circumstances and academic environment at USMA. A review of the current state of the literature in this field follows. We then explain the methodology of the Research Team’s study and the model we used to determine academic ability. Next, we present the results of our study, focusing on student performance on course-wide major graded events as well as student and faculty survey results. We conclude with a discussion of the results, the contributions of this research to the literature, and pedagogical recommendations and observations for the field.

2. Background

The mathematics curriculum at USMA requires all students to take a minimum of three math courses. These courses satisfy graduation requirements and are part of the program accreditation for a Bachelor of Science degree. Students can take additional math courses as part of their major. However, all students are required to take the following three core courses:

1.) MA103: Mathematical Modeling
2.) MA104: Introduction to Calculus
3.) MA206: Introduction to Probability and Statistics

Students who select a STEM (Science, Technology, Engineering, and Mathematics) major must also take an additional math course, MA205: Multivariate Calculus.

Within each of the core math courses, students may be placed in advanced sections. Placement is based on past performance and/or placement exams administered before the start of the academic year. These advanced courses parallel their standard counterparts and are given their own course numbers. They give high-ability students the opportunity to compete with students of like ability. They also challenge those students with advanced instruction in material that would likely not challenge them in mixed-ability classes.

USMA is a unique institution that lends itself to a specific case study. All students accepted into the academy have gone through an extensive and rigorous academic, physical, leadership, and background vetting process. In addition, all graduates earn a Bachelor of Science upon graduation. The median score on the math portion of the ACT for the graduating Class of 2020 (the majority of the research sample population) is 29²; this supports the academic pedigree of USMA students.³ Therefore, it is assumed that the majority of students have a strong mathematical background and foundation as well as the capacity to learn. USMA also offers small class sizes (~15-16 students) and the nation’s #1 most accessible university professors.⁴ Together, these conditions set students up to succeed in class.

Unfortunately, for a myriad of reasons, not all students succeed academically while they are Cadets at USMA. Their instructors and professors have a duty to determine where the gap or problem exists and to find ways to remedy these deficiencies.

1 Referred to as Yearlings at USMA.

USMA has a deep and rich legacy in many areas, including its academic approach, or pedagogy. The Thayer System, named after Superintendent Sylvanus Thayer, places the responsibility for learning on the student. Under this system, students must read each assignment before the corresponding class. They are expected to come prepared to participate in discussion and to build on that knowledge through application and lecture.

Coupled with the small class sizes noted above, this approach fosters an environment that promotes academic rigor. It also gives students more individual attention from instructors. USMA invests considerable energy and resources to ensure that the conditions are met for students to learn in these intimate classroom environments. However, it does not explicitly dictate the composition of the classes themselves. This heterogeneous, essentially random composition is therefore a key determinant of the classroom environment that each section provides.

Each academic department at USMA has the autonomy to section students in its classes as it prefers, provided it adheres to the Dean’s Policies.⁵ Within the Department of Mathematical Sciences, a student can be selected for advanced courses, in line with graduation requirements, by demonstrating past academic achievement. However, the department makes no concerted effort to section classes within the common core curriculum by ability.

Prior research both supports and fails to support the advantages of ability sectioning, or academically homogeneous grouping. This research has been conducted principally at the elementary and high school levels. At those levels, school administrators have the benefit of more regimented class schedules and common curricula. Comparatively little research addresses the impact of such academically homogeneous classrooms at the undergraduate level.

The institutional structure of USMA creates a suitable environment for experiments of this nature. The Research Team’s results support some of the existing literature. They also offer novel findings and conclusions not yet explored.

2 Class profile of the Class of 2020, Office of the Dean, USMA.
3 According to the ACT National Profile Report for the graduating high school Class of 2016, the national average on the math portion of the ACT was 20.6. The median was not reported. However, the Quartile Score report indicates that the national median for the Class of 2016 was 19, based on a sample of approximately 2 million students.
4 Princeton Review, 2018.
5 Per Dean’s Policy and Operating Memorandum 02-10, dated 16 August 2004.

3. Literature Review

For clarity, it is important to distinguish between the definitions of ability groups and of “tracked students.”⁶

William Viar [2008] defines “tracking” as the placement of students into directed curricula (math, history, English, science, etc.) based on past achievement and potential for success. These directed curricula, or tracks, channel students according to their performance as well as their parents’ wishes. In mathematics, for example, a high school student tracked for college would follow a path that meets college math prerequisites. That student would take math classes through senior year, ending with Calculus. A student who was not tracked for higher education, possibly indicating a preference for vocational training, may complete only the minimum high school graduation requirements. That student would stop after two or three years of high school math (Algebra II or Pre-Calculus).

By contrast, Viar notes that ability groups are most visibly organized within elementary schools, principally in reading groups. Students are placed into ability groups in which instruction is tailored to the capability of the students.

In his research on grouping in high school classes,⁷ Dr. Robert Slavin⁸ identifies five types of ability grouping:

- grouping students as a class by ability for all subjects
- primarily heterogeneous grouping
- a mix of ability and heterogeneous grouping
- non-graded instructional grouping
- in-class grouping

Although these five types of grouping differ, they all come back to the principal difference between ability grouping and tracking: capability versus capacity.

The majority of the literature on ability groups focuses on grouping within elementary school classrooms. A preponderance of these authors do not support the practice of sectioning by ability, except in very specific cases highlighted later in this section. These negative findings could be due to the nature of these education levels, where other developmental indicators may not be considered. Nonetheless, their observations are valuable, and important lessons can still be derived from them. This study tests the hypothesis that ability grouping yields a potential gain. It is therefore useful to first highlight the arguments against this methodology, both to gain perspective and to better shape the methodology for this research.

Tom Loveless, an education researcher and former professor, is the most referenced author in the case against ability grouping and tracking. His seminal work is “Making Sense of the Tracking and Ability Grouping Debate.”⁹ In it, he concludes that tracking, and to a lesser extent ability grouping, has three effects: it “fosters race and class segregation,” potentially harms students’ self-esteem, and leads to a self-fulfilling prophecy for most students. He also raises an ethical concern: sectioning by ability may set and maintain conditions of inequality among demographically and socio-economically different student populations. Mr. Loveless concludes that there is no definite net gain from ability grouping or tracking.

Colonel Gary Tidwell’s¹⁰ research further discusses these disadvantages. It cites developmental hazards such as labeling, psychological damage for low-ability performers, and difficulty in shedding labels and moving to higher ability groups. These disadvantages are difficult to quantify and are described anecdotally, without consistent metrics. Setting aside the ethical and moral implications, the Research Team focuses on the application of grouping by ability and its effects on the student.

6 Viar, 2008.
7 High school is also referred to as secondary school, whereas elementary school is referred to as primary school. This research focuses on education beyond high school, that is, undergraduate or post-secondary education.
8 Slavin, 1990.
9 Loveless, 1998.

In the “Notes” of his research, Mr. Loveless cites groups and institutional entities that “condemned” tracking. These notes were used in the foreword of his research. However, a deeper review revealed that a large portion (~50%) of surveyed educators wanted more ability grouping in K-12 classrooms.¹¹ Additionally, only 40% of those surveyed said they believed that heterogeneously mixed classrooms would improve a student’s education. Among respondents who taught in middle school, the share preferring ability grouping rose to 66%. This indicates that they saw the benefit of grouping students with similar abilities.

The case for sectioning by ability revolves primarily, and appropriately, around the benefits to students. This is to be expected, as most teachers show traits of selfless service and expertise in their lesson preparation. These advantages relate to the comfort level of students, because students of like ability tend to be more comfortable around each other. In these environments everyone should have similar capacity, which promotes healthy conversation within the classroom. The idea is that a question for one is actually a question for all. The teacher also benefits by being able to teach at one level, or pace. This reduces the varied types and levels of questions that interrupt the normal flow of class. In this way, a response to one is actually a response to many.

Loveless and Slavin agree on the special cases in which ability grouping was found to be beneficial: these results stemmed from the institution itself. Institutions that cemented a culture focused on effort and held students accountable were most strongly correlated with strong performance within ability groups. These institutions were identified as Catholic schools. There, even low-track classrooms were facilitated through good teaching, small classes, student participation, and parent involvement.

Compared with the body of research on elementary and middle school ability grouping, there is far less research on undergraduate education. However, a colleague in the Department of Mathematics at USMA conducted his own research on ability group sectioning in 2007.¹² LTC Randall Hickman grouped classes by ability in a USMA core math course, MA205 (Multivariate Calculus). He created these homogeneous sections. He then collected performance data on the final comprehensive exam for MA205, the Term End Exam (TEE).

His quantitative analysis indicates that students performed better on the TEE than in previous years. He further found that high-performing students performed as expected. The largest increase was within the lower ability sections, where students were “pulled up” by their peers. His results are mixed, however, because the ability grouping benefited the lower ability groups far more than the higher ability groups. These results are interesting and form the basis for a deeper investigation into the placement of students within ability groups. They provide a case for grouping lower-ability students and against grouping higher-ability students.

10 Tidwell, 2007.
11 Loveless, 1998.
12 Hickman, 2007.

4. Pedagogical Approach

Colonel Gary Tidwell¹³ identifies the following skills that instructors who teach ability sections need, regardless of the level of the grouping:

1. Instructors must create conditions in the classroom in which all students are challenged.
2. Instructors must be able to provide opportunities to cover new lesson material faster.
3. Instructors must know the material well enough to be flexible for all students.

The Research Team’s pedagogical approach for this study investigates grouping students by common mathematical ability. Sections with different compositions of academic ability are expected to require different levels and depths of preparation.

We assume that the pedagogical approach outlined above is achievable for two reasons. First, MA206 class sizes are small. Second, instructors have educational experience with the course (i.e., the material is well known). MA206 instructors are not first-semester instructors and thus have prior teaching experience and comfort with the material. This assumption is important because, under the department’s teaching hierarchy, all of these instructors have taught heterogeneous classes within the core math program. This allows instructors the flexibility to focus on their own teaching approach.

5. Hypothesis

The Research Team hypothesizes that students placed in homogeneous ability groups, regardless of ability level placement, will perform better than they would be projected to perform in normal heterogeneous classes. We expect this because students in homogeneous ability classrooms should participate more: everyone is theoretically of equal standing, and the class moves at a more uniform pace.

The team further hypothesizes that students identified as “lower performers” will benefit more than those identified as “high performers.” Muir would support this, as he introduced the idea of a “self-fulfilling prophecy,” meaning that high performers will earn the grade that they want.

Regardless, the Research Team is initially interested in short-term benefits, as assessed by performance on course-wide graded events. There is also strong interest in the perceived long-term benefits of this methodology. These are difficult to quantify. Nevertheless, we remain committed to the USMA and Department of Mathematical Sciences focus on lifelong learning, and we anticipate growth in this area as a result of this research.

13 Tidwell, 2007.

6. Methodology

In Loveless’ research, students were “tracked” into English and mathematics courses based on a variety of metrics, including previous grades, teacher recommendations, and placement tests.¹⁴ Similarly, the participants in the Research Team’s study were the students who took MA206 in AY 18-01. Because of general Academy scheduling, this population tends to include many STEM majors and students on the advanced math track.

The Research Team’s methodology differs from LTC Hickman’s in how students are selected for the study. We use a mathematical model to predict the score of every student taking MA206 in AY 18-01, and we base the structure of our experiment on this predicted score. This gives us a more robust research foundation and a better framework for analyzing the Research Team’s results.

6.1. Methodology Design

Our ultimate goal is to determine whether students learn better in classrooms with peers of similar ability, so we must first properly determine their ability. We develop a model to predict the performance of students in academic year 18-01, based on the two previous semesters of performance in MA206. We discuss the development of this model in detail in the following subsections.

We use this predictive model to project the course average for each student. To isolate the impact of ability grouping on student learning, we assign class hours to the two groups, as shown in Table 1.

Hour      Day   Time        Grouping   Students
B (2nd)   1     0840-0935   Ability    145
C (3rd)   1     0950-1045   Random     135
H (2nd)   2     0840-0935   Random     157
I (3rd)   2     0950-1045   Ability    143

Table 1: Experimental Design Technique to Control for Class Day/Time

14 Loveless, 1998.

All students take MA206 during 2nd or 3rd hour on Day 1 or Day 2 in AY 18-01. As shown in Table 1, we designate two class hours (C and H) to section students randomly and two class hours (B and I) to section students by their predicted ability. We split these sections across days and time slots to control for any effect day or time might have on student performance. As a result, each day and each time slot contains exactly one Ability hour and one Random hour.
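The two sectioning rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the Research Team’s actual code. The function names are hypothetical, and students are identified by arbitrary IDs. Each hour is split into nine near-equal sections, matching the Ability 1 through Ability 9 labels used with Table 2. Ability sections are filled in descending order of predicted course average, so the first section holds the highest projections. Random sections ignore the predictions entirely.

```python
import random


def split_evenly(students, n_sections):
    """Split an ordered list into n_sections consecutive groups whose sizes differ by at most one."""
    base, extra = divmod(len(students), n_sections)
    sections, start = [], 0
    for k in range(n_sections):
        end = start + base + (1 if k < extra else 0)
        sections.append(students[start:end])
        start = end
    return sections


def ability_sections(predicted, n_sections):
    """Ability hour: rank students by predicted course average (best first), then split.

    `predicted` maps a student ID to that student's predicted course average.
    Index 0 of the result corresponds to "Ability 1" (best projections).
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return split_evenly(ranked, n_sections)


def random_sections(predicted, n_sections, seed=None):
    """Random hour: shuffle students without regard to their predictions, then split."""
    ids = list(predicted)
    random.Random(seed).shuffle(ids)
    return split_evenly(ids, n_sections)
```

For example, applying `ability_sections` with nine sections to the 145 students of the B-hour Ability group yields one section of 17 students and eight sections of 16.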

Furthermore, we take steps to control for the impact a teacher has on student learning. Because of constraints on available teachers, an instructor who teaches in both Ability hours teaches sections of the same predicted ability level in each. Some instructors are not available to teach all four hours, which introduces possible confounding variables into our results. Most instructors also teach two Random sections.

Table 2 shows the teacher breakdown. Each instructor has a unique ID number, and the projected ability of a section decreases as its number increases. For example, Ability 1 denotes the Ability section consisting of students with the best projected scores. Conversely, Ability 9 is the section containing students with the worst projected scores.

Instructor ID B Hour I Hour C Hour H Hour

1 Ability 5 Ability 5 Random Random

2 Ability 6 Random


4 Ability 1 Random Random

5 Ability 1

6 Ability 6

7 Random





12 Random


Table 2: Instructor Breakdown Sections

To further limit the impact of the instructor on final grades, we analyze only course-wide graded events: three mid-term tests and one final exam. The Research Team’s prediction model, however, uses the overall course average as the primary response variable rather than only course-wide graded events. The course average was the only grade data available for the previous semesters.

6.2. Data

To predict student performance, the Research Team uses data on 1023 students from the academic semesters AY 17-01 and AY 17-02. Students at USMA take one of several main tracks through the core math program, as shown in Table 3. The team removed from this analysis four students who took highly non-standard tracks, leaving the 1019 students shown in Table 3.

Core Math Track                  Courses                 Number of Cadets
Standard Non STEM                MA103, MA104            454
Standard STEM                    MA103, MA104, MA205     320
STEM w/ MA103 Validation         MA104                   23
Remedial to STEM w/ Validation   MA100, MA104, MA205     10
Advanced Non STEM                MA153                   8
Advanced STEM                    MA153, MA255            197
Advanced to Standard             MA153, MA205            7

Table 3: Track through the Core Mathematics Program at USMA

The team uses these common tracks to develop the set of variables in Table 4. These variables represent the factors considered for inclusion in the Research Team’s predictive model.

Independent Variable             Description
Modeling Grade                   Final grade in MA103 or MA153
Single Variable Calculus Grade   Final grade in MA104 or MA153
Multi-variable Calculus Grade    Final grade in MA205 or MA255
Core Math Average                Average of core math GPA
CQPA                             Cumulative Academic Grade Point Average
CCPS                             Cumulative Overall Grade (Academic/Military/Physical)
Rock                             Whether or not a student took an introductory math course

Table 4: Variables Used for Possible Inclusion in the Predictive Model

Unfortunately, synthesizing the data under the framework of Table 4 leaves some observations with missing data. Because Cadets validated these courses, 2.26% of the Single Variable Calculus Grades and 0.1% of the Modeling Course Grades are missing. One option for dealing with missing data is to discard the affected observations. This is not ideal, because doing so eliminates certain tracks by which students move through their math courses. We therefore decided to impute the data. A typical assumption when imputing data is that the data are missing at random.¹⁵ This is not true in our case. Nonetheless, we believe that imputing the data enhances the Research Team’s model rather than biasing it.

To impute the missing data, we use k-nearest-neighbor (KNN) imputation with k = 5. First, the data are centered and scaled. Then, for each missing data point, the values of the five nearest observations in Euclidean distance are averaged, and that average is assigned as the missing value.
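The imputation step can be sketched in plain Python as follows. This is an illustrative sketch, not the CARET implementation the Research Team used, and it rests on several assumptions:

- The function name `knn_impute` is hypothetical.
- Missing values are represented as `None`.
- Each column is standardized with the mean and sample standard deviation of its observed values.
- A row’s candidate neighbors are restricted to rows that have the missing column observed and that share all of the incomplete row’s observed columns.

```python
import math


def knn_impute(rows, k=5):
    """Return a copy of `rows` with None entries replaced by KNN averages.

    Distances are Euclidean on centered-and-scaled columns; each imputed value
    is the mean of the k nearest neighbours' values in the missing column.
    """
    n_cols = len(rows[0])
    means, sds = [], []
    for j in range(n_cols):
        vals = [r[j] for r in rows if r[j] is not None]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / max(len(vals) - 1, 1)
        means.append(mu)
        sds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant columns

    def z(row, j):
        # Centered and scaled value of column j.
        return (row[j] - means[j]) / sds[j]

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        observed = [j for j, v in enumerate(row) if v is not None]
        missing = [j for j, v in enumerate(row) if v is None]
        for j in missing:
            candidates = []
            for m, other in enumerate(rows):
                # Neighbours need column j and every column observed in this row.
                if m == i or other[j] is None or any(other[c] is None for c in observed):
                    continue
                dist = math.sqrt(sum((z(row, c) - z(other, c)) ** 2 for c in observed))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

The sketch averages the neighbors’ values on the original scale. This is equivalent to averaging their standardized values and then undoing the scaling, because centering and scaling are linear.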

6.3. Model Building

To begin constructing the Research Team’s model, we separate our data into a training set and a testing set consisting of 75% and 25% of the available observations, respectively. From there we create three different types of models: a Random Forest, a LASSO, and a Linear Regression model. Each is developed through 10-fold cross-validation with the CARET¹⁶ (Classification And REgression Training) package in R. We ensemble our models (combining them in a linear combination) to limit the errors made by any particular model.¹⁷

Once we arrive at our ensemble model, we use the entire dataset to build the final ensemble model. The results presented below are from this final model built on the entire dataset.
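The partitioning described above can be sketched as follows. This is a simplified stand-in, under stated assumptions, for what CARET does internally. The helper names are hypothetical. The 75/25 split here is a simple random split, whereas CARET’s `createDataPartition`, a common way to make such splits, stratifies on quantiles of a numeric outcome. Cross-validation folds are drawn only from the training indices, so the 25% test set stays untouched until the final comparison.

```python
import random


def train_test_split(n, train_frac=0.75, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def cv_splits(indices, k=10, seed=0):
    """Yield (fit, validate) index lists for k-fold cross-validation.

    Every index lands in exactly one validation fold; fold sizes differ by at most one.
    """
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f, held_out in enumerate(folds):
        fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield fit, held_out
```

With the 1019 observations in Table 3, `train_test_split(1019)` yields 764 training and 255 testing indices. Each candidate tuning setting would then be scored by its average error over the ten held-out folds of the training portion.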

Table 5 below provides a summary of the models, using three metrics:

- RMSE is the root mean squared error: the square root of the average squared difference between actual and predicted course grades. Lower numbers indicate less overall error between actual and predicted course grades.
- R² is the coefficient of determination and gives the proportion of variation in course score accounted for by the model. The closer the number is to one, the more variation the model explains.
- MAE is the mean absolute error: the average absolute difference between actual and predicted course grades.

Model               RMSE    R²      MAE
Linear Regression   .2631   .9308   .1873
Regularization      .2626   .9310   .1901
Random Forest       .3148   .9010   .2314
Ensemble Model      .2663   .9291   .1943

Table 5: Model Summary
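The three metrics in Table 5 can be computed as follows. This is a plain-Python sketch with hypothetical function names, not the code behind Table 5. One caveat applies, stated as an assumption about the authors’ workflow. CARET’s default regression summary reports R² as the squared correlation between observed and predicted values. That value can differ from the 1 - SSE/SST definition used in `r_squared`, so both are shown.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


def r_squared_corr(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (va * vp)
```

Consider hypothetical actual grades [3, 2, 4, 1] and predictions [2.5, 2, 4.5, 1]. For these values, `r_squared` gives 0.90 while `r_squared_corr` gives about 0.93. The choice of definition therefore matters when comparing against Table 5.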

15 Heitjan and Basu, 2012.
16 Kuhn et al., 2017.
17 Dietterich, 2000.

6.3.1. Linear Regression

To arrive at the best linear regression model, we used the stepAIC function from the MASS¹⁸ library in R. Table 6 shows the coefficients and associated p-values for the best model after 10-fold cross-validation.

Covariate                        Coefficient   p-Value
Intercept                        -87.712       ≤ .05
Core Math Average                1.861         ≤ .05
CQPA                             .065          ≤ .05
CCPS                             .0532         .0667
Took MA100                       -.2771        ≤ .05
Modeling                         -.5148        ≤ .05
Single Variable Calculus         -.6233        ≤ .05
Standard Non STEM                .9141         ≤ .05
Standard STEM                    .8738         ≤ .05
STEM w/ MA103 Validation         .8176         ≤ .05
Remedial to STEM w/ Validation   1.2673        ≤ .05
Advanced Non STEM                .9070         ≤ .05
Advanced STEM                    .8528         ≤ .05

Table 6: Resulting Coefficients from Linear Regression Model

The reader may notice that the p-value for CCPS is greater than .05. We acknowledge that this p-value exceeds our significance level (α). However, we keep CCPS in the model because this model has the smallest error in predicting our test set.

The four modeling assumptions for linear regression (linearity, normality, homoscedasticity, and independence) are satisfied.
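For reference, stepAIC ranks candidate models by the Akaike information criterion (AIC). For a linear model fit by least squares, R’s `extractAIC` method, which stepAIC uses, computes AIC up to an additive constant as shown below. Here n is the number of observations, RSS is the residual sum of squares, and d is the number of estimated coefficients (the symbol d is our notation):

$$
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2d
$$

Adding a single covariate lowers AIC only if it reduces $n\ln(\mathrm{RSS}/n)$ by more than 2. This roughly corresponds to a likelihood-ratio p-value threshold of about .157 rather than .05. It helps explain why a stepwise search by AIC can retain a covariate such as CCPS (p = .0667).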

6.3.2. Regularization

To arrive at the best regularization model, we used the glmnet package through CARET. We tuned the mixing parameter α over the values 0, .55, and 1 and the penalty parameter λ over the range 0 to 1. After 10-fold cross-validation, the best model as determined by RMSE has α = 1 (the LASSO) and λ = .001821. Its coefficients are as follows:

Predictor Variable               Coefficient
Intercept                        .00468
Core Math Average                1.7842
CQPA                             .0725
CCPS                             .0537
Rock                             -.2301
Modeling                         -.48654
Single Variable Calculus         -.5838
Standard Non STEM                .0318
STEM w/ MA103 Validation         -.0439
Remedial to STEM w/ Validation   .2976
Advanced Non STEM                .0189
Advanced STEM                    -.00039
Advanced to Standard             -.7877

Table 7: Resulting Coefficients from Regularization Model

18 Venables and Ripley, 2002.
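For reference, the glmnet documentation describes the Gaussian elastic-net fit as minimizing the objective below, where N is the number of cadets, x_i holds cadet i’s predictors, and β is the coefficient vector. Setting α = 1 leaves only the ℓ1 (lasso) penalty and α = 0 only the ridge penalty, so the tuning grid of 0, .55, and 1 covers ridge, an elastic-net mix, and lasso.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
+ \alpha\,\lVert\beta\rVert_1\right]
```

With α = 1 the ℓ1 term can shrink coefficients exactly to zero; at the small selected λ = .001821 it mostly shrinks rather than drops terms, which is consistent with near-zero entries such as Advanced STEM (-.00039) in Table 7.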

6.3.3. Random Forest

We also fit a random forest model using the randomForest package in R, growing 500 trees per forest. After 10-fold cross-validation, the best model (the one with the smallest RMSE) randomly samples 13 variables as candidates at each split. Table 8 lists this model’s predictors in order of variable importance, as measured by node purity.

Rank   Predictor Variable
1      Core Math Average
2      CQPA
3      Single Variable Calculus
4      Modeling
5      Rock
6      CCPS
7      Advanced STEM
8      Standard Non STEM
9      Standard STEM
10     STEM w/ MA103 Validation
11     Rock
12     Advanced to Standard
13     Remedial to STEM w/ Validation
14     Advanced Non STEM

Table 8: Rank of Variable Importance from the Random Forest Model

6.3.4. Ensemble

While there are many sophisticated ways to ensemble multiple models, we avoided the more complex methods of bagging, boosting, and stacking and instead averaged the predictions of the models. This maintains interpretability while giving equal weight to each model. Rather than simply picking the single model with the lowest RMSE on the test set, we chose this ensemble method so that no single model, if it made systematically biased estimates, would dominate the predictions.
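The equal-weight averaging described above amounts to the following Python sketch. It is an illustration rather than the study’s R code; the function name and the three cadets’ prediction values are hypothetical.

```python
def average_ensemble(*model_predictions):
    """Equal-weight ensemble: average each observation's predictions across models."""
    if not model_predictions:
        raise ValueError("need at least one model's predictions")
    n = len(model_predictions[0])
    if any(len(preds) != n for preds in model_predictions):
        raise ValueError("all models must predict the same observations")
    k = len(model_predictions)
    # zip(*...) groups the k models' predictions for the same cadet together.
    return [sum(per_cadet) / k for per_cadet in zip(*model_predictions)]


# Hypothetical predicted course averages for three cadets from each fitted model.
linear = [0.82, 0.75, 0.91]
regularized = [0.81, 0.76, 0.90]
forest = [0.85, 0.72, 0.88]
print(average_ensemble(linear, regularized, forest))
```

Fixed equal weights keep the ensemble easy to explain; stacking would instead learn the weights from the data, at the cost of another model to fit and validate.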

6.4. Survey Methodology

Much of the question of whether academically homogeneous classrooms are beneficial cannot be answered through student test scores. The students’ perspective on this approach can provide valuable insight into the qualitative impact of the Research Team’s study. Additionally, faculty techniques and practices add useful experiential detail from a pedagogical reference point. We use two surveys to gather this feedback.

We avoid leading questions on the surveys to limit bias. Students in the ability group are not explicitly told during the semester that they are sectioned by ability, but it is clear to many that there is some structure to the class, as shown in Section 7.4. We do not expect that a student’s awareness of being sectioned by ability will bias their survey responses, but we recognize this possibility as a potential limitation of our results. We give the student survey at the end of the semester, with the five questions related to the study mixed in with other course- and department-level questions. The goal of this survey is to answer five key questions related to the study by comparing the results between the Ability and Random groups:

1. Was our model effective at putting Cadets of like ability together?
2. Does grouping by ability help Cadets learn better/feel more comfortable with the course material?
3. Does a homogeneous classroom make students more comfortable and willing to engage in class?
4. Does a homogeneous classroom allow instructors to cater to the class in a way that does not hold stronger students back or leave struggling students behind?
5. Would students prefer to learn in an academically homogeneous environment?

The actual questions and statements we pose use Likert-scale responses:

1. My mathematical abilities were aligned with the rest of my section.
2. This course helped me understand and analyze complicated problems.
3. I was engaged and felt comfortable participating during class.
4. The pace of the lessons throughout the semester was…
5. Would sectioning by academic ability be advantageous or disadvantageous to learning?

Statements 1-3 have Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree as their possible responses, while statement 4 has the following possible responses: Much too Slow, A little Slow, Just Right, A little Fast, and Much too Fast. Question 5’s responses are Mostly Advantageous, Somewhat Advantageous, Neutral, Somewhat Disadvantageous, and Mostly Disadvantageous.

Additionally, we create a separate survey for the instructors who teach both random and ability sections. We value the instructors’ perspective because this kind of classroom design shapes how they teach. The Research Team’s main focus for the faculty survey is to see whether faculty members have to prepare differently for a random section vice an ability section and whether, in the end, they prefer one method of teaching over the other. Results of these questions can be found in Section 7.5.

This section focuses on the methods we used to garner the data necessary for the Research

Team’s analysis. At the conclusion of the semester, we compile all the available data and

analyze the Research Team’s results. The next section of this paper focuses on the results of

this study and the Research Team’s interpretation.

7. Results

7.1. Model Results

We analyze the results from the Research Team’s study in several different ways to determine whether student learning improves when students are placed in classrooms with others of similar ability. First, we conduct multiple two-sample tests between students of the same ability level, split by whether the students were in the ability or random groups. Second, we create a linear model to predict each student’s end-of-course average from their experimental group while controlling for other confounding variables.

In summary, this section shows that our model explains 61% of the variation in MA206 scores (R² = .61). Figure 1 shows the general trend in predicted versus actual scores.

7.2. Multiple Pairwise Comparisons

We conduct two different pairwise comparisons. First, we test whether the mean course average of students sectioned by ability differs from that of students randomly assigned.

Figure 1: Predicted Vs Actual Scores by Grouping

Statistic            Random   Ability
Cadets Assigned      272      283
Mean Grade           .8104    .8060
Standard Deviation   .0876    .0906

Table 9: Descriptive Statistics of Student Performance Across the Two Groups

The 95% confidence interval for the true mean difference between students randomly assigned and those sectioned by ability is (-.0104, .0193). Because this interval contains zero, we have no evidence at the 5% significance level of a difference in student performance based on how students are sectioned.
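The interval above can be checked against the summary statistics in Table 9. The Python sketch below is an illustration only: the paper does not state its exact procedure, and the function name is hypothetical. It computes a Welch-style interval (unequal variances) for the Random minus Ability difference using a normal critical value. With the rounded means from Table 9 it gives approximately (-.0104, .0192), within about .0001 of the reported bounds; the small gap is expected from rounding and from using a z rather than a t quantile.

```python
import math
from statistics import NormalDist


def welch_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Normal-approximation CI for mean1 - mean2 using the unequal-variance (Welch) SE."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Summary statistics from Table 9 (Random group first, so the difference is Random - Ability).
low, high = welch_ci(0.8104, 0.0876, 272, 0.8060, 0.0906, 283)
print(f"({low:.4f}, {high:.4f})")
print("interval contains 0:", low < 0 < high)
```

With roughly 550 students, the t and normal critical values differ only in the third decimal place, so the normal approximation is adequate for a quick check.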

We also look at pairwise comparisons across the nine ability levels. Table 10 below shows the mean performance of the randomly assigned students and the students grouped by ability at each level. The last column denotes whether the difference between the groups is statistically significant.

For Ability Level 8, there appears to be a significant difference between the Random and Ability groups. However, after applying even the most liberal corrections for multiple hypothesis testing, this difference is no longer significant. Consistent with the results for the other ability groups, this finding is more likely due to expected randomness in the data than to a systematic difference in performance for students in Ability Level 8.

Ability Level   Random   Ability   P-Value   Significant
Ability 1       .903     .901      ≥ .05     No
Ability 2       .878     .872      ≥ .05     No
Ability 3       .869     .881      ≥ .05     No
Ability 4       .847     .842      ≥ .05     No
Ability 5       .836     .844      ≥ .05     No
Ability 6       .789     .771      ≥ .05     No
Ability 7       .762     .753      ≥ .05     No
Ability 8       .735     .706      < .05     Yes
Ability 9       .676     .688      ≥ .05     No

Table 10: Mean Exam Performance based on Ability Level
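The paper does not name the multiple-testing correction it applied. As an illustration only, the Python sketch below implements two standard options: the Bonferroni correction, which controls the family-wise error rate, and the less conservative Benjamini-Hochberg procedure, which controls the false discovery rate. Because Table 10 reports p-values only as “< .05” or “≥ .05,” the nine p-values in the example are hypothetical placeholders (0.03 stands in for Ability Level 8), not the study’s values.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 only where p <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Step-up Benjamini-Hochberg procedure (controls false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis whose rank is at most k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            reject[idx] = True
    return reject


# Hypothetical p-values for Ability Levels 1-9 (placeholders, not study results).
p = [0.41, 0.62, 0.35, 0.77, 0.58, 0.22, 0.49, 0.03, 0.66]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Under these placeholders neither procedure rejects any hypothesis. With nine tests, Bonferroni requires p ≤ .05/9 ≈ .0056, and when the other eight p-values are large, Benjamini-Hochberg sets the same bar for the smallest one, so a Level 8 p-value anywhere between .0056 and .05 would not survive either correction.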

To further support the evidence that there is no measurable difference in student learning when students are sectioned by ability, Figure 2 shows the performance of each group side by side based on their inherent ability level. There is no discernible difference between the two types of classrooms. If anything, one might see evidence that students in the ability sections do slightly worse than their counterparts in the randomized sections.

Figure 2: Exam Performance Based on Ability Level

7.3. Testing Effect of Sectioning By Ability

To determine the significance of sectioning by ability, we build a model that predicts cadet score from predicted score (to account for ability) and Random/Ability group status. Table 11 shows the output from our linear model. The final column reveals that none of the factors representing a student’s position in our experiment is significant in predicting that student’s performance. The only significant predictor is the student’s projected score. This corroborates the other findings in our experiment.

Covariate           Coefficient   P-Value   Significant
Predicted Grade     .9376         ≤ .05     Yes
Ability 1 Random    .0403         ≥ .05     No
…
Ability 1 Ability   .0122         ≥ .05     No
…
Table 11: Linear Regression Variable Output Results

7.4. Student Survey Results

The results from our end-of-semester cadet survey provide insight into the students’ viewpoints on this research. The survey does this directly, by asking cadets whether they think ability sectioning would be better for learning, and indirectly, through generic questions analyzed based on the respondent’s position in our experiment (Ability or Random section).

The five questions and their results are shown in Figure 3, with the responses for each question divided between the Ability and Random groupings. Additionally, the p-value associated with each question indicates whether the responses of the two groups differ significantly. For our purposes, a p-value less than 0.05 indicates statistical significance. A higher p-value indicates less evidence of a difference between the two groups’ responses, and a p-value of 1 indicates identical responses.

The only question with a significant difference between the Ability and Random groups asks whether Cadets feel that their abilities were aligned with the rest of their class. We expect this result, and it indicates that students in the ability group generally recognize the structure of the experiment. The rest of the questions still offer some insight even though the differences between the groups are not statistically significant. In general, randomly sectioned students feel more able to understand complex problems and more comfortable engaging in class than those sectioned by ability. This is a surprising finding and perhaps suggests that ability sectioning does not achieve the hypothesized outcome. The same percentage of students in both groups feel the pace of the course was just right, with a few more students in the random group feeling rushed. Finally, sentiment regarding whether sectioning by ability would be advantageous was consistent across the two groups: 59% of students feel that sectioning by ability would be either somewhat or mostly advantageous.

While the results of the student survey are statistically inconclusive when contrasting the Ability and Random groups, they still offer some interesting insights. Students appear to like the idea of sectioning by ability, but the questions that aim to measure how effective the method truly is are not as convincing and perhaps even indicate a negative effect.

Figure 3: Cadet Survey Results

7.5. Faculty Results

At the conclusion of the semester, we anonymously surveyed the MA206 faculty on their views of ability group sectioning and on their teaching approach during the semester. Note that the 29 sections were taught by only 13 instructors, of whom only nine taught both ability and non-ability sections; these nine make up the target population. Unfortunately, nine data points are too small a sample to provide analytical insights beyond descriptive statistics. Furthermore, only seven of the nine instructors responded to the survey (two of whom are part of this research group), further decreasing the sample size. Nevertheless, their feedback offers insight, specifically regarding their teaching approach and their perception of the effectiveness of ability group sectioning. Preliminary findings, shown in Figure 4, indicate that the faculty is split on whether they would prefer to teach ability grouped sections again in the future. One faculty member aptly summarizes the findings, stating that while they have no preference about teaching ability sections in the future, they provide substantially more additional instruction (AI) in their lower-performing ability-grouped sections than in their random sections. Almost all of the surveys note this disparity in additional instruction, which adds to instructors’ workload beyond normal class preparation and grading.

Figure 4: Instructor Feedback

Across all surveys, the faculty unanimously note that the teaching pace in their Ability sections differed from that in their Random sections. Instructors who taught both higher-ability and lower-ability sections also observed this difference between them. Because of this difference in pace, and because instructors taught a mixture of ability and non-ability grouped sections, there is a perceived difference in lesson preparation. Notably, instructors feel that they have to prepare for two different classes. One instructor notes that the “randomized sections took less time in lecture and less time in board work than in the low-ability sections. I could go further in detail and explain more tangents.” Faculty also note a difference in the level of preparation between the higher ability sections and the lower ability and non-ability grouped sections, adding to the evidence that preparation differs among the groups.

Finally, with regard to the value of ability grouped sections, five of the seven instructors feel that ability grouping is beneficial to the student. As in the literature on the merits and consequences of this approach, some instructors feel strongly while others are rather noncommittal. Some instructors also feel that the pace is consistent in ability sections and keeps all students engaged, promoting peer pressure to keep up and fostering a productive learning environment. Other faculty are not as convinced that a true benefit exists, believing that the responsibility for learning falls squarely on the students. One instructor stated, “I believe in theory that there is some benefit to sectioning by ability. However, in application I think this is more of a function of student desire. A student will perform based on how much work they put in. My goal, however, was to fan the flame within them to get them to want to perform.”

8. Discussion

The Research Team notes in Section 6.4 that students did not perform as hypothesized; namely, the ability grouped sections did not perform better than the random group. The descriptive statistics support this finding and actually indicate that the ability grouped sections performed slightly below the randomly assigned sections. The reasons for this are mixed, and the Literature Review helps explain them. Principally, the team anticipated that students would enjoy being in the company of other like-minded students, which in turn should provide an atmosphere that is conducive to learning and facilitates discourse. Although the research does not indicate that this interaction failed to occur, the anticipated higher grades did not materialize, or at least not to a degree significant enough to report. Therefore, we turn to the student surveys, which indicate that the class pace in ability grouped sections was generally “Just Right” or “A little Fast.” This is equivalent to the non-ability grouped sections and does not provide as much insight as the Research Team desires. Because there was no noticeable difference in pace between the two types of sections, no comparison can be assessed, though anecdotally the instructors may have adjusted the class pace throughout the course. This would support the literature, specifically literature indicating that instructors adjust to the mean achievement level of their classes.¹⁹ This adjustment remains unconfirmed, as the students indicated that the pace was “too fast” in all sections, though this may reflect general student attitudes toward a course that other institutions typically spread across two semesters as Probability and Statistics.

19 Hopkins, 2006.

However, it remains unanswered why there was no marked improvement in the ability grouped sections. This is particularly odd because both the students and the faculty recognized the positive potential of sectioning by ability. This benefit should have been realized through USMA’s competitive culture and through positive peer pressure, that is, students’ drive to keep up with their peers. Of course, it should not be overlooked that there are additional factors not considered here, such as each student’s course load; however, their effect would be marginal when considering all students collectively. This raises the follow-on question of whether the Research Team properly grouped the sections by equivalent ability. We must assume so: student grades within the sections show normal variability, indicating that students were sectioned adequately. Inspection of the boxplots in Section 6.4, which compare each section’s student averages on course-wide graded events, shows very little variability between the sections. Therefore, the lack of a net change cannot be explained by this inspection alone. This warrants further consideration; in retrospect, the faculty survey should have asked instructors about their observations of students’ general participation and comfort within their sections. Comparing those observations with the students’ responses could have shown whether perceptions of the sections were aligned, thus indicating a benefit.

Finally, the results of this research do not mirror the findings of LTC Hickman’s research, which was conducted in a similar fashion.²⁰ Whereas his research found in favor of ability group sectioning in Calculus courses, the same does not hold true for ability sectioning in Probability and Statistics courses. The target populations differ (Plebes vs. Yearlings), though this slight difference in populations should not have caused such different findings, as all other variables are the same. This raises the further question of whether other social behaviors are at play or whether the results are purely academic.

20 Hickman, 2007.

9. Conclusion

The ultimate question the Research Team wants to answer is whether grouping by ability is beneficial and whether the Department of Math and USMA should continue to group by ability in the future. The Research Team hypothesized that sectioning students by mathematical ability would have a positive impact; however, based on the quantitative and qualitative findings from this research, the hypothesis cannot be supported, as there is no significant difference in academic performance between ability and randomly grouped sections. In fact, the ability grouped sections performed slightly worse on average (80.60% vs. 81.04% for the random sections), though the difference was not significant. Despite these quantitative findings, students still indicated an overall positive sentiment toward sectioning by ability, without a noticeable difference in the pace of the classes. The faculty echoed this sentiment but reported only a weak preference to teach ability sections in the future, citing length of preparation as the only discernible factor. Therefore, although the results of this research do not substantiate the research hypothesis, they can still prove beneficial and help address future course development at the United States Military Academy by providing a basis for further research on the matter.

References:

Dietterich, T.G. “Ensemble Methods in Machine Learning.” In: Multiple Classifier Systems, MCS 2000. Lecture Notes in Computer Science, vol. 1857. Springer, Berlin, Germany, 2000.

Heitjan, Daniel F. & Basu, Srabashi. “Distinguishing ‘Missing at Random’ and ‘Missing Completely at Random’.” The American Statistician, 2012, 50:3, 207-213.

Hickman, Randal. “Ability Group Sectioning in an Undergraduate Calculus Curriculum.” West Point, NY. 2007.

Hopkins, Gary, “Is Ability Grouping the Way to Go—or Should it Go Away,” Education World,

2006.

Kuhn, Max. “caret: Classification and Regression Training.” R package version 6.0-78. Available at https://CRAN.R-project.org/package=caret, 2017.

Loveless, Tom. “Making Sense of the Tracking and Ability Group Debate.” 1998.

Office of the Dean, USMA. “Class profile of the Class of 2020.” Available at

https://www.usma.edu/oir/Class%20profiles/Class%20of%202020.pdf, 2016.

Princeton Review. “Most Accessible Professors.” Available at

https://www.princetonreview.com/college-rankings?rankings=most-accessible-professors,

2018.

Slavin, Robert E. “Achievement Effects of Ability Grouping in Secondary Schools: A Best-Evidence Synthesis.” Review of Educational Research, vol. 60, Fall 1990, pp. 471-499.

“The ACT Profile Report-National: Graduating Class 2016.” Available at https://www.act.org/content/dam/act/unsecured/documents/P_99_999999_N_S_N00_ACT-GCPR_National.pdf, 2016.

Tidwell, Gary. “Psychological Foundations of Teaching and Learning: Student Motivation by

Sectioning Students.” West Point, NY. 2007.

Venables, W. N. & Ripley, B. D. “Modern Applied Statistics with S.” Fourth Edition. Springer, New York, 2002.

Viar, William. “Tracking and Ability Grouping.” 2008.
