Draft
STRUCTURAL VALIDITY AND RELIABILITY OF
TWO OBSERVATION PROTOCOLS IN
COLLEGE MATHEMATICS
by
LAURA ERIN WATLEY
JIM GLEASON, COMMITTEE CHAIR
YUHUI CHEN
DAVID CRUZ-URIBE
KABE MOEN
JEREMY ZELKOWSKI
A DISSERTATION
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
in the Department of Mathematics in the Graduate School of
The University of Alabama
TUSCALOOSA, ALABAMA
2017
Copyright Laura Erin Watley 2017
ALL RIGHTS RESERVED
ABSTRACT
Undergraduate mathematics education is being challenged to improve, with peer evaluation, student evaluations, and portfolio assessment serving as the primary methods of formative and summative assessment used by instructors. Observation protocols such as the Mathematics Classroom Observation Protocol for Practices (MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP) are another alternative. However, before these observation protocols can be used in the classroom with confidence, a study was needed to examine both the aRTOP and the MCOP2. This study was conducted at three large doctorate-granting universities and eight master's and baccalaureate institutions. Both the aRTOP and the MCOP2 were evaluated across 110 classroom observations during the Spring 2016, Fall 2016, and Spring 2017 semesters. The data analysis supported conclusions regarding the internal structure, internal reliability, and relationship between the constructs measured by both observation protocols.
The factor loadings and fit indices produced by a Confirmatory Factor Analysis (CFA) indicated a stronger internal structure for the MCOP2. Cronbach's alpha was also calculated to analyze the internal reliability of each subscale of both protocols. All alphas were in the satisfactory range for the MCOP2, and most were in the satisfactory range for the aRTOP. Linear regression analysis was also conducted to estimate the relationship between the constructs of both protocols. We found a strong, positive correlation between each pair of constructs, with higher correlations between subscales that do not involve Content Propositional Knowledge. This leads us to believe that Content Propositional Knowledge measures something distinct from the other subscales. As noted above and detailed in the body of the work, we find support for the Mathematics Classroom Observation Protocol for Practices (MCOP2) as a useful assessment tool for undergraduate mathematics classrooms when supplemented with the Content Propositional Knowledge subscale of the aRTOP.
DEDICATION
This dissertation is dedicated to my parents and my husband. To my parents, Douglas
and Edith: Thank you for your unconditional love, guidance, and support. You have always
believed in me and encouraged me to strive for my dreams. I would not be who I am today
without you. To my husband, Kyle: Thank you for the unwavering love, support, and
encouragement. You have made my dreams yours and given me the strength to accomplish
them.
ACKNOWLEDGMENTS
The completion of this dissertation would not have been possible without the support and guidance of a few very special people in my life. I would first like to give thanks to our Lord and Savior for leading me on this path. It is only through His grace and mercy, for without Him none of this would be possible.
Next I would like to thank Dr. Jim Gleason for his endless support and encouragement.
You have been a patient and caring mentor during this process. I cannot tell you how much I
value the time and effort you have put into me and my aspirations. I would also like to thank
the other members of my dissertation committee: Dr. Yuhui Chen, Dr. David Cruz-Uribe,
Dr. Kabe Moen, and Dr. Jeremy Zelkowski. I am forever grateful for the invaluable input
that has led to a strong dissertation.
To the Mathematics Department at The University of Alabama, you hold my gratitude
for dedicating your time to sharing your passion for mathematics with students like me.
I would like to thank the Department Chair and Graduate Program Director when I entered, Dr. Zijian Wu and Dr. Vo T. Liem, for accepting me into the program and encouraging me at the beginning of this process. To the current Department Chair and Graduate Program Director, Dr. David Cruz-Uribe and Dr. David Halpern: your encouragement and advisement in these last years have been vital to my success.
To the MTLC instructors at The University of Alabama, it is because of you that I am
the teacher I am today. You have instilled in me a sense of what it is to love mathematics
and to share that love with others. I will never forget all you have taught me and shared
with me over the years.
To my fellow graduate students at The University of Alabama, I cannot imagine this
experience with anyone else. To Bryan Sandor and Anne Duffee, I am so glad we found each
other. You both have been there for me when the challenges of graduate school seemed too
great to overcome. The University of Alabama will always hold a special place in my heart.
To the seventy-two mathematics instructors who selflessly allowed me to observe your classes for this study: you have done more than just open your classrooms to me; you have opened my eyes to new ideas and expanded my love for teaching. To the institutions that allowed me to observe, I will always cherish the time I spent on your campuses.
To the mathematics department at Troy University, you have instilled in me the foundation that led to this dissertation. You not only shared your passion for mathematics, but you also opened my eyes to the limitless possibilities in mathematics. I will never forget your kind words and support. Troy University will always hold a special place in my heart.
I want to also acknowledge my family members who constantly supported me and be-
lieved that I could achieve my goals. To my parents, Douglas and Edith Watley, thank you
for your relentless encouragement, unfailing support, and unconditional love. None of this
would have been possible without you. Finally, I want to thank my husband, Kyle Scarbrough, and our furry friend, Wesley. You both have stood by me throughout this process. You have been patient with me when I needed it, you celebrated with me when even the littlest things went right, and you loved me through it all.
TABLE OF CONTENTS
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
CHAPTER 1 - INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
CHAPTER 2 - LITERATURE REVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Student Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Reliability and Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Peer Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Portfolios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Observation Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Reformed Teaching Observation Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Mathematics Classroom Observation Protocol for Practices . . . . . . . . . . . . . . . . . . 26
CHAPTER 3 - METHODS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Aim of Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Instruments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
CHAPTER 4 - RESULTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Internal Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Internal Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Relationship between the Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
CHAPTER 5 - DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Study Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Future Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
APPENDICES
APPENDIX A. OVERVIEW OF OBSERVATION PROTOCOLS . . . . . . . . . . . . . . . . . . . . 71
APPENDIX B. INSTRUCTOR DEMOGRAPHICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
APPENDIX C. INSTRUMENTS USED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
APPENDIX D. REGRESSION MODELS AND RESIDUAL PLOTS . . . . . . . . . . . . . . . . . 93
APPENDIX E. IRB CERTIFICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
List of Tables
1 Subscales as Predictors of the RTOP Total Score . . . . . . . . . . . . . . . 23
2 Interpretation of the RTOP Factor Pattern . . . . . . . . . . . . . . . . . . 24
3 aRTOP Items and Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Brief Description of MCOP2 items . . . . . . . . . . . . . . . . . . . . . . . . 29
5 Recommendations for Model Evaluation: Some Rules of Thumb . . . . . . . 40
6 Simple Linear Regression Results . . . . . . . . . . . . . . . . . . . . . . . . 48
7 Pearson’s Product-Moment Correlation . . . . . . . . . . . . . . . . . . . . . 49
8 Demographics Characteristics of the Sample . . . . . . . . . . . . . . . . . . 77
List of Figures
1 Theoretical Model of aRTOP . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2 Theoretical Model of MCOP2 . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3 Confirmatory Factor Analysis Results: aRTOP . . . . . . . . . . . . . . . . 42
4 Confirmatory Factor Analysis Results: MCOP2 . . . . . . . . . . . . . . . . 43
5 Residual Plots of Regression Model 1 . . . . . . . . . . . . . . . . . . . . . . 45
6 Regression Model 1: Student Engagement and Inquiry Orientation . . . . . . 94
7 Regression Model 2: Student Engagement and Inquiry Orientation . . . . . . 95
8 Regression Model 3: Teacher Facilitation and Inquiry Orientation . . . . 96
9 Regression Model 4: Teacher Facilitation and Content Propositional Knowledge 97
10 Regression Model 5: Inquiry Orientation and Content Propositional Knowledge 98
11 Regression Model 6: Student Engagement and Teacher Facilitation . . . . . 99
CHAPTER 1
INTRODUCTION
Colleges and universities in the United States are being challenged to improve Science, Technology, Engineering, and Mathematics (STEM) undergraduate education (Boyer
Commission on Educating Undergraduates in the Research University, 1998; National Re-
search Council, 1996, 1999, 2002, 2012; National Science Foundation, 1996, 1998), with
college and university STEM professors asked to bear the majority of the weight. These
same college and university professors are experts in their area of study, have received mul-
tiple degrees, and make contributions to their field resulting in awards and publications.
However, they have had little or no formal training in teaching and learning, and obtain
most, if not all, of their professional development in education during graduate school as
teaching assistants. Once they finish graduate school, their primary professional development comes from reflection on the formative and summative assessments provided by peer observation, student evaluations, and assessment of their portfolios. Although each of these evaluation methods can provide useful information, it is difficult to compare and analyze the information obtained from them. In general, these methods rely on broad questions that yield subjective information with low concurrence among raters.
A useful tool in the process of improving the quality of STEM education is the development of aggregate methods to quantify the state of teaching and learning in order to compare different teaching and learning strategies.
Observation protocols provide a quantifiable method useful for improving and strengthening
STEM undergraduate education. The two most common uses for observation protocols are to
support professional development and to evaluate teaching quality (Hora & Ferrare, 2013b).
Observation protocols provide a way to collect numerical data representing observed variables
describing the classroom environment and activities. These data can then be systematically analyzed using statistical techniques to create meaningful ways to evaluate the scholarship that professors use in their teaching.
The quantifiable understanding we gain from the use of observation protocols is invaluable. College and university professors can use this information to identify personal strengths and weaknesses. They can easily compare and contrast the information obtained from semester to semester to see growth in their teaching effectiveness. The use of observation protocols also opens the door for professors to assess their teaching effectiveness in different types of classrooms. The information that can be gained from observation protocols is extensive, at both the individual level and collectively for the university.
Although there are a multitude of observation protocols in use (see Appendix A), the
Mathematics Classroom Observation Protocol for Practices (MCOP2) and an abbreviated
Reformed Teaching Observation Protocol (aRTOP) are the most applicable toward the aim
of this study. The Mathematics Classroom Observation Protocol for Practices (MCOP2)
is used to measure the degree to which a mathematics classroom aligns with the Stan-
dards for Mathematical Practice from the Common Core State Standards in Mathematics
(National Governors Association Center for Best Practices, Council of Chief State School
Officers, 2010); recommendations from “Crossroads” and “Beyond Crossroads” of the Ameri-
can Mathematical Association of Two-Year Colleges (American Mathematical Association of
Two-Year Colleges (AMATYC), 1995, 2004); the Committee on the Undergraduate Program
in Mathematics Curriculum Guide from the Mathematical Association of America (Barker
et al., 2004); and the Process Standards of the National Council of Teachers of Mathematics
(National Council of Teachers of Mathematics, 2000). The MCOP2 is a 16-item protocol that measures the two primary constructs of teacher facilitation and student engagement.
The Reformed Teaching Observation Protocol (RTOP) was designed by the Evaluation
Facilitation Group of the Arizona Collaborative for Excellence in the Preparation of Teachers
to measure “reformed” teaching and is said to be standards based, inquiry oriented, and student centered (Piburn & Sawada, 2000). The RTOP is a 25-item classroom observation protocol scored on a 5-point Likert scale that measures the three primary constructs of lesson design and implementation, content, and classroom culture. Although the RTOP has been the most
widely used observation protocol for mathematics classrooms during the past 10 to 15 years,
the review of literature revealed serious issues with the proposed structure and reliability
that led us to select the ten items we call the abbreviated Reformed Teaching Observation
Protocol (aRTOP).
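Both protocols score a lesson as a set of Likert-style item ratings grouped into construct subscales. The aggregation can be sketched as follows; note that the item names and subscale grouping below are invented placeholders for illustration, not the actual MCOP2 or aRTOP item lists.

```python
# Sketch only: hypothetical item names and subscale groupings,
# not the real MCOP2/aRTOP instrument content.
SUBSCALES = {
    "student_engagement": ["item01", "item02", "item03"],
    "teacher_facilitation": ["item04", "item05", "item06"],
}

def subscale_scores(ratings: dict) -> dict:
    """Sum each subscale's item ratings (e.g., 0-4 on a 5-point scale)."""
    return {name: sum(ratings[item] for item in items)
            for name, items in SUBSCALES.items()}

# One hypothetical classroom observation:
obs = {"item01": 3, "item02": 2, "item03": 4,
       "item04": 1, "item05": 2, "item06": 3}
print(subscale_scores(obs))  # {'student_engagement': 9, 'teacher_facilitation': 6}
```

Subscale totals of this form are the quantities whose internal structure and reliability the study examines.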
There are limitations that we must account for in this study. One limitation is the use of convenience sampling to collect the data. Time and travel costs forced us to use this sampling technique; however, our sample was chosen strategically, drawing from a diverse range of institutions based on enrollment demographics and types of degrees offered, so that it reasonably represents the larger population of undergraduate institutions in the United States. The potential for observer bias is another limitation of this project. Such biases could include gender, ethnicity, age, teaching methodology, and course structure. Although it is impossible to remove the human element from this study, we are well aware of the potential for observer bias and will make every effort to avoid it. Being cognizant of potential biases and taking them into account is the key strategy for avoiding researcher bias (Johnson & Christensen, 2014).
The goal of this project is to gain a clear understanding of both the MCOP2 and the aRTOP and of the relationship between these two protocols as they relate to undergraduate mathematics classrooms. Therefore, we pose the following research questions:
1. What are the internal structures of the Mathematics Classroom Observation Protocol
for Practices (MCOP2) and the abbreviated Reformed Teaching Observation Protocol
(aRTOP) for the population of undergraduate mathematics classrooms?
2. What are the internal reliabilities of the subscales of the Mathematics Classroom Ob-
servation Protocol for Practices (MCOP2) and the abbreviated Reformed Teaching Ob-
servation Protocol (aRTOP) with respect to undergraduate mathematics classrooms?
3. What are the relationships between the constructs measured by the Mathematics Class-
room Observation Protocol for Practices (MCOP2) and the abbreviated Reformed
Teaching Observation Protocol (aRTOP)?
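The second research question concerns internal reliability, which is commonly assessed with Cronbach's alpha. As an illustrative sketch (ours, with invented scores, not the study's data), the statistic can be computed as follows:

```python
# Illustration only: Cronbach's alpha on invented ratings.
# Rows = classroom observations, columns = items of one subscale.

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

ratings = [  # 5 hypothetical observations on a 4-item subscale (0-3 scale)
    [2, 3, 2, 3],
    [1, 1, 0, 1],
    [3, 3, 2, 3],
    [0, 1, 1, 0],
    [2, 2, 2, 3],
]
alpha = cronbach_alpha(ratings)  # closer to 1.0 means more internally consistent
```

Values of alpha near 1 indicate that the items of a subscale vary together, i.e., that they plausibly measure a single construct.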
CHAPTER 2
LITERATURE REVIEW
Increased accountability in higher education has fostered a need to evaluate and develop
the effectiveness of undergraduate teaching in mathematics. The call for accountability
creates a demand in postsecondary institutions to provide quantifiable evidence of the effec-
tiveness of their academic programs (National Research Council, 2002). Unfortunately, there
is no widely accepted definition or agreed-upon criteria of effective teaching at the undergraduate level (Clayson, 2009). Student evaluations, peer evaluations, observation protocols, and portfolios are the most common methods currently used to evaluate teaching effectiveness. This chapter gives a brief summary of each of these methods and reviews the benefits of and barriers to each.
Content Knowledge for Teaching
What makes someone an effective teacher? Is having a strong understanding of teaching
procedures enough? Or is strong subject matter knowledge the key to effective teaching?
Shulman saw a strong disregard for the content being taught in the educational policies of the 1980s. Shulman (1986) did not want to belittle the importance of the pedagogical skills being highlighted in these policies, but rather to bring attention to the importance of content knowledge for teachers by creating a theoretical framework modeling the categories of content knowledge, which he identified as subject matter content knowledge, pedagogical content knowledge, and curricular knowledge.
Content knowledge for teaching refers to the amount and organization of knowledge in
the minds of the teachers. The understanding of facts and concepts is only a part of subject
matter content knowledge. It requires a much deeper understanding of the structure of the
subject matter. It is not enough for a teacher to merely understand something; they must also know why it is so, when it can be applied, and when it weakens or no longer applies (Shulman, 1986).
Subject matter knowledge is necessary, but not a sufficient condition for someone to be
an effective teacher. Shulman’s second category of content knowledge, pedagogical content
knowledge, is a combination of the teacher’s subject matter knowledge and the knowledge
utilized to teach that subject. A few of the examples Shulman provides of pedagogical content knowledge are (a) the knowledge needed to represent and formulate the subject in a way that makes it comprehensible to others, (b) the knowledge of what makes a particular subject difficult or easy to comprehend, and (c) the knowledge of conceptions and misconceptions that students bring with them from previous learning (Shulman, 1986).
Curricular knowledge is the knowledge of what programs are designed to teach a specific
subject to a given student level. It also includes the knowledge of the variety of materials
available to teach a specific subject. Most importantly, it is the knowledge teachers use to
select or reject a particular curriculum in a given circumstance. In addition to the knowledge
of curriculum materials, curricular knowledge includes lateral curriculum knowledge (rela-
tionship of the content to other subjects) and vertical curriculum knowledge (relationship of
the content to previous and future learning of the same subject).
Shulman’s theoretical framework was designed to focus on the nature and type of knowl-
edge needed for teaching a subject. He did not provide us with a list of necessary knowl-
edge for any particular subject areas, rather Shulman’s paper acted as a catalyst for other
researchers to expand on his ideas into their particular subjects. In 2008, Ball and her
colleagues examined and expanded Shulman’s ideas in the context of mathematics. Ball,
Thames, & Phelps (2008) developed in more detail Shulman’s idea of subject matter knowl-
edge for teachers in the context of mathematics.
Ball, Thames, & Phelps (2008) divided subject matter knowledge into three domains:
common content knowledge, specialized content knowledge, and horizon content knowledge.
Common content knowledge (CCK) is the mathematical knowledge that teachers use but that is not specialized to the work of teaching; Ball notes that this knowledge is also used in settings other than teaching.
Ball, Thames, & Phelps (2008) define specialized content knowledge (SCK) as the knowl-
edge and skills unique to teaching mathematics. Ball, Thames, & Phelps (2008) state, “this
work (SCK) involves an uncanny kind of unpacking of mathematics that is not needed – or
even desirable – in settings other than teaching” (p. 400). The distinction between common
content knowledge and specialized content knowledge, while clear in the elementary school
context, becomes more difficult to measure at the undergraduate level.
Horizon content knowledge is the third domain of Ball, Thames, & Phelps (2008) and
corresponds to portions of Shulman’s curricular knowledge. This includes the knowledge
of how to introduce a specific topic with the prior and future understandings of this topic
in mind. There is still some concern over whether this should be solely in the category of
subject matter knowledge or if it should be included in other categories.
Ball also expanded Shulman’s idea of pedagogical content knowledge into three domains.
Ball, Thames, & Phelps (2008) tell us, “two domains - knowledge of content and students (KCS) and knowledge of content and teaching (KCT) - coincide with the two central dimensions of pedagogical content knowledge identified by Shulman” (p. 402). KCS is the combination of teachers' knowledge about their students and about mathematics. Teachers must understand how their students will approach a particular problem and the struggles they will encounter. Alternatively, KCT is the combination of teachers' knowledge about mathematics and about teaching. For example, teachers have to know the order in which to introduce topics and what to spend more time on. Ball also included Shulman’s curriculum knowledge as
a domain of pedagogical content knowledge based on the work of Grossman, Wilson, & Shulman (1989). Although Ball placed curriculum knowledge under pedagogical content knowledge, there is still some concern over whether it belongs only there or in several different categories.
Shulman (1986) poses the question of how expert students become novice instructors.
We must ask ourselves, how do teachers acquire knowledge of teaching? Most college and
university professors are experts in the content they are teaching, but most do not have any
formal background in education. Most professors have completed few, if any, teacher preparation programs and typically have not taken any education courses (Speer & Hald, 2008). The majority of their training comes from the limited supervised experience they obtain as graduate teaching assistants.
Speer & Hald (2008) assert that mathematics education research in K-12 has sought to document the extent to which teachers possess pedagogical content knowledge and the effect it has on student learning and teaching practices. Similar research in higher education is just now emerging and is relatively scarce. The available research on pedagogical content knowledge in higher education focuses on Graduate Teaching Assistants (GTAs) and their
training programs. The dissertation of Ellis (2014) gives us a wealth of information on GTA
professional development programs and GTA beliefs and practices. Another dissertation
focused on the differences in the beliefs and practice of international and U.S. domestic
mathematical teaching assistants (Kim, 2011). Kung & Speer (2007) focus their research on
the need for professional development activities for GTAs and the empirical research needed
to create these activities.
Being knowledgeable in mathematics is necessary, but alone is not a sufficient condition
for an instructor to create good learning opportunities for students (Speer & Hald, 2008). If we could improve mathematics instructors' knowledge of student thinking, Kung & Speer (2007) believe this would foster better learning opportunities for students. The hope is that this will, in turn, lead to improved student achievement.
Student Evaluations
With the increase in accountability within higher education, student evaluations are
becoming even more widely used as a measure of quality in university teaching. Clayson
(2009) brings to our attention that student evaluations of teaching are one of the most
well researched, documented, and long lasting debates in the academic community. In fact,
d’Apollonia and Abrami (1997) stated “most postsecondary institutions have adopted student ratings of instruction as one (often the most influential) measure of instructional effectiveness” (p. 1198). Chen and Hoshower (2003), as well as Benton and Cashin (2012), propose that student evaluations are commonly used to provide formative feedback to faculty for improving teaching, course content, and structure; a summary measure of teaching effectiveness for promotion and tenure decisions; and information to students for the selection of courses and teachers.
Student evaluations of instruction were first introduced in the United States in the mid-1920s (Algozzine et al., 2004). Since then there have been waves of research, including studies that have verified the validity and effectiveness of student ratings. However, student evaluations have not always been met with complete acceptance, so it is important to discuss some of the most common misconceptions.
The literature on student evaluations varies widely. Although some believe these claims are factual, Benton and Cashin (2012), Feldman (2007), and Kulik (2001) identify as myths the beliefs that student evaluations are (a) only a measure of showmanship, (b) indicators of concurrence only at a low level, (c) unreliable and invalid, (d) time and day dependent, (e) student grade dependent, (f) not useful in the improvement of teaching, and (g) affected by leniency in grading resulting in high evaluations. These myths persist even though there is over fifty years of credible research showing the reliability and validity of student evaluations. This research has been ignored for reasons that include personal biases, suspicion, fear, ignorance, and general hostility toward any evaluation process (Feldman, 2007; Benton & Cashin, 2012).
Since teaching comprises many characteristics, Spooren, Brockx, and Mortelmans (2013) believe it is widely accepted that student evaluations are multidimensional. Jackson et al. (1999) warn that there has been a dispute in the research as to the number and nature of these dimensions. As a result, student evaluation instruments vary greatly in item content and number of items.
In the 1990s, researchers including Abrami & d’Apollonia (1990) debated the use of global constructs for the evaluation of teaching effectiveness. Eventually they reached a compromise that both specific dimensions and global measures could be used for an overall rating. More recent research supports the multidimensionality of teaching by reporting that higher-order factors can reflect general teaching effectiveness (Apodaca & Grad, 2005; Burdsal & Harrison, 2008; Cheung, 2000). The research of Burdsal and Harrison (2008) and Spooren et al. (2013) provides evidence that both a multidimensional profile and an overall evaluation are valid indicators of students’ perception of teacher effectiveness.
Reliability and Validity
Reliability refers to consistency, stability, and generalizability of data, and in the context
of student evaluations, most often refers to the consistency of the data (Cashin, 1995). The
consistency of student evaluations is highly influenced by the number of raters: in general, the more raters, the more dependable the ratings. Multiple classes also provide more reliable information than a single class. Benton et al. (2012) suggest using more than one class when there are fewer than 10 raters in order to improve reliability.
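The intuition that aggregate ratings become more dependable as raters are added can be illustrated (this example is ours, not part of the studies cited) with the Spearman-Brown prophecy formula, which predicts the reliability of an average of k parallel ratings from the reliability of a single rating:

```python
# Illustration only: Spearman-Brown prophecy formula for the
# reliability of a mean of k raters, given single-rater reliability r.

def spearman_brown(r_single: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel ratings."""
    return k * r_single / (1 + (k - 1) * r_single)

# With a modest single-rater reliability of 0.30, aggregate reliability
# grows quickly as raters are added:
for k in (1, 5, 10, 25):
    print(k, round(spearman_brown(0.30, k), 2))
# prints: 1 0.3 / 5 0.68 / 10 0.81 / 25 0.91
```

This is why a class with few raters benefits from pooling ratings across multiple classes.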
The validity of student evaluations has been extensively debated over the years, with researchers often disagreeing as to the extent to which student evaluations measure the construct of teaching effectiveness. A primary driver of this debate is the lack of agreement on what defines effective teaching. One method to determine the validity of student evaluations involves their relation to other forms of evaluation. The agreement or disagreement of these other evaluation methods can give us greater insight into the validity of student ratings.
Logically, the best way to measure effective teaching would be to base it on the re-
sulting student learning and understanding. One would assume that a teacher who has
high student evaluations would also have highly successful students. Davis (2009)
states, “Ratings of overall teaching effectiveness are moderately correlated with independent
measures of student learning and achievement. Students of highly rated teachers achieve
higher final exam scores, can better apply course material, and are more inclined to pursue
the subject subsequently” (p. 534).
In a study at Minot State University, Ellis, Burke, Lomire, and McCormack (2003)
found that courses with the highest average grades were taught by teachers who received
the highest ratings from their students. The study comprised 165 undergraduate
courses taught by 24 instructors. Ellis reported a weak but significant positive correlation
(r = 0.35, p < .01) between average ratings of teachers and average grades received
by students. Ellis et al. (2003) warn that this relation may be due to numerous factors, but
that the most likely explanation is that giving higher grades to students results in more
favorable student evaluations. Clayson (2009) affirms that “as statistical sophistication has
increased over time, the reported learning/SET (student evaluation of teaching) relationship
has generally become more negative” (p. 26).
Although comparisons of student evaluations with colleague ratings, expert judges’ ratings,
ratings by graduating seniors and alumni, and measures of student learning provide evidence
of validity, many researchers are still concerned that students can be easily swayed by
superficialities (Socha, 2013). Researchers are also troubled by the question of whether
students are able to be effective evaluators of teaching competency. Algozzine et al. (2004)
warn that student ratings should only be influenced by characteristics that represent effective
teaching and not by sources of bias. Marsh (1984) defines a bias of student ratings as one
“substantially and causally related to the ratings and relatively unrelated to other indicators
of effective teaching” (p. 709).
One of the most controversial and most frequently discussed concerns is that high ratings
can be based solely on the faculty member’s “entertaining” ability. This phenomenon is
known as the Dr. Fox Effect, after a study in which an actor delivered a lecture (Ware Jr &
Williams, 1975). Although “Dr. Fox” did not cover any material, he received a high rating
because of his “entertaining” value. Wachtel (1998) states, “This was thought to demonstrate
that a highly expressive and charismatic lecturer can seduce the audience into giving
undeservedly high ratings” (p. 200). Since the original study, Marsh (1982) has cited several
experts in the field who have raised questions about its validity.
In classrooms where there are incentives to understand the material, earlier studies found
that content covered has a much greater impact on student ratings than expressiveness.
Sojka, Gupta, & Deeter-Schmelz (2002) found that students and teachers perceive differently
how a faculty member’s “entertaining” ability affects student ratings: faculty believed
that the ability to entertain has a great influence on ratings, while students strongly
disagreed. Shevlin, Banyard, Davies, and Griffiths (2000) state that the expressiveness of
teachers is positively correlated with student evaluations regardless of the content taught.
They found that the charisma factor accounted for 69% of the variation in the rating of a
teacher’s ability as determined by student ratings (Shevlin et al., 2000).
The relationship between gender and student evaluations remains undetermined. One
study by Ellis et al. (2003) found that the gender of the instructor was not significantly
correlated with student ratings, while another study by Centra (2009) indicated gender
preferences, mainly in the ratings of female instructors by female students. The research
of Centra and Gaubatz (2000) agreed with this conclusion but warned that even though
these rating differences are statistically significant, they have little practical importance.
Compared to other instructor variables, there is relatively little quantitative data
exploring instructor race; according to Merritt (2008), empirical research examining the
relationship between race and student evaluations is lacking. A study conducted by
Hamermesh & Parker (2005) of 436 classes reported that minority faculty members
received lower teaching evaluations than majority instructors. Non-native English speakers
also received substantially lower ratings than their native-speaking counterparts.
Logically, faculty rank will have an impact on student evaluations. In a study conducted
by Centra (2009) with 1539 teaching assistants, the overall evaluation of the quality
of teaching in a course had a mean score of 3.83 on a 5-point scale, while their higher-ranking
colleagues, assistant professors and above, scored about a third of a standard deviation
higher on the overall evaluation. There is some question as to whether rank or years of
experience is being represented in this study, since the two correlate. However, Ellis et al.
(2003) found no significant correlation between years taught and the ratings of the same
instructor by students.
Like instructor variables, individual student variables can also influence evaluations of
teaching. Variables studied include age, gender, motivation, and personality of the student.
Also, individual academic characteristics of the student have been studied. Some of these
variables include scholastic level of the student, GPA, and reason for taking the course. Age
(Centra, 1993), gender (Feldman, 1977, 1993), and the level of students (McKeachie, 1979)
are not currently being researched, but have been in the past.
Student GPA and college-required classes are two individual academic characteristics
currently being researched. In “Tools for Teaching”, Davis (2009) summarizes the research
on the relationship between student evaluations and student GPA. Citing several authors,
Davis (2009) concludes that there is little to no relationship for this particular variable
(Marsh & Roche, 2000; Abrami, 2001). Conversely, research has found a slight bias against
college-required courses. This is understandable given that students may be required to
take a class in which they have little interest or background. Centra (2009) suggests that
even though the bias is slight, institutions should take it into account when reviewing
student evaluation data.
The expected grade is probably the most researched student variable related to student
evaluations of instruction. Eiszler (2002) found that student evaluations are a small
contributor to grade inflation over time. Centra (2009) reports a correlation of .20 between
expected grades and teacher effectiveness, while Ellis et al. (2003) state, “the magnitude
of the correlation has been in the range of .35 to .50, meaning that roughly 12% to 25%
of the variance in ratings might be accounted for by varying grading standards” (p. 39).
They mention several researchers, including Mehdizadeh (1990) and Krautmann & Sander
(1999), who found a positive correlation between the expected (or received) course grade
and student evaluations.
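The “variance accounted for” figures quoted here follow directly from squaring the correlation coefficient. A brief sketch of this arithmetic (the helper function is illustrative, not part of any cited study):

```python
# Shared variance implied by a correlation coefficient is r squared.
def variance_explained(r: float) -> float:
    """Proportion of variance in one variable accounted for by the other."""
    return r ** 2

# The .35 to .50 range reported by Ellis et al. (2003):
for r in (0.35, 0.50):
    print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.0%}")
# roughly 12% for r = .35 and 25% for r = .50
```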
Courses themselves also have variables that the instructor cannot influence. For instance,
class size, topic difficulty, and the level of the course are all characteristics of the course
beyond the control of the instructor. The time of day a class is taught is another course
variable that has interested researchers in the past (Aleamoni, 1981; Feldman, 1978). The
relationship between student evaluations of teaching and course characteristics has been
researched over the years, with inconsistent results.
Research on course variables varies widely. Student evaluations did not significantly
correlate with the level of the course according to Ellis et al. (2003). However, lower-level
classes generally receive lower ratings than higher-level classes, especially graduate-level
classes, though this difference tends to be small (Benton & Cashin, 2012). Benton et al.
(2012) suggest the development of local comparative data to help control for this difference.
Class size can also have an effect on student evaluations. Most researchers have found that
instructors of larger classes receive lower evaluations. Ellis et al. (2003) report that class
size correlated significantly with ratings, while Hoyt and Lee (2002) found that it was not
always statistically significant.
The academic discipline of the class being taught can also affect student ratings. In a
study by Centra (2009), courses in the natural sciences, mathematics, engineering, and
computer science had a mean rating of 3.87 on a 5-point scale, while humanities courses
(English, history, language) had a mean of 4.04, a difference of about a third of a standard
deviation. Some have attributed this difference to the growth of knowledge in the natural
sciences causing teachers to cover increasing amounts of material. The meta-analysis by
Clayson (2009) supported these differences and stated that academic discipline is an
important variable to consider when reviewing student evaluation data.
Course load and difficulty are correlated with student evaluations, though not strongly.
Surprisingly, the correlation is positive: students tend to give higher ratings to more
difficult courses that call for hard work (Marsh, 2001). Centra (2003), using a large database
of classes, found (not surprisingly) that classes that were either too elementary or too
difficult were rated poorly; classes balanced in the middle were rated highest.
Consideration must also be given to the manner (paper vs. electronic) in which student
evaluations are collected. Ballantyne (2003), Bullock (2003), Spooren et al. (2013),
and Tucker, Jones, Straker, and Cole (2003) offer the following reasons for the move
from paper to electronic student evaluations: timely and accurate feedback, no interruption
of class time, more accurate analysis of data, ease of access to students, greater student
anonymity, decreased faculty influence, more detailed written comments, and lower cost and
time demands for administrators.
One of the major concerns about online student evaluations is the response rate. Online
survey response is much lower than that of traditional paper surveys, with Dommeyer, Baum,
Hanna, and Chapman (2004) reporting an average response rate of 70% for in-class surveys
and 29% for online surveys. Although fewer surveys are returned, studies by Leung and
Kember (2005) show no significant differences between the data obtained from paper and
electronic evaluations. These results lead us to conclude that the manner of administration
(paper vs. electronic) does not affect the validity of student evaluations.
Since the very first reports on student evaluations by Remmers and Brandenburg (1928,
1930; 1927), there have been thousands of reports covering various topics on these evalua-
tions. Student evaluations can provide useful information about the instructor’s knowledge,
organization and preparation, and ability to communicate clearly. According to Chen and
Hoshower (2003), “while the literature supports that students can provide valuable informa-
tion on teaching effectiveness given that the evaluation is properly designed, there is a great
consensus in the literature that students cannot judge all aspects of faculty performance” (p.
73). Despite the controversies, student evaluations are still the most widely used evaluation
method. In general, researchers are in agreement that no single source of evaluation, includ-
ing student evaluations, can provide sufficient information in order to make valid judgments
on effective teaching.
Peer Evaluations
Compared to the extensive research on student evaluations of teaching, few studies exist
on peer evaluations, and those that do are limited in scope. The National Research Council
(2002) found
that direct observation of teachers over an extended period of time by their peers can be
a highly effective means of evaluating an individual instructor. Even though professional
accountability in higher education has grown over the years, peer evaluations are not a
dominant practice in the assessment of teaching at most colleges and universities (Thomas,
Chie, Abraham, Raj, & Beh, 2014).
The scope of peer evaluations is not limited to what can be observed in a classroom; it
can include course outlines, syllabi, and teaching materials. Hatzipanagos and Lygo-Baker
(2006) suggest that peer reviews include observation of lectures and tutorials, monitoring
of online teaching, examination of curriculum design, and the use of student assessments.
Peer evaluations also create ways to improve adherence to the ethical standards set forth
by the university. Based on the above, we note that peer evaluations are more than just
classroom observations and can be instrumental in curriculum and professional development.
There are many benefits of peer review in developing faculty members. Peer reviews further
the development of teachers through expert input drawn from colleagues’ experience and
knowledge (Kohut, Burnap, & Yon, 2007). Peer evaluations are not just about identifying
places that need improvement, but also strengths. The benefits concluded from the literature
by Thomas et al. (2014) include validating teaching practices already being implemented,
inspiring different teaching perspectives, fostering learning about teaching methods, and
developing peer respect. Both the observer and the teacher being observed can use this
evaluation process to reflect on how to improve their teaching methods (Kohut et al.,
2007).
According to Bernstein, Jonson, and Smith (2000), teaching grows to its greatest potential
only through feedback gained from knowledgeable peers. However, Thomas et al. (2014)
warn that peer evaluations are most beneficial to quality teaching development when the
peer review program includes a clear, straightforward, and transparent structure; engagement
in professional discussion and debate among participants; a focus on the development of
teaching and learning to maintain motivation and commitment toward the peer review
process; and a willingness to consider the difficulties that may arise when engaging in
professional development activities.
Unfortunately, there are also many barriers to peer review of teaching unless the observa-
tions are part of a carefully conceived, systematic process (Wachtel, 1998). One of the major
barriers to peer observation is the low level of concurrence among observers, owing to personal
biases about teaching behaviors and to observer inexperience. Although faculty are experts
in their area of study, most have no formal training in education. Another barrier is that
peer evaluations generally are not a part of the culture of teaching and learning. Researchers
seem to agree that peer evaluation must be coupled with other evaluation methods in order
to provide accurate information.
Despite these reservations, peer evaluations are still an effective way to improve teaching.
Peer evaluation can provide the opportunity for faculty to learn how to be more effective
teachers, to get regular feedback on their classroom performance and to receive support from
colleagues. Educators advocate multiple sources for teaching improvement or for teaching
evaluation, and classroom observations provide a source of input that can be balanced against
some of the other more common forms of instructional feedback such as student evaluations
(Wachtel, 1998). Most importantly, peer evaluation can provide a third-party observation of
what is occurring in a college classroom. This outside perspective can foster a renewed
satisfaction in teaching.
It is becoming obvious to increasing numbers of faculty that successful teachers are
not only experts in their fields of study but also knowledgeable about teaching strategies
and learning theories and styles, committed to the personal and intellectual development
of their students, cognizant of the complex contexts in which teaching and learning occur,
and concerned about colleagues’ as well as their own teaching (Keig & Waggoner, 1994).
The use of peer evaluations can provide a wealth of information that can lead to enhanced
teaching. Although there are numerous problems that raise concern about the validity of
peer evaluations, peer evaluation can provide a vast amount of knowledge when coupled
with other evaluation methods.
Portfolios
Unlike other evaluation methods, which can shed light on only a small part of a teacher’s
effectiveness, portfolios have the ability to convey a broad range of a teacher’s skills, attitudes,
philosophies, and achievements. Seldin and Miller (2009) define a portfolio as a reflective,
evidence-based collection of materials that document teaching, research, and service. A
professor’s portfolio usually includes an assertion about their teaching effectiveness along
with supporting documentation (Burns, 2000). This could include sample syllabi, student
work, student ratings, and comments from both students and colleagues.
There are many benefits of portfolios. Portfolios are not simply an exhaustive collection
of all the documents and materials a teacher has, but rather a balanced listing of professional
activities that provide evidence of teacher effectiveness (Seldin & Miller, 2009). They allow
faculty to exhibit their teaching accomplishments to colleagues (Laverie, 2002). Burns
(2000) states that some institutions are beginning to require a portfolio as part of their
post-tenure review. The key benefits of a portfolio, according to Seldin (2000), are that it
encourages faculty to reflect on their teaching and to improve it.
Portfolios also have negative qualities. Although numerous researchers praise the
portfolio’s ability to improve teaching, Burns (2000) affirms that there are no experiments
supporting this claim and even goes on to state, “The only experiment that I could locate
that compared teaching ratings before and after portfolio construction concluded that these
ratings did not improve significantly” (p. 45). When researchers studied the impact of a
mandatory portfolio, the concern was that creating the portfolio became the focus rather
than improving teaching. Other faculty concerns include: Is the time and energy it takes
to prepare a portfolio worth it? Does the administration know how to use the information
collected from the portfolio? For new faculty, would a portfolio not be counterproductive?
Despite all these questions, little research has been conducted to answer them. Although
a portfolio has the potential to be a very useful tool in the assessment of teaching
effectiveness, without its reliability and validity being known, what do portfolios really
represent? Given the research that exists, we have to view portfolios with some reservation.
Like all other evaluation methods, portfolios cannot stand alone but are one more tool that,
when combined with other methods, can be useful in evaluating teacher effectiveness.
Observation Protocols
Classroom observations are direct observations of teaching practices, in which the observer
takes notes and/or codes teaching data either live in the classroom or from a recorded video
lesson. The two most common uses for observation protocols are to support professional
development and to evaluate teaching quality (Hora & Ferrare, 2013b). We note that while
classroom observations are a very common practice in K-12 schools, they are less common
in postsecondary settings, where further theoretical development and testing are needed.
Observation protocol development for K-12 is more advanced, due in part to policies
governing teaching evaluations (Hora, 2013); postsecondary observation protocols are
traditionally less developed in terms of psychometric testing and conceptual development
(Hora & Ferrare, 2013b). Unfortunately, observation protocols in higher education trail far
behind those of K-12 (Pianta & Hamre, 2009). The most recently developed and currently
utilized observation protocols in colleges and universities center on science, technology,
engineering, and mathematics (STEM) teaching (Hora & Ferrare, 2013b).
Developing broad-based methods for improving the quality of STEM education is on the
minds of institutions, disciplines, and national agencies (Seymour, 2002). Smith, Jones,
Gilbert, and Wieman (2013) cite several of these agencies that stress more effective teaching
in STEM courses, such as the President’s Council of Advisors on Science and Technology
Engage to Excel report (2012) and the National Research Council Discipline-Based Education
Research report (2012). The shift in the teaching and learning of science and mathematics
toward student-centered instruction and active learning is growing (Freeman et al., 2014;
Gasiewski, Eagan, Garcia, Hurtado, & Chang, 2012; Michael, 2006).
In The Greenwood Dictionary of Education (Collins & O’Brien, 2003), student-centered
learning (SCL) is defined as an “approach in which students influence the content, activities,
materials, and pace of learning” (p. 338-339). If SCL is applied correctly, it can lead to
growth in student enthusiasm for learning, retention of knowledge, understanding, and
attitudes toward the subject being taught. Michael (2006) defines active learning as engaging
students in activities that require some sort of reflection on the ideas involved. Students
should be actively gathering information, thinking, and problem solving during a class that
uses active learning. The meta-analysis by Freeman et al. (2014) of classrooms using active
learning reported that average examination scores improved by 6% over traditional lecturing.
They also reported that students in traditional lecture classes were 1.5 times more likely to
fail than those in active learning classes.
Sawada et al. (2002) warn that the development and use of an evaluation instrument
supporting these efforts is problematic and controversial, and that higher education
institutions find it difficult to identify alignment of teaching to this construct. Walkington
et al. (2012) believe that classroom observations are one of the best methods to combine
with student achievement to get a measure of teaching effectiveness. However, “generic
observation instruments aimed at all disciplines and employed by observers without
disciplinary knowledge are not sufficient” (p. 3). A protocol generic enough to be useful
in both a mathematics and a history class cannot fully capture the learning and teaching
process of either discipline (Hora & Ferrare, 2013b). Given the obvious differences between
disciplines, it is not reasonable to expect a single protocol to be both useful and generic
enough to work for all types of subject matter.
There are two main types of observation protocols: unstructured (open-ended) and
structured (Hora & Ferrare, 2013b). Unstructured protocols may not even indicate what the
observer should be looking for, and in general they do not have fixed responses. Although
responses to open-ended questions can be very useful to the observer and the instructor,
the data are highly dependent on the observer and cannot easily be standardized (Smith et
al., 2013), making it difficult to compare data across multiple classrooms.
On the other hand, observers respond to a structured protocol with a common set of
statements or codes (Smith et al., 2013). The data produced are easily standardized and can
be used to compare multiple classrooms. The drawback to most structured protocols is the
requirement of some form of multi-day training in order to achieve inter-rater reliability
(Sawada et al., 2002). Observers must also pay close attention to the behavior of the teacher
and/or the students to assess the predetermined classroom dynamics.
It is impossible to include all the observation protocols that are used to evaluate under-
graduate courses, but Appendix A presents a brief summary of some of the existing protocols.
The two protocols used for this study are described in more detail below.
Reformed Teaching Observation Protocol
The Reformed Teaching Observation Protocol (RTOP) is probably the most widely used
STEM-specific observation protocol to date. This instrument was designed by the Evaluation
Facilitation Group of the Arizona Collaborative for Excellence in the Preparation of Teachers
(ACEPT) to measure “reformed” teaching. Sawada et al. (2002) report that during the
development of the RTOP the Evaluation Facilitation Group (EFG) affirmed that “the
instrument would have to be focused on both science and mathematics, standards based,
focused exclusively on reform rather than the generic characteristics of good teaching, easy
to administer, appropriate for classrooms K-20, valid, and reliable” (p. 246).
The RTOP is a 25-item classroom observation protocol on a 5-point Likert scale that is
said to be standards-based, inquiry-oriented, and student-centered. The items are divided
into three subsets: Lesson Design and Implementation (5 items), Content (10 items), and
Classroom Culture (10 items). The first subset, containing items 1-5, is designed to capture
what the reference manual calls the ACEPT model for reformed teaching. The second subset
focuses on content and is divided into two parts: Propositional Pedagogic Knowledge (items
6-10) and Procedural Pedagogic Knowledge (items 11-15). The third subset is likewise
divided into two equal parts that analyze classroom culture, called Communicative
Interactions (items 16-20) and Student/Teacher Relationships (items 21-25).
After the initial development, testing, and redesign, a team of nine trained observers
collected 287 RTOP forms from observations of 141 mathematics and science classrooms.
The team consisted of seven graduate students and two faculty members. The classrooms
observed ranged across middle schools, high schools, community colleges, and universities.
Of the 141 classrooms observed, only 38 (27%) were mathematics classrooms, and of those
only 13 (34%) came from community college and university observations. Since less than
10% of the sample focused on the undergraduate mathematics classroom, and since these
were exclusively mathematics courses designed for pre-service elementary teachers, a more
thorough analysis is necessary to determine the reliability and structure of the instrument
for general undergraduate mathematics classrooms.
Using the data collected by the nine trained observers, inter-rater reliability was estimated
by computing a best-fit linear regression of the ratings of one observer on those of the
other, yielding a correlation coefficient of 0.98 and a shared variance between observers of
95%. Additionally, Cronbach’s alpha for the whole instrument was reported to be a
remarkably high 0.97, implying a high degree of uniformity across items, with the sub-scale
alphas ranging from 0.80 to 0.93 (Piburn & Sawada, 2000; Sawada et al., 2002). This
suggests that the RTOP has extremely strong internal consistency and could likely retain
reasonable reliability with significantly fewer items.
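Cronbach’s alpha, the reliability coefficient cited throughout this section, can be computed directly from item-level scores. The sketch below uses hypothetical ratings, not the ACEPT data:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
def cronbach_alpha(scores):
    """scores: one row of item ratings per classroom observation."""
    k = len(scores[0])  # number of items

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical example: four observations scored on a three-item subscale
ratings = [[4, 3, 4], [2, 2, 1], [3, 3, 3], [1, 0, 1]]
print(round(cronbach_alpha(ratings), 2))  # 0.95, i.e., high internal consistency
```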
The RTOP is divided into five sub-scales in order to test the hypothesis that “Inquiry-
Orientation” is a major part of the structure of the RTOP (Piburn & Sawada, 2000). The
subscales and their R-squared values are shown in Table 1. Piburn & Sawada note that the
high R-squared values offer very strong support for the construct validity. However, such
high predictability of the total score by four of the sub-scales implies, at most, a two-factor
structure.
Table 1
Subscales as Predictors of the RTOP Total Score

                                                              R-squared as a
Subscale                                                      Predictor of Total
Subscale 1: Lesson Design and Implementation                  0.956
Subscale 2: Content Propositional Pedagogic Knowledge         0.769
Subscale 3: Content Procedural Pedagogic Knowledge            0.971
Subscale 4: Classroom Culture Communicative Interactions      0.967
Subscale 5: Classroom Culture Student/Teacher Relationships   0.941

(Piburn & Sawada, 2000, p. 12)
Piburn & Sawada (2000) also conducted an exploratory factor analysis of the 25 RTOP
items using a database containing 153 classroom observations; an earlier reliability study
had implied that the number of principal components would be very small. Two strong
factors and one weak factor were found to be appropriate and interpretable. Component 1
had an eigenvalue of 14.72, while components 2 and 3 had significantly lower eigenvalues of
2.08 and 1.18, respectively, indicating how weakly components 2 and 3 contribute to the
overall structure. This is further illustrated by the factor pattern reported in the RTOP
reference manual (Table 2).
Table 2
Interpretation of the RTOP Factor Pattern

RTOP Items (asterisks indicate factor loading magnitudes; see key below)
1. The instructional strategies and activities respected students’ prior knowledge and the preconceptions inherent therein. **
2. The lesson was designed to engage students as members of a learning community. ****
3. In this lesson, student exploration preceded formal presentation. ****
4. This lesson encouraged students to seek and value alternative modes of investigation or of problem solving. ****
5. The focus and direction of the lesson was often determined by ideas originating with students. ***
6. The lesson involved fundamental concepts of the subject. ****
7. The lesson promoted strongly coherent conceptual understanding. ***
8. The teacher had a solid grasp of the subject matter content inherent in the lesson. **
9. Elements of abstraction (i.e., symbolic representations, theory building) were encouraged when it was important to do so. *
10. Connections with other content disciplines and/or real world phenomena were explored and valued. **
11. Students used a variety of means (models, drawings, graphs, concrete materials, manipulatives, etc.) to represent phenomena. **
12. Students made predictions, estimations and/or hypotheses and devised means for testing them. ****
13. Students were actively engaged in thought-provoking activity that often involved the critical assessment of procedures. ***
14. Students were reflective about their learning. ***
15. Intellectual rigor, constructive criticism, and the challenging of ideas were valued. ***
16. Students were involved in the communication of their ideas to others using a variety of means and media. ***
17. The teacher’s questions triggered divergent modes of thinking. **
18. There was a high proportion of student talk and a significant amount of it occurred between and among students. ***
19. Student questions and comments often determined the focus and direction of classroom discourse. **
20. There was a climate of respect for what others had to say. * **
21. Active participation of students was encouraged and valued. ** *
22. Students were encouraged to generate conjectures, alternative solution strategies, and ways of interpreting evidence. **
23. In general the teacher was patient with students. ****
24. The teacher acted as a resource person, working to support and enhance student investigations. ****
25. The metaphor “teacher as listener” was very characteristic of this classroom. ***

Key: * (0.50-0.59), ** (0.60-0.69), *** (0.70-0.79), **** (0.80-0.99)
(Piburn & Sawada, 2000, p. 16)
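The proportion of total variance captured by each component is simply its eigenvalue divided by the number of items (25 for a correlation-matrix analysis of the RTOP). A quick check of this arithmetic, using the eigenvalues reported above:

```python
# Variance share of each principal component: eigenvalue / number of items
N_ITEMS = 25
eigenvalues = {1: 14.72, 2: 2.08, 3: 1.18}  # Piburn & Sawada (2000)

for component, ev in eigenvalues.items():
    print(f"Component {component}: {ev / N_ITEMS:.1%} of total variance")
# Component 3 accounts for 1.18/25, under five percent of the variance
```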
Factor 1, named “inquiry orientation”, draws heavily on all five sub-scales with the
exception of sub-scale 2, while factor 2, labeled “content propositional knowledge”, draws
exclusively on sub-scale 2. Factor 3, labeled “student/teacher relationship”, accounts for
less than five percent of the variance and has only three items that load on it. As such, it
is believed that a subset of the items from the RTOP could be used as an abbreviated
protocol measuring the same constructs as the original. Therefore, for the current study we
will use an abbreviated instrument (aRTOP) composed of items with large loadings on the
two primary factors (see Table 3).
For the second factor of the aRTOP, focused on the content knowledge related to the
lesson, we include all 5 items from the original Subscale 2, as these items are likely to measure
something different from the remaining 20 items of the original RTOP. For the first factor
of this abbreviated instrument, focused on the inquiry orientation of the lesson, we chose
items that had significant loadings on the first factor, making sure to get items from each of
the related subscales. We also limited this factor to 5 items to match the content knowledge
factor in size in order to keep this factor from dominating the total scale score.
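The selection rule described above (take the highest-loading Factor 1 items while ensuring coverage of the related sub-scales, capped at five items) can be sketched in code. The loadings and sub-scale labels below are hypothetical placeholders for illustration, not the published RTOP values.

```python
# Sketch of the aRTOP item-selection rule: keep the five items with the
# largest Factor 1 loadings, subject to drawing from each related sub-scale.
# Loadings and sub-scale labels below are hypothetical, for illustration only.

def select_items(loadings, subscales, n_items=5):
    """Pick n_items with the largest loadings, requiring at least one
    item from every sub-scale represented in `subscales`."""
    chosen = []
    # First pass: the best item from each sub-scale.
    for scale in sorted(set(subscales.values())):
        best = max((i for i in loadings if subscales[i] == scale),
                   key=lambda i: loadings[i])
        chosen.append(best)
    # Second pass: fill remaining slots with the highest leftover loadings.
    leftovers = sorted((i for i in loadings if i not in chosen),
                       key=lambda i: loadings[i], reverse=True)
    chosen.extend(leftovers[: n_items - len(chosen)])
    return sorted(chosen[:n_items])

# Hypothetical loadings for six candidate items across three sub-scales.
loadings = {1: 0.82, 2: 0.74, 3: 0.61, 4: 0.79, 5: 0.55, 6: 0.68}
subscales = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}
print(select_items(loadings, subscales))  # [1, 2, 3, 4, 6]
```

With these placeholder loadings, item 5 (the weakest item in sub-scale C) is passed over in favor of higher-loading items once every sub-scale is represented.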
Table 3
aRTOP Items and Design
Inquiry Orientation
1. The lesson was designed to engage students as members of a learning community.
2. Intellectual rigor, constructive criticism, and the challenging of ideas were valued.
3. This lesson encouraged students to seek and value alternative modes of investigation or of problem solving.
4. Students made predictions, estimations and/or hypotheses and devised means for testing them.
5. The teacher acted as a resource person, working to support and enhance student investigations.

Content Propositional Knowledge
6. The lesson involved fundamental concepts of the subject.
7. The lesson promoted strongly coherent conceptual understanding.
8. The teacher had a solid grasp of the subject matter content inherent in the lesson.
9. Elements of abstraction (i.e., symbolic representations, theory building) were encouraged when it was important to do so.
10. Connections with other content disciplines and/or real world phenomena were explored and valued.
Mathematics Classroom Observation Protocol for Practices
The science-specific language of the RTOP is a major disadvantage when it is used to observe mathematics classrooms. This, along with the need for an observation protocol grounded in recent standards, led to the design of the Mathematics Classroom Observation
Protocol for Practices (MCOP2). The MCOP2 is designed to be implemented in K-16 math-
ematics classrooms to measure the degree to which a mathematics classroom aligns with the
Standards for Mathematical Practice from the Common Core State Standards in Mathemat-
ics (National Governors Association Center for Best Practices, Council of Chief State School
Officers, 2010); “Crossroads” and “Beyond Crossroads” from the American Mathematical
Association of Two-Year Colleges (American Mathematical Association of Two-Year Colleges
(AMATYC), 1995, 2004); the Committee on the Undergraduate Program in Mathematics
Curriculum Guide from the Mathematical Association of America (Barker et al., 2004) ;
and the Process Standards of the National Council of Teachers of Mathematics (National
Council of Teachers of Mathematics, 2000).
A test of the content was conducted with 164 identified experts in mathematics teacher education. The first survey provided feedback on the initial 18 MCOP2 items and their usefulness in measuring various components of mathematics classrooms (Gleason, Livers, & Zelkowski, 2017). Over 94% of the experts rated the items as either "essential" or "not essential, but useful" rather than "not useful." After adjusting the MCOP2 items based on
the expert feedback, a second survey was conducted with 26 experts that agreed to provide
additional information. This survey provided the experts with more details about each item,
the theoretical constructs, and the intended purpose of the MCOP2. Gleason, Livers, and
Zelkowski (2017) report that all items were retained with minimal revisions, because they all loaded
on at least one of the factors. With the information gained from the experts, the structure
of the MCOP2 instrument was revised.
A pilot study was conducted by a graduate student in mathematics and a mathemat-
ics professor at a large southern university to determine if the data collected aligned with
the theoretical constructs and the verification of the expert survey. Based upon instructor
approval, 36 classrooms with 28 different instructors were observed throughout a semester.
The instructors varied widely from graduate teaching assistants to tenured professors. The
classes they taught also varied from college algebra to upper division mathematics.
The MCOP2 that was used in the pilot study was initially designed to measure three primary components: student engagement, lesson design and implementation, and class culture and discourse. Seventeen of the original eighteen items, with full descriptions, are used to measure these three components. Student Engagement contained Items 1-5, Lesson Content contained Items 6-11, and Classroom Culture and Discourse contained Items 12-17
(Gleason & Cofer, 2014).
After all the data was collected, Gleason and Cofer conducted an exploratory factor analysis (EFA) and classical test theory analysis with some unexpected results. The original
assumption of three components was reexamined after a low eigenvalue was found for the
third factor. Gleason and Cofer report that a factor matrix of a potential 3 Factor Model
indicated Student Engagement and Classroom Culture and Discourse were both loading on
the same factor. These two were combined to create Student Engagement and Classroom
Discourse. The 2 Factor Model explained over 50% of the total variance.
Cronbach’s alpha was also calculated for the entire protocol and both factors. The entire
protocol had a Cronbach’s alpha of 0.898. The sub-scales of “Lesson Content” and “Student
Engagement and Classroom Discourse” had Cronbach’s alpha reliabilities of 0.779 and 0.907,
respectively. Gleason and Cofer (2014) state, “the internal reliabilities are high enough for
both sub-scales and the entire instrument to be used to measure at the group level, either
multiple observations of a single classroom or single observations of multiple classrooms” (p.
99). The overall high alpha coefficient demonstrates that the MCOP2 measures a coherent construct, and the EFA clearly produces a 2 factor model of "Lesson Content" and "Student Engagement and Classroom Discourse". Overall this pilot study was very promising, but the instrument was still in an early stage of development.
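Cronbach's alpha as reported for the pilot can be computed directly from an observations-by-items score matrix. A minimal numpy sketch, using synthetic scores rather than the study's data:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (observations x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Synthetic data: three items that track a common signal -> high alpha.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
scores = np.column_stack([base + rng.normal(scale=0.3, size=100)
                          for _ in range(3)])
print(round(cronbach_alpha(scores), 3))
```

When items are perfectly parallel (identical columns), the formula returns exactly 1; as item-specific noise grows, alpha shrinks toward 0.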
Gleason, Livers, and Zelkowski (2017) also assessed the inter-rater reliability of the instrument to examine the response processes. Five raters were chosen with a variety of
educational and professional backgrounds. Two of the raters have doctorates in mathematics
education, one rater has a doctorate in mathematics and is heavily involved in mathematics
education, one rater is a mathematics specialist that works with secondary teachers and has
taught at both the secondary and introductory college level, and the fifth rater is a graduate
student in mathematics with minimal background in education other than teaching some
introductory college math classes.
Five different classroom videos were scored by the five raters. All were given the detailed descriptors of the items with the rubric prior to viewing the videos. All videos were
watched independently by each rater, and no formal training was conducted. To make sure
there was a good representation of different levels of students and instructors, one video was
Table 4
Brief Description of MCOP2 items
1. Students engaged in exploration/investigation/problem solving.
2. Students used a variety of means (models, drawings, graphs, concrete materials, manipulatives, etc.) to represent concepts.
3. Students were engaged in mathematical activities.
4. Students critically assessed mathematical strategies.
5. Students persevered in problem solving.
6. The lesson involved fundamental concepts of the subject to promote relational/conceptual understanding.
7. The lesson promoted modeling with mathematics.
8. The lesson provided opportunities to examine mathematical structure (symbolic notation, patterns, generalizations, conjectures, etc.).
9. The lesson included tasks that have multiple paths to a solution or multiple solutions.
10. The lesson promoted precision of mathematical language.
11. The teacher's talk encouraged student thinking.
12. There was a high proportion of students talking related to mathematics.
13. There was a climate of respect for what others had to say.
14. In general, the teacher provided wait-time.
15. Students were involved in the communication of their ideas to others (peer-to-peer).
16. The teacher uses student questions/comments to enhance mathematical understanding.
chosen from each of K-2, 3-5, 6-8, 9-12, and undergraduate. Gleason, Livers, and Zelkowski
(2017) used the sub-scale score to calculate the intra-class correlation (ICC) and report that
the inter-rater reliability was within acceptable levels.
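The intra-class correlation used for inter-rater reliability can be illustrated with a one-way random-effects ICC(1). This is a simplified variant chosen for the sketch (the text does not specify which ICC form Gleason et al. used), and the ratings below are hypothetical sub-scale scores:

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC(1) for a (targets x raters) matrix.
    A simplified variant for illustration; the published analysis may
    have used a different ICC form."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)
    # Between-target and within-target mean squares (one-way ANOVA).
    ms_between = k * ((target_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Five videos scored by five raters (hypothetical sub-scale scores).
ratings = [[10, 11, 10, 12, 11],
           [20, 19, 21, 20, 20],
           [15, 16, 14, 15, 16],
           [30, 29, 31, 30, 28],
           [25, 26, 24, 25, 27]]
print(round(icc_oneway(ratings), 3))
```

Perfect rater agreement drives the within-target mean square to zero and the ICC to 1; large disagreements relative to the between-video spread pull it toward 0.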
The MCOP2 was revised after the second round of external experts to include only 16
items (Table 4) measuring the two primary constructs of teacher facilitation, focusing on the
interactions that are primarily dependent upon the teacher, and student engagement, focus-
ing on the interactions that are primarily dependent upon the students. The MCOP2 needs to be tested in other mathematics classrooms at multiple higher education institutions, with the types of institutions diversified to include community colleges, liberal arts schools, and other research universities. Such an expanded study, including multiple institutions, will yield more generalizable results for use in future analyses.
CHAPTER 3
METHODS
Aim of Study
The aim of this study was to investigate the structural validity and reliability of the
abbreviated Reformed Teaching Observation Protocol and the Mathematic Classroom Ob-
servation Protocol for Practices in the setting of undergraduate mathematics classrooms, with the goal of answering the following research questions:
1. What are the internal structures of the Mathematics Classroom Observation Protocol
for Practices (MCOP2) and the abbreviated Reformed Teaching Observation Protocol
(aRTOP) for the population of undergraduate mathematics classrooms?
2. What are the internal reliabilities of the subscales of the Mathematics Classroom Ob-
servation Protocol for Practices (MCOP2) and the abbreviated Reformed Teaching Ob-
servation Protocol (aRTOP) with respect to undergraduate mathematics classrooms?
3. What are the relationships between the constructs measured by the Mathematics Class-
room Observation Protocol for Practices (MCOP2) and the abbreviated Reformed
Teaching Observation Protocol (aRTOP)?
Sample Description
The procedure for selecting the population is a crucial step in any study. Since the
study used Structural Equation Modeling (SEM) in the analysis, we aimed for a sample size of 150-200 classroom observations at a variety of institutions (Kline, 2011; Weston & Gore, 2006) and obtained a final sample of 110 classroom observations.
Although this was smaller than the sample size originally intended, the literature supports a smaller sample size when necessary. In a Monte Carlo study, Boomsma (1982) found the widely cited recommendation for sample size to be at least 100, with 200 desirable. In the study conducted by Marsh, Hau, Balla, &
Grayson (1998), it was found that a sample size of 100 was sufficient when there were at least
four items per factor, and that more was better. Ding, Velicer, & Harlow (1995) recommend a minimum of 3 indicators per factor and a minimum sample size of 100. Schumacker & Lomax (2016) suggest a sample size of 100 to 150 for small models with well-behaved data. Other studies suggest 5 or 10 observations per estimated parameter (Bentler & Chou, 1987) or 10 cases per variable (Nunnally, 1978, p. 355). These rules are convenient, but they do not take into account the specifics of the model and may lead to over- or underestimation of the minimum sample size. The flexibility of Structural Equation Modeling (SEM) makes it hard to generalize about the required sample size.
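The per-parameter and per-variable rules of thumb cited above can be made concrete with a rough parameter tally for a two-factor CFA. The counting here is approximate (exact counts depend on how the model is identified), so the numbers are illustrative only:

```python
# Rule-of-thumb minimum sample sizes for a two-factor CFA sketch.
# Parameter count is a rough tally (loadings + residual variances +
# factor variances and one covariance); exact counts depend on how the
# model is identified, so treat these numbers as illustrative only.

def rough_param_count(n_indicators):
    loadings = n_indicators          # one loading per indicator
    residuals = n_indicators         # one residual variance per indicator
    factors = 2 + 1                  # two factor variances + one covariance
    return loadings + residuals + factors

def rule_of_thumb_n(n_indicators, per_parameter=5, per_variable=10):
    params = rough_param_count(n_indicators)
    return {
        "5_per_parameter": per_parameter * params,       # Bentler & Chou (1987)
        "10_per_variable": per_variable * n_indicators,  # Nunnally (1978)
    }

# aRTOP: 10 indicators; MCOP2: 16 indicators.
print(rule_of_thumb_n(10))  # {'5_per_parameter': 115, '10_per_variable': 100}
print(rule_of_thumb_n(16))  # {'5_per_parameter': 175, '10_per_variable': 160}
```

Even under this crude tally, the two rules disagree for both instruments, which echoes the point above that such conveniences can over- or underestimate the required sample size.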
Although there are numerous studies of sample size, the study conducted by Wolf, Harrington, Clark, & Miller (2013) is most analogous to ours. They used Monte Carlo data simulation techniques to evaluate sample size requirements using the maximum likelihood (ML) estimator. The study compares Confirmatory Factor Analyses (CFA) conducted with different numbers of factors, indicators, and loadings to see what minimum sample size
is required to “achieve minimal bias, adequate statistical power, and overall propriety of a
given model”(p. 920). The two-factor model with 6 to 8 indicators is most closely aligned
with our study, because the MCOP2 is a two-factor model with 9 indicators per factor and
the aRTOP is a two-factor model with 5 indicators per factor.
Although increasing the number of latent variables in the model resulted in an increased
minimum sample size, models required a smaller sample size when there were more indicators
per factor and stronger factor loadings. According to Wolf et al. (2013), a two-factor model with 6 indicators required sample sizes of 120 and 100 at factor loadings of .65 and .80, respectively. Similarly, a two-factor model with 8 indicators required sample sizes of 120
and 90 at factor loadings of .65 and .80, respectively. Wolf et al. (2013) conclude that a "one size fits all" approach has problems because of the variability in SEM.
This study used the non-probability sampling method of convenience sampling in or-
der to reduce the relative travel cost and time required to achieve the sample size desired
(Johnson & Christensen, 2014). The chosen sample, to a large degree, represents the general
population of undergraduate mathematics classrooms, because the sample includes a large
number of classroom observations with a wide variety in class type, class size, institution,
demographics, etc.
The investigators observed 110 college mathematics classrooms at the undergraduate level, with the consent of the instructors, representing a wide variety of college and university classrooms. The faculty members of the classrooms observed ranged in age from 22 and up, with a mixture of genders and ethnic backgrounds.
The American Mathematical Society’s Annual Survey of the Mathematical Sciences pro-
vides a way to group colleges and universities into two distinct classifications based upon the
highest mathematics degree offered at the institution: doctorate-granting universities and
master's and baccalaureate colleges and universities. Our study includes three large southern doctorate-granting universities with enrollments of approximately 18,000 to 35,000. The percentage of full-time students ranges from 64 to 85, and the percentage of undergraduate students ranges from 62 to 85. These universities are comprised of approximately 49 to
60 percent female students. All three of these universities have students with a wide variety
of ethnic backgrounds.
We also include eight southern master's and baccalaureate colleges and universities with enrollments between 1,100 and 15,000 students. The percentage of full-time students ranges from 43 to 90, and the percentage of undergraduate students ranges from 45 to 100. These colleges and universities are comprised of approximately 50 to 72 percent female students. All eleven colleges and universities have students with a wide variety of ethnic backgrounds.
We purposefully chose the institutions in this study to avoid atypical demographics.
Any institution with a high representation of one specific demographic was excluded. For
example, student populations composed exclusively or almost exclusively of women were
excluded from this study because they do not represent a typical college population. With
these selections of institutions, approximately 50-70 mathematics classrooms at each category
of institution are included in the study to overcome any potential bias due to the convenience
sampling (Johnson & Christensen, 2014). The actual classrooms chosen for observation were
selected from faculty members at each institution who elected to participate in the study.
The college and universities in this study were chosen to avoid the overrepresentation or
underrepresentation of a specific group, recognizing we have no control over the instructors
who chose to participate. Some instructors did not respond or chose not to participate in this study. The main concern was whether this group of non-responders or non-participants would affect the validity of our study results (Hartman, Fuqua, & Jenkins, 1986). In our study, there could be substantial differences in the personal or professional demographics of those who chose not to respond or participate. The self-selected nature of this sample most likely favors instructors who have an interest in teaching and learning issues (Hora & Ferrare, 2013a). For instance, teachers who run a student-centered classroom were more likely to respond than teachers who only lecture directly.
Seventy-two mathematics faculty members agreed to participate in this study. Since
some instructors teach two or more completely different courses, a total of 110 observations
were conducted in the Spring 2016, Fall 2016, and Spring 2017 semesters. Only 86 of the 110
observations have instructor demographics data, because 15 instructors did not complete
the demographics survey. Of the 110 classroom observations, 50 were taught by a female
instructor and 60 were taught by a male instructor. The instructors self-identified their ages: 2% were 18-24 years old, 48% were 25-34 years old, 17% were 35-44 years old, 12% were 45-54 years old, 16% were 55-64 years old, and 5% were 65 years and
over. For ethnicity, 13% identified as Asian/Pacific Islander, 5% identified as Black or African American, 2% identified as Hispanic American, and 80% identified as White/Caucasian.
Of the 86 classrooms on which we have full demographic data about the instructor, 13
were taught by a Graduate Teaching Assistant, 24 were taught by an Adjunct/Instructor,
24 were taught by an Assistant Professor, 8 were taught by an Associate Professor, and 17
were taught by a Full Professor. The instructors were asked to self-identify their highest level of education: 2% had a Bachelor's degree, 31% had a Master's degree, 63% had a PhD, and 3% had another advanced degree beyond a Master's degree.
They were also asked to identify how many years they had taught at the high school level and at the college level. Over 75% reported teaching at the high school level for less than
one year. The range of years spent teaching at the college level varied, with 1% teaching for
less than one year, 27% teaching for 1-5 years, 31% teaching for 6-10 years, 13% teaching
for 11-15 years, and 28% teaching over 15 years. A complete list of instructor demographics
is included in Appendix B.
The use of convenience sampling is one of the limitations of this study. Convenience sampling can lead to the under-representation or over-representation of a particular group in the sample. Another sampling issue that needs to be accounted for is the presence of outliers, since convenience sampling is particularly influenced by them.
We chose our sample to avoid classroom observations likely to give us unusual data by collecting a large sample from a diverse range of institutions, selected on enrollment demographics and the types of degrees offered, that reasonably represents the larger population of undergraduate institutions in the United States.
Another limitation of this study is observer bias. Unfortunately, researchers are susceptible to obtaining the results they want to find. Observer biases can be positive or negative. These biases can be a product of personal experience, environment, and/or social and cultural conditioning. Reflexivity, self-reflection by the researcher on their biases and predispositions, is the key strategy for avoiding researcher bias (Johnson & Christensen, 2014). Although it
is not possible to remove this potential bias completely, the observer is aware of the influence
that these biases may cause and will make every effort to avoid their influence. Johnson &
Christensen (2014) comment, "Complete objectivity being impossible and pure subjectivity undermining credibility, the researcher's focus is on balance - understanding and depicting
the world authentically in all its complexity while being self-analytical, politically aware,
and reflexive in consciousness” (p. 420).
Instruments
From the review of the literature, we see there are many ways to evaluate college mathematics instruction. It is impossible to include all the observation protocols used to evaluate undergraduate classes, so two protocols were chosen to align with the research questions: an abbreviated form of the Reformed Teaching Observation Protocol, for its widely known use, and the Mathematics Classroom Observation Protocol for Practices, for its mathematics-specific design.
The Reformed Teaching Observation Protocol (RTOP) is a 25-item protocol designed to be
used for both science and mathematics classroom observations. Piburn et al. (2000) divide
the RTOP into 5 sub-scales in order to test the hypothesis that “Inquiry-Orientation” is
a major part of the structure of the RTOP. One of the sub-scales, procedural pedagogic knowledge, is a very strong predictor of the total score, with an R-squared of 0.971, meaning 97.1% of the variance in the total score is accounted for by this predictor. This result, along with an exploratory
factor analysis finding two strong factors, solidified our idea that an abbreviated version of
the RTOP would produce a similar amount of information as the full instrument. This led
to the creation of the abbreviated Reformed Teaching Observation Protocol (aRTOP) to be
used in this study. (See Table 3 and Appendix C.2)
The theoretical structure of the aRTOP is depicted in Figure 1 and is based on the
results of the RTOP. Inquiry Orientation and Content Propositional Knowledge are the two
theoretical constructs that will be measured with the 10 observed variables. A double arrow
indicates that these two constructs are correlated. The model also contains a stochastic
error term accounting for the influence of unobserved factors.
Figure 1: Theoretical Model of aRTOP
The other observation protocol that we will focus on is the Mathematics Classroom
Observation Protocol for Practices (MCOP2). It is designed to be implemented in K-16
mathematics classrooms to measure mathematics classroom interactions. The MCOP2 mea-
sures the two primary constructs: teacher facilitation and student engagement. Sixteen
items with full descriptions are used to measure these two components. The validity and
reliability of the Mathematics Classroom Observation Protocol for Practices (MCOP2) has
been assessed in numerous ways. A survey of 164 experts in mathematics education was
conducted to test the content of the MCOP2. The results from this survey and a second
follow-up survey were used to revise the original 18 MCOP2 items to 16 items. Inter-rater reliability was also calculated with a panel of five raters of various backgrounds without any
formal training. This resulted in the intra-class correlation (ICC) of 0.669 for the Teacher
Facilitation Sub-scale and 0.616 for the Student Engagement Sub-scale (Gleason et al., 2017).
(See Table 4 and Appendix C.3)
The theoretical structure of the MCOP2 is depicted in Figure 2. The two theoretical constructs, student engagement and teacher facilitation, will be measured with the 16 observed variables. The double arrow between the two theoretical constructs represents the correlation of these two factors. The model also includes residual error terms to account for the unmeasured variation in the model.
Figure 2: Theoretical Model of MCOP2
Procedures
After receiving approval from the University of Alabama Institutional Review Board
(IRB), we began the recruitment process of the selected institutions of higher education
through their local Institutional Review Boards. Upon approval of the institution to par-
ticipate, an email was sent to all undergraduate mathematics instructors at that institution
informing them that participation in this study was strictly voluntary, but that we would like to observe a class they teach to understand the current status of mathematics instruction at the undergraduate level. The email also explained that there was no foreseen risk associated with this study and no individual benefit for the participants. For those who agreed to allow us to observe their classrooms, we confirmed a classroom observation at the teacher's discretion.
The instructors of the mathematics courses were only asked to allow the investigators
to observe their classroom in order to complete the observation protocol forms. The time
commitment for the participants was to allow the researchers to observe one class period (usually 50 or 75 minutes) during the Spring 2016, Fall 2016, or Spring 2017 semester, plus about 5 minutes to read and complete the consent form and demographics form. There were no other responsibilities for the participants.
For each classroom observation, the investigator arrived early and sat in a seat near the
back of the classroom. The goal as the observer was to blend in with the surroundings so the
students and instructor were not disturbed. The observer completed the Note Taker Form
(See Appendix C) during the lecture. At the conclusion, the observer used the information
collected on the Note Taker Form to complete both protocols. The observer alternated the
order the observation protocols were completed to avoid any bias that might be created.
Each classroom observation was given a number (1-200) corresponding to the sequence in which it was completed. Classroom observations labeled with an odd number (1, 3, 5, ...) were categorized as A, indicating that the aRTOP protocol was completed first. Classroom
observations labeled with an even number (2, 4, 6, ...) were categorized as B, indicating that the MCOP2 protocol was completed first. This process was repeated until both protocols
were collected for all classroom observations.
Once all data was collected, we tested the theoretical structure for the MCOP2 and
the aRTOP using the statistical language and environment, R. All results are reported in
the aggregate to protect the confidentiality of the teachers. We use the linear regression
coefficients and the fit statistics we gain from the Confirmatory Factor Analysis (CFA) to
test the internal structure of the MCOP2 and the aRTOP with respect to undergraduate
mathematics classrooms. Cronbach's alpha is used to test the internal reliability of the MCOP2 and the aRTOP with respect to undergraduate mathematics classrooms. We use regression
to assess the association between the constructs measured by the MCOP2 and the aRTOP.
Although Hu & Bentler's (1999) "rule of thumb" cutoff criteria for fit indexes are widely used today, Marsh, Hau, & Wen (2004) warn against overgeneralization of Hu and Bentler's findings. Schermelleh-Engel et al. (2003) include a table of recommendations for model evaluation, but suggest that these cutoff criteria should not be taken too rigidly. Table 5 provides an overview of some of these rules of thumb. Hu & Bentler (1998) note that fit indices can be affected by model misspecification, small-sample bias, violations of normality and independence, and estimation methods. It is therefore possible for a model to fit the data well even when one or more fit measures suggest bad fit.
The acceptable level of Cronbach's alpha depends upon whether the instrument is being used in the early stages of research, as a basic research tool, or as a scale for an individual in a clinical situation (Nunnally, 1978). Alpha values of .7 to .8 are regarded as satisfactory in our case, because we are not measuring at the level of the individual; with such values, the instruments should be used for preliminary research to guide further understanding of the constructs (Bland & Altman, 1997; Nunnally, 1978). According to Streiner (2003), Nunnally was correct about acceptable alpha for research tools, but an alpha over .90 most likely indicates unnecessary redundancy.
Table 5
Recommendations for Model Evaluation: Some Rules of Thumb
Fit Measure Good Fit Acceptable Fit
χ2/df 0 ≤ χ2/df ≤ 2 2 < χ2/df ≤ 3
RMSEA 0 ≤ RMSEA ≤ .05 .05 ≤ RMSEA ≤ .08
SRMR 0 ≤ SRMR ≤ .05 .05 < SRMR ≤ .10
CFI .97 ≤ CFI ≤ 1.00 .95 ≤ CFI < .97
GFI .95 ≤ GFI ≤ 1.00 .90 ≤ GFI < .95
Schermelleh-Engel, Moosbrugger, & Muller (2003)
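The rules of thumb in Table 5 can be encoded as a simple screening function. The handling of shared boundary values (e.g., RMSEA = .05) is a choice made here for the sketch, and, as the surrounding discussion stresses, such cutoffs are a screen, not a verdict:

```python
# Screening sketch of the Table 5 rules of thumb (Schermelleh-Engel et al.,
# 2003). Boundary cases (e.g. RMSEA = .05) are resolved here in favor of
# the better category, which is one of several defensible conventions.

# For each index: (good-fit bound, acceptable-fit bound). Indices where
# higher is better (CFI, GFI) are stored negated so one comparison works.
THRESHOLDS = {
    "chisq_df": (2.0, 3.0),
    "rmsea": (0.05, 0.08),
    "srmr": (0.05, 0.10),
    "cfi": (-0.97, -0.95),   # negated: CFI >= .97 good, >= .95 acceptable
    "gfi": (-0.95, -0.90),   # negated: GFI >= .95 good, >= .90 acceptable
}

def classify(index, value):
    good, acceptable = THRESHOLDS[index]
    v = -value if index in ("cfi", "gfi") else value
    if v <= good:
        return "good"
    if v <= acceptable:
        return "acceptable"
    return "poor"

# Illustrative values, not the study's results.
for name, value in [("chisq_df", 1.5), ("rmsea", 0.07), ("srmr", 0.12)]:
    print(name, classify(name, value))  # good, acceptable, poor
```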
CHAPTER 4
RESULTS
Internal Structure
A confirmatory factor analysis (CFA) was conducted on the data gathered to analyze
the internal structure of the Mathematics Classroom Observation Protocol for Practices
(MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP) for the
population of undergraduate mathematics classrooms using R version 3.3.0 (2016) with the
lavaan package (Rosseel, 2012). CFA allows us to examine the relationship between the
observed variables and their underlying latent constructs for both the MCOP2 and the
aRTOP. The analysis of the fit indices for the CFA allows us to inspect the model fit for
both observation protocols.
As mentioned earlier, the aRTOP has two theoretical constructs, Inquiry Orientation and Content Propositional Knowledge, measured with the 10 observed variables. Items x1, x2, x3, x4, and x5 load on Inquiry Orientation, and x6, x7, x8, x9, and x10 load on Content
Propositional Knowledge. The aRTOP model with standardized factor loadings as well as
the standardized variance and covariance are included in Figure 3, with the factor loadings
relatively high for eight of the items. The goodness of fit indices for the aRTOP fall outside the acceptable ranges (χ2/df = 3.478, RMSEA = .150, SRMR = .115, GFI = .831, and CFI = .820).
Although some indicator variables were low and modification indices existed, theory led
to the inclusion of these observed variables. For example, item x10, “connection with other
content disciplines and/or real world phenomena were explored and valued,” in the aRTOP
has a factor loading of .03, meaning the construct explains less than 1% (.03² ≈ .001) of the variance in item x10. Although this loading is low and modification indices suggest removal of
Figure 3: Confirmatory Factor Analysis Results: aRTOP
this item, theory tells us that content being connected to the real world or other disciplines is
important to Content Propositional Knowledge. Schermelleh-Engel, Moosbrugger, & Muller
(2003) support this idea stating, “one should never modify a model solely on the basis of
modification indices, although the program might suggest to do so” (p. 61).
The MCOP2 has two constructs, Student Engagement and Teacher Facilitation mea-
sured by 16 items. Items y1, y2, y3, y4, y5, y12, y13, y14, and y15 load on Student
Engagement, while items y4, y6, y7, y8, y9, y10, y11, y13, and y16 load on Teacher Facil-
itation. The MCOP2 model with standardized factor loadings as well as the standardized
variance and covariance are included in Figure 4, with the standardized loadings relatively
high for most items. The goodness of fit indices for the MCOP2 reveal an acceptable fit
for two indices (χ2/df =1.185 and SRMR=.078), and an almost acceptable fit for the other
indices (RMSEA=.094, GFI=.805, and CFI=.895). See Table 5 for recommendations for
model evaluation.
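The fit-index screening described above can be sketched as a small function. This is an illustrative sketch only: the cutoff values below are assumptions drawn from commonly cited SEM guidelines (e.g., Hu & Bentler, 1999), not necessarily the exact recommendations of Table 5.

```python
# Illustrative sketch: screen CFA fit indices against commonly cited cutoffs.
# The cutoffs are assumptions from general SEM guidelines, not necessarily
# the exact recommendations reproduced in Table 5.
CUTOFFS = {
    "chisq_df": lambda v: v <= 2.0,   # chi-square / degrees of freedom
    "rmsea":    lambda v: v <= .06,
    "srmr":     lambda v: v <= .08,
    "gfi":      lambda v: v >= .95,
    "cfi":      lambda v: v >= .95,
}

def screen_fit(indices):
    """Return {index_name: True/False} for whether each value meets its cutoff."""
    return {name: CUTOFFS[name](value) for name, value in indices.items()}

# Fit indices reported for the MCOP2 model in the text.
mcop2 = {"chisq_df": 1.185, "rmsea": .094, "srmr": .078, "gfi": .805, "cfi": .895}
print(screen_fit(mcop2))
# chisq_df and srmr meet these cutoffs; the remaining indices fall outside them.
```

Under these assumed cutoffs, the function reproduces the pattern described in the text: an acceptable fit for two indices and an almost acceptable fit for the others.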
Figure 4: Confirmatory Factor Analysis Results: MCOP2
Internal Reliability
Cronbach’s alpha (1951) was calculated to analyze the internal reliability of the Math-
ematics Classroom Observation Protocol for Practices (MCOP2) and the abbreviated Re-
formed Teaching Observation Protocol (aRTOP) with respect to undergraduate mathematics
classrooms using R version 3.3.0 (2016) with the Rcmdr package (Fox, 2005, 2017; Fox &
Bouchet-Valat, 2017).
The alpha values for the subscales of the aRTOP were .753 for the Inquiry Orientation Subscale and .605 for the Content Propositional Knowledge Subscale. The Cronbach's alpha
for the first subscale is near the satisfactory range for basic research given by Nunnally (1978,
p. 245-246), while the second subscale is in the range for preliminary research.
Similarly, the Cronbach’s alpha values for the subscales of the MCOP2 were .888 for
the Student Engagement Subscale and .812 for the Teacher Facilitation Subscale. Both of
these subscales are therefore in the satisfactory range for basic research (Nunnally, 1978, p.
245-246) and are near acceptable levels for individual measurement.
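The alpha calculation itself can be illustrated with a minimal sketch. The toy scores below are hypothetical and serve only to show the computation; they are not data from this study.

```python
# Minimal sketch of Cronbach's alpha for a set of item scores.
# `items` is a list of columns: items[j][i] is observation i's score on item j.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(items):
    k = len(items)                                  # number of items
    n = len(items[0])                               # number of observations
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(variance(col) for col in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical toy data (not from the study): 3 items, 5 observations.
items = [[2, 3, 3, 1, 2], [2, 3, 2, 1, 2], [3, 3, 2, 1, 1]]
print(round(cronbach_alpha(items), 3))  # → 0.865
```

The formula is the standard one (Cronbach, 1951): alpha rises as the items covary more strongly relative to their individual variances.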
Relationship between the Constructs
Simple Linear Regression analysis was conducted to estimate the relationship between
the constructs measured by the Mathematics Classroom Observation Protocol for Practices
(MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP). Before conducting linear regression, we first checked the regression assumptions: (a) the model is linear, (b) the errors have constant variance (homoscedasticity), (c) the errors are normally distributed, (d) the independent variables are determined without error, and (e) the errors are independent (Mathews, 2005).
Weisberg (2005) suggests that plots of residuals against other quantities are useful for detecting failures of these assumptions. The residual plots for Regression Model 1 (see Figure 5) are included below to aid the discussion; a complete set of residual plots for each model can be found in Appendix D. The first plot, “Residuals versus Fitted,” and the second plot, “Normal Q-Q,” are the most useful in simple regression for determining whether these assumptions are met. In the “Residuals versus Fitted” plot there is no pattern and the red line is fairly flat, which implies that the assumptions of linearity and homoscedasticity are met. In the “Normal Q-Q” plot, the points lie along the diagonal line, indicating normally distributed errors. The last two assumptions are satisfied by the data collection and study design.
A simple linear regression was calculated to predict Regression Model 1: Student En-
gagement based on Inquiry Orientation (See Figure 6 in Appendix D). A significant regres-
Figure 5: Residual Plots of Regression Model 1
sion equation was found (F(1,108)=271.8, p < .001), with an R2 of .716 and adjusted R2 of .713. Roughly 72% of the variation in Student Engagement can be explained by Inquiry
Orientation. The linear regression equation predicted
(Student Engagement) = 6.671 + 1.11(Inquiry Orientation).
Student Engagement increased 1.11 for each one point increase in Inquiry Orientation.
Cohen (1988) defines a strong correlation as a Pearson's Product-Moment Correlation with |r| > .5. Based on the results of the study, Student Engagement is strongly and positively related to Inquiry Orientation with r = .846, p < .001.
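The quantities reported for each regression model (intercept, slope, Pearson's r, and hence R2) can be computed with a short sketch. The scores below are hypothetical, not the study's observation data.

```python
# Minimal sketch of simple linear regression and Pearson's product-moment
# correlation, as computed for Regression Models 1-6.
def simple_regression(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)            # sum of squares of x
    syy = sum((yi - my) ** 2 for yi in y)            # sum of squares of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                                # least-squares slope
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5                     # Pearson's r; r**2 = R2
    return intercept, slope, r

# Hypothetical scores (not from the study):
inquiry = [4, 7, 10, 13, 16]     # Inquiry Orientation totals
engage  = [11, 14, 17, 21, 24]   # Student Engagement totals
b0, b1, r = simple_regression(inquiry, engage)
# The fitted equation is: (Student Engagement) = b0 + b1 * (Inquiry Orientation),
# and r**2 gives the proportion of variance in y explained by x.
```

Residuals for the diagnostic plots discussed above would then be `y[i] - (b0 + b1 * x[i])` for each observation.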
For Regression Model 2: Student Engagement based on Content Propositional Knowl-
edge (See Figure 7 in Appendix D) a simple linear regression was calculated. A significant
regression equation was found (F(1,108)= 44.8, p < .001), with an R2 of .293 and adjusted R2
of .287. Roughly 29% of the variation in Student Engagement can be explained by Content
Propositional Knowledge. The linear regression equation predicted
(Student Engagement) = 3.93 + 0.957(Content Propositional Knowledge).
Student Engagement increased 0.957 for each one point increase in Content Propositional
Knowledge. Based on the results of the study, Student Engagement is strongly and positively
related to Content Propositional Knowledge with r = .541, p < .001.
For Regression Model 3: Teacher Facilitation based on Inquiry Orientation (See Figure
8 in Appendix D) a simple linear regression was calculated. A significant regression equation
was found (F(1,108)= 213.6, p < .001), with an R2 of .664 and adjusted R2 of .661. Roughly
66% of the variation in Teacher Facilitation can be explained by Inquiry Orientation. The
linear regression equation predicted
(Teacher Facilitation) = 8.73 + .926(Inquiry Orientation).
Teacher Facilitation increased .926 for each one point increase in Inquiry Orientation. Based
on the results of the study, Teacher Facilitation is strongly and positively related to Inquiry
Orientation with r = .815, p < .001.
Also a simple linear regression was calculated to predict Regression Model 4: Teacher
Facilitation based on Content Propositional Knowledge (See Figure 9 in Appendix D). A
significant regression equation was found (F(1,108)= 142.4, p < .001), with an R2 of .569 and
adjusted R2 of .565. Roughly 57% of the variation in Teacher Facilitation can be explained
by Content Propositional Knowledge. The linear regression equation predicted
(Teacher Facilitation) = 1.86 + 1.15(Content Propositional Knowledge).
Teacher Facilitation increased 1.15 for each one point increase in Content Propositional Knowledge. Based on the results of the study, Teacher Facilitation is strongly and positively related to Content Propositional Knowledge with r = .754, p < .001.
To predict Regression Model 5: Inquiry Orientation based on Content Propositional
Knowledge (See Figure 10 in Appendix D) a simple linear regression was calculated. A
significant regression equation was found (F(1,108)= 48.8, p < .001), with an R2 of .311 and
adjusted R2 of .305. Roughly 31% of the variation in Inquiry Orientation can be explained
by Content Propositional Knowledge. The linear regression equation predicted
(Inquiry Orientation) = −1.05 + .751(Content Propositional Knowledge).
Inquiry Orientation increased .751 for each one point increase in Content Propositional
Knowledge. Based on the results of the study, Inquiry Orientation is strongly and positively
related to Content Propositional Knowledge with r = .558, p < .001.
A simple linear regression was calculated to predict Regression Model 6: Student Engage-
ment based on Teacher Facilitation (See Figure 11 in Appendix D). A significant regression
equation was found (F(1,108)= 217.3, p < .001), with an R2 of .668 and adjusted R2 of
.665. Roughly 67% of the variation in Student Engagement can be explained by Teacher
Facilitation. The linear regression equation predicted
(Student Engagement) = .462 + .945(Teacher Facilitation).
Student Engagement increased .945 for each one point increase in Teacher Facilitation. Based on the results of the study, Student Engagement is strongly and positively correlated to Teacher Facilitation with r = .817, p < .001. A complete list of all the simple linear
regression results can be found in Appendix ??.
In summary, we found linear regression models for each pair of constructs based on the
Residual Plots meeting the linear regression assumptions (For a summary of the results,
see Table 6). A larger proportion of the variance was explained in Regression Model 1, Regression Model 3, and Regression Model 6. Content Propositional Knowledge was the common construct among the models with lower explained variance. The F-statistics support these findings, with values greater than 200 for Regression Models 1, 3, and 6. In Table 7 we see that the correlations over .80 correspond to Regression Models 1, 3, and 6.
Table 6
Simple Linear Regression Results

Model                 Predictor      Standardized   t value    R2     F-statistic
                                     Regression     (df=108)
                                     Coefficient
Regression Model 1    (Intercept)    6.67 ***       10.36      .716   271.8
                      inquiry        1.11 ***       16.49
Regression Model 2    (Intercept)    3.93 *         2.07       .293   44.8
                      content        .957 ***       6.69
Regression Model 3    (Intercept)    8.73 ***       14.41      .664   213.6
                      inquiry        .926 ***       14.61
Regression Model 4    (Intercept)    1.86           1.45       .569   142.4
                      content        1.15 ***       11.93
Regression Model 5    (Intercept)    -1.05          -0.73      .311   48.8
                      content        .751 ***       6.99
Regression Model 6    (Intercept)    .462           0.42       .668   217.3
                      facilitation   .945 ***       14.74

* p < .05, ** p < .01, *** p < .001
Regression Model 1: Student Engagement and Inquiry Orientation
Regression Model 2: Student Engagement and Content Propositional Knowledge
Regression Model 3: Teacher Facilitation and Inquiry Orientation
Regression Model 4: Teacher Facilitation and Content Propositional Knowledge
Regression Model 5: Inquiry Orientation and Content Propositional Knowledge
Regression Model 6: Student Engagement and Teacher Facilitation
Table 7
Pearson's Product-Moment Correlation

              Inquiry    Content    Engagement   Facilitation
Inquiry       -
Content       .5578761   -
Engagement    .8459433   .5414396   -
Facilitation  .8149469   .7540780   .8173191     -

p < .001 for all correlations.
CHAPTER 5
DISCUSSION
The improvement of Science, Technology, Engineering, and Mathematics (STEM) un-
dergraduate education is on the minds of faculty and staff at colleges and universities around
the United States. Every day we, as educators, are challenged by our departments and uni-
versities to make advances in the classroom, but how do we know if the changes we make
positively impact our students. Peer evaluation, student evaluations, and assessment of your
portfolio are the primary methods of formative and summative assessment instructors have
to evaluate their classroom. Although each of these methods are useful, they can be rid-
dled with subjective information that can skew the window into what is happening in an
undergraduate classroom.
Observation protocols like the Mathematics Classroom Observation Protocol for Prac-
tices (MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP) are
a more objective way for an instructor to analyze their classroom. Before these observa-
tion protocols could be brought into the classroom with confidence, a study needed to be
conducted to examine both the aRTOP and the MCOP2. Although this study needs to be
repeated and extended to further validate the use of observation protocols in the classroom,
the findings have led to some conclusions on the internal structure, internal reliability, and
the relationship between the constructs measured by the observation protocols.
Study Limitations
While the current study provides useful information, there are several limitations that
must be mentioned. The use of convenience sampling is one limitation of this study. This
sampling technique was unavoidable because of time and financial constraints. One major
concern with the use of a convenience sample is the inclusion of outliers that may skew the
data. Our sample was chosen to avoid including classroom observations likely to give us
unusual data. Every effort was made to include colleges and universities from a diverse range of institutions, based on enrollment demographics and types of degrees offered, that reasonably represent the larger population of undergraduate institutions in the United
States.
Positive or negative observer bias is another limitation of this study. Reflexivity was
used by the observer as outlined by Johnson & Christensen (2014). The observer spent
time reflecting about her own biases and predispositions to include a strategy for avoidance.
Although it is not possible to remove the potential for biases completely, the observer made
a conscious effort to avoid their influence.
Another limitation to this study is the effect of sample size on fit indices. The studies
conducted by Hu and Bentler (1998, 1999) show how different fit indices are affected by sample size under true-population and misspecified models. Because this limitation was unavoidable,
we were careful when using the fit indices to decide if a model was supported by the data.
Conclusion
Confirmatory Factor Analysis (CFA) was conducted on the data gathered to analyze
the internal structure of the Mathematics Classroom Observation Protocol for Practices
(MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP) for the
population of undergraduate mathematics classrooms. Factor loadings for the aRTOP were
relatively high for eight of the items. Although two items of the aRTOP did not have
high factor loadings, we included these items in our final model because of the theoretical
support for what should be happening in an undergraduate mathematics classroom based
upon national recommendations. The aRTOP goodness of fit indices produced from the
CFA reveal an almost acceptable fit. The factor loadings for the MCOP2 were relatively
high for most items, and the items that did not have high loadings were included because of theoretical support from undergraduate mathematics education research on the typical classroom. The goodness of fit indices reveal an almost acceptable fit for the MCOP2. Our
findings point to a more consistent internal structure for the MCOP2 than the aRTOP.
Therefore, the Confirmatory Factor Analysis supports the previous Exploratory Factor
Analysis on the MCOP2 (Gleason & Cofer, 2014). We can clearly see that the MCOP2 is a
two factor model with almost all observed variables having high factor loadings. The almost
acceptable fit indices show the measure of Student Engagement and Teacher Facilitation
are consistent with our theoretical understanding of the model. Although we would have
liked higher factor loadings and fit indices, we can still confirm the theoretical model for the
MCOP2.
The CFA for the abbreviated Reformed Teaching Observation Protocol (aRTOP) did not align with the original design of the Reformed Teaching Observation Protocol (RTOP)
(Piburn & Sawada, 2000). We could see from the original design that a two factor model
with a reduced number of items would produce the same results. The factor loadings of the
current study support a two factor model with most observed variables having high factor
loadings. The almost acceptable fit indices show the measure of Inquiry Orientation and
Content Propositional Knowledge are somewhat consistent with our theoretical model of the
aRTOP. Although we would have liked higher factor loadings and fit indices, we can to some
extent confirm the theoretical model of the aRTOP.
To analyze the internal reliability, or the strength of a protocol's consistency, Cronbach's alpha (1951)
was calculated for each subscale of both the Mathematics Classroom Observation Protocol for
Practices (MCOP2) and the abbreviated Reformed Teaching Observation Protocol (aRTOP)
with respect to undergraduate mathematics classrooms. Using Nunnally’s (1978) acceptable
range for Cronbach’s alpha, we were able to assess the alpha for each subscale. When we
examined the aRTOP, we found Inquiry Orientation to have satisfactory internal reliability,
and Content Propositional Knowledge was just outside the satisfactory range. We found
both Student Engagement and Teacher Facilitation to have satisfactory internal reliability
when we inspected the alpha.
Therefore, for each subscale, the satisfactory internal reliability of the Mathematics Classroom Observation Protocol for Practices (MCOP2) demonstrates that the instrument
is measuring something and producing similar scores. When we look at each factor indi-
vidually, the Student Engagement part of the MCOP2 instrument successfully gauges the
role of the student in an undergraduate mathematics classroom and their engagement in the
classroom environment. The high internal reliability of the Teacher Facilitation part of the
MCOP2 indicates the instrument is also successfully measuring the role of the instructor in
creating the structure and guidance in the classroom.
Although the abbreviated Reformed Teaching Observation Protocol (aRTOP) did not
have as high of an internal reliability, we can still see from the moderately satisfactory
alphas that each subscale is measuring something and is somewhat consistent in its scores.
The satisfactory internal reliability of the first factor, Inquiry Orientation, indicates the
instrument is successfully measuring the role of the instructor to act as a resource person and
help foster a community of learners. The second factor, Content Propositional Knowledge,
is also doing a fairly satisfactory job of measuring the lesson's attention to fundamental
concepts and conceptual understanding. The data analysis indicates that the MCOP2 has greater internal reliability than the aRTOP.
Theoretically Inquiry Orientation, Content Propositional Knowledge, Student Engage-
ment, and Teacher Facilitation are related, but distinct, with respect to undergraduate mathematics classrooms. To validate this theory, a Simple Linear Regression analysis was conducted
to estimate the relationship between the constructs measured by the Mathematics Classroom
Observation Protocol for Practices (MCOP2) and the abbreviated Reformed Teaching Obser-
vation Protocol (aRTOP). The relationship between the MCOP2 and the aRTOP was also found
to be significant. The Pearson’s Product-Moment Correlations for each pair of constructs
were found to be strongly correlated.
Therefore, for the constructs with the highest correlations, we can draw some strong conclusions. Mathematically, we found that student engagement is directly related to the idea of an inquiry oriented classroom. Theoretically, only when the students
are engaged would an inquiry oriented classroom be possible. And conversely, an inquiry
oriented classroom means the students are actively engaged in the learning community. Sim-
ilarly, there is a high correlation between teacher facilitation and an inquiry oriented classroom.
Without instructor facilitation, a classroom could not be a community of learners, and
the converse is also true. Since both student engagement and teacher facilitation are highly
correlated with inquiry orientation, it is not hard to see why mathematically we found that
student engagement and teacher facilitation are also strongly correlated. Theoretically, the
facilitation of the teacher leads to an engaged body of students and the converse also follows.
We noticed the subscale, Content Propositional Knowledge, was the common construct
between the regression models that had lower variance explained. This leads us to believe
that Content Propositional Knowledge is measuring something completely different from
the other subscales. Given that the data show the MCOP2 has a better internal structure and internal reliability, we infer that the content subscale of the aRTOP could be added to the MCOP2, after which the aRTOP would no longer be necessary.
Despite its limitations, the current study produced some important findings. The internal structure of the aRTOP and MCOP2 was measured using the factor loadings and goodness of fit indices. Both protocols had relatively high factor loadings for
most items. The goodness of fit indices for both protocols were found to be almost in the
acceptable range. A decision was made not to modify the theoretical model because the
deletion of items from each protocol would lead to a decrease in the information gained
from the undergraduate mathematics classroom. The internal reliability of the aRTOP has
been found to be fairly satisfactory and the internal reliability of the MCOP2 has been
found to be highly satisfactory. We found a positive and strong correlation between each
pair of constructs with a higher correlation between subscales that do not contain Content
Propositional Knowledge. We found that the MCOP2 had a stronger internal structure and
internal reliability than the aRTOP. We also found that the theoretical relationships we had assumed between the constructs were supported by the linear regression analyses we conducted.
Therefore, the support of the structure of the aRTOP allows us to feel somewhat con-
fident with what the protocol is measuring, but we find higher confidence in the support
for the structure of the MCOP2. The internal reliability was also found to be higher for
the MCOP2, pointing to the protocol's consistency: a high or low observation protocol score does not happen by chance. The high correlation between subscales that do not include Content Propositional Knowledge tells us that it is reasonable to infer
that the two observation protocols are measuring the same classrooms the same way except
for the Content Propositional Knowledge subscale of the aRTOP. This leads us to believe
that Content Propositional Knowledge is measuring something completely different from the
other subscales. With confidence in what we are measuring with the MCOP2, consistency
in the MCOP2, and correlation among the subscales, we find support for the Mathematics
Classroom Observation Protocol for Practices (MCOP2) as a useful assessment tool for undergraduate mathematics classrooms with the addition of the Content Propositional Knowledge
subscale of the aRTOP.
Future Direction
Future research should seek to extend the current study to a broader sampling commu-
nity. Although the current sample size was adequate, a larger sample with more colleges
and universities included from a broader geographic region could lead to a deeper under-
standing of Mathematics Classroom Observation Protocol for Practices (MCOP2) and the
abbreviated Reformed Teaching Observation Protocol (aRTOP). Increasing the sample size
will allow the researcher to answer more comparative questions about the populations and
institutions included. For example, it would be interesting to compare how different types of
institutions perform with both observation protocols. With a larger sample size, one could
also compare how different job titles, highest level of education, genders, age, and years
of teaching relate to the constructs. Although we focused on undergraduate mathematics
education in this study, with a larger sample size one could examine how these observation protocols perform at additional education levels, as both protocols are designed to be used for
K-16. The applications of an extension of this study are limitless and would help contribute
to a better understanding of the undergraduate mathematics classroom.
References
Abrami, P. C. (2001). Improving judgments about teaching effectiveness using teacher rating
forms. New Directions for Institutional Research, 2001 (109), 59-87.
Abrami, P. C., & d’Apollonia, S. (1990). The dimensionality of ratings and their use in
personnel decisions. New Directions for Teaching and Learning , 1990 (43), 97-111.
Aleamoni, L. M. (1981). Student ratings of instruction. In J. Millman (Ed.), Handbook of
Teacher Evaluation (p. 110-145). Beverly Hills, CA: Sage.
Algozzine, B., Gretes, J., Flowers, C., Howley, L., Beattie, J., Spooner, F., . . . Bray, M.
(2004). Student evaluation of college teaching: A practice in search of principles.
College Teaching , 52 (4), 134-141.
Allen, J., Gregory, A., Mikami, A., Lun, J., Hamre, B., & Pianta, R. (2013). Observations
of effective teacher-student interactions in secondary school classrooms: Predicting
student achievement with the classroom assessment scoring system-secondary. School
Psychology Review , 42 (1), 76-98.
American Mathematical Association of Two-Year Colleges (AMATYC). (1995). Cross-
roads in mathematics: Standards for introductory college mathematics before calculus.
(D. Cohen, Ed.). Memphis, TN: American Mathematical Association of Two Year
Colleges.
American Mathematical Association of Two-Year Colleges (AMATYC). (2004). Beyond
Crossroads: Implementing mathematics standards in the first two years of college
(R. Blair, Ed.). Memphis, TN: American Mathematical Association of Two Year
Colleges.
Apodaca, P., & Grad, H. (2005). The dimensionality of student ratings of teaching: In-
tegration of uni-and multidimensional models. Studies in Higher Education, 30 (6),
723-748.
Ball, D. L., Thames, M. H., & Phelps, G. (2008). Content knowledge for teaching: What
makes it special? Journal of Teacher Education, 59 (5), 389-407.
Ballantyne, C. (2003). Online evaluations of teaching: An examination of current practice
and considerations for the future. New Directions for Teaching and Learning , 2003 (96),
103-112.
Barker, W., Bressoud, D., Epp, S., Ganter, S., Haver, B., & Pollatsek, H. (2004). Under-
graduate programs and courses in the mathematical sciences: CUPM curriculum guide,
2004. Washington, D.C.: Mathematical Association of America.
Bentler, P. M., & Chou, C.-P. (1987). Practical issues in structural modeling. Sociological
Methods & Research, 16 (1), 78–117.
Benton, S. L., & Cashin, W. E. (2012). Student ratings of teaching: A summary of research
and literature (IDEA Paper No. 50). The Idea Center. Retrieved 12/07/2015, from
http://ideaedu.org/wp-content/uploads/2014/11/idea-paper_50.pdf
Bernstein, D. J., Jonson, J., & Smith, K. (2000). An examination of the implementation of
peer review of teaching. New Directions for Teaching and Learning , 2000 (83), 73-86.
Bland, J. M., & Altman, D. G. (1997). Statistics notes: Cronbach’s alpha. BMJ , 314 (7080),
572.
Boomsma, A. (1982). The robustness of LISREL against small sample sizes in factor analysis
models. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation:
Causality, structure, prediction (Vol. 1, pp. 149–173). North-Holland.
Bowes, A. S., & Banilower, E. R. (2004). LSC classroom observation study: An analysis of
data collected between 1997 and 2003. Chapel Hill, NC: Horizon Research, Inc.
Boyer Commission on Educating Undergraduates in the Research University. (1998). Rein-
venting undergraduate education: A blueprint for America’s research universities.
(Tech. Rep.). Stony Brook, NY: State University of New York at Stony Brook for
the Carnegie Foundation for the Advancement of Learning.
Bullock, C. D. (2003). Online collection of midterm student feedback. New Directions for
Teaching and Learning , 2003 (96), 95-102.
Burdsal, C. A., & Harrison, P. D. (2008). Further evidence supporting the validity of both a
multidimensional profile and an overall evaluation of teaching effectiveness. Assessment
& Evaluation in Higher Education, 33 (5), 567-576.
Burns, C. W. (2000). Teaching portfolios: Another perspective. Academe, 86 (1), 44-47.
Cashin, W. E. (1995). Student ratings of teaching: The research revisited (IDEA Paper
No. 32). The Idea Center. Retrieved 12/07/2015, from http://www.clemson.edu/
oirweb1/CourseEvalHelp/StudentRatingsResearch1995.pdf
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining
faculty effectiveness. San Francisco: Jossey-Bass.
Centra, J. A. (2003). Will teachers receive higher student evaluations by giving higher grades
and less course work? Research in Higher Education, 44 (5), 495-518.
Centra, J. A. (2009). Differences in responses to the student instructional report: Is it bias?
Princeton, NJ: Educational Testing Service.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of
teaching? The Journal of Higher Education, 71 (1), 17-33.
Chen, Y., & Hoshower, L. B. (2003). Student evaluation of teaching effectiveness: An
assessment of student perception and motivation. Assessment & Evaluation in Higher
Education, 28 (1), 71-88.
Cheung, D. (2000). Evidence of a single second-order factor in student ratings of teaching
effectiveness. Structural Equation Modeling , 7 (3), 442-460.
Clayson, D. E. (2009). Student evaluations of teaching: Are they related to what students
learn? A meta-analysis and review of the literature. Journal of Marketing Education,
31 (1), 16-30.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale,
N.J. : L. Erlbaum Associates.
Collins, J. W., & O’Brien, N. P. (2003). The Greenwood dictionary of education. Westport,
Connecticut: Greenwood Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika,
16 (3), 297–334.
d’Apollonia, S., & Abrami, P. C. (1997). Navigating student ratings of instruction. American
Psychologist , 52 (11), 1198-1208.
Davis, B. G. (2009). Tools for teaching (2nd ed.). San Francisco, CA: Jossy-Bass.
Dayton Regional STEM Center. (2011). Reformed Teaching Observation Protocol
(RTOP) with accompanying Dayton Regional STEM Center rubric. Retrieved
12/7/2015, from http://daytonregionalstemcenter.org/wp-content/uploads/
2012/09/rtop_with_rubric_smp-1.pdf
Ding, L., Velicer, W. F., & Harlow, L. L. (1995). Effects of estimation methods, number
of indicators per factor, and improper solutions on structural equation modeling fit
indices. Structural Equation Modeling: A Multidisciplinary Journal , 2 (2), 119–143.
Dommeyer, C. J., Baum, P., Hanna, R. W., & Chapman, K. S. (2004). Gathering faculty
teaching evaluations by in-class and online surveys: Their effects on response rates and
evaluations. Assessment & Evaluation in Higher Education, 29 (5), 611-623.
Eiszler, C. F. (2002). College students’ evaluations of teaching and grade inflation. Research
in Higher Education, 43 (4), 483-501.
Ellis, J. F. (2014). Preparing future college instructors: The role of Graduate Student Teach-
ing Assistants (GTAs) in successful college calculus programs (Unpublished doctoral
dissertation). University of California, San Diego.
Ellis, L., Burke, D. M., Lomire, P., & McCormack, D. R. (2003). Student grades and average
ratings of instructional quality: The need for adjustment. The Journal of Educational
Research, 97 (1), 35-40.
Feldman, K. A. (1977). Consistency and variability among college students in rating their
teachers and courses: A review and analysis. Research in Higher Education, 6 (3),
223-274.
Feldman, K. A. (1978). Course characteristics and college students’ ratings of their teachers:
What we know and what we don’t. Research in Higher Education, 9 (3), 199-242.
Feldman, K. A. (1993). College students’ views of male and female college teachers: Part II
-Evidence from students’ evaluations of their classroom teachers. Research in Higher
Education, 34 (2), 151-211.
Feldman, K. A. (2007). Identifying exemplary teachers and teaching: Evidence from stu-
dent ratings. In R. P. Perry & J. C. Smart (Eds.), The Scholarship of Teaching and
Learning in Higher Education: An Evidence-Based Perspective (p. 93-143). Springer
Netherlands.
Flick, L. B., Sadri, P., Morrell, P. D., Wainwright, C., & Schepige, A. (2009). A cross
discipline study of reformed teaching by university science and mathematics faculty.
School Science and Mathematics , 109 (4), 197-211.
Fox, J. (2005). The R Commander: A basic statistics graphical user interface to R. Journal
of Statistical Software, 14 (9), 1–42. Retrieved from http://www.jstatsoft.org/v14/
i09
Fox, J. (2017). Using the R Commander: A point-and-click interface for R. Boca Raton
FL: Chapman and Hall/CRC Press. Retrieved from http://socserv.mcmaster.ca/
jfox/Books/RCommander/
Fox, J., & Bouchet-Valat, M. (2017). Rcmdr: R Commander [Computer software man-
ual]. Retrieved from http://socserv.socsci.mcmaster.ca/jfox/Misc/Rcmdr/ (R
package version 2.3-2)
Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wen-
deroth, M. P. (2014). Active learning increases student performance in science, engi-
neering, and mathematics. Proceedings of the National Academy of Sciences , 111 (23),
8410–8415.
Gasiewski, J. A., Eagan, M. K., Garcia, G. A., Hurtado, S., & Chang, M. J. (2012).
From gatekeeping to engagement: A multicontextual, mixed method study of student
academic engagement in introductory STEM courses. Research in Higher Education,
53 (2), 229–261.
Gleason, J., & Cofer, L. D. (2014). Mathematics classroom observation protocol for practices
results in undergraduate mathematics classrooms. In T. Fukawa-Connelly, G. Karakok,
K. Keene, & M. Zandieh (Eds.), Proceedings of the 17th Annual Conference on Research
on Undergraduate Mathematics Education, 2014, Denver, CO (p. 93-103).
Gleason, J., Livers, S., & Zelkowski, J. (2015). Mathematics Classroom Observation
Protocol for Practices: Descriptors manual. Retrieved 12/7/2015, from http://
jgleason.people.ua.edu/mcop2.html
Gleason, J., Livers, S., & Zelkowski, J. (2017). Mathematics Classroom Observation Protocol
for Practices (MCOP2): A validation study. Investigations in Mathematics Learning ,
9 .
Grossman, P. L., Wilson, S. M., & Shulman, L. S. (1989). Teachers of substance: Sub-
ject matter knowledge for teaching. In M. Reynolds (Ed.), The Knowledge Base for
Beginning Teachers (p. 23-36). New York: Pergamon.
Hamermesh, D. S., & Parker, A. (2005). Beauty in the classroom: Professors’ pulchritude
and putative pedagogical productivity. Economics of Education Review , 24 (4), 369–
376.
Hartman, B. W., Fuqua, D. R., & Jenkins, S. J. (1986). The problems of and remedies
for nonresponse bias in educational surveys. The Journal of Experimental Education,
54 (2), 85–90.
Hatzipanagos, S., & Lygo-Baker, S. (2006). Teaching observations: Promoting development
through critical reflection. Journal of Further and Higher Education, 30 (4), 421-431.
Hill, H. C., Blunk, M. L., Charalambous, C. Y., Lewis, J. M., Phelps, G. C., Sleep, L., &
Ball, D. L. (2008). Mathematical knowledge for teaching and the mathematical quality
of instruction: An exploratory study. Cognition and Instruction, 26 (4), 430-511.
Hora, M. T. (2013). Exploring the use of the Teaching Dimensions Observation Protocol to
develop fine-grained measures of interactive teaching in undergraduate science class-
rooms (WCER Working Paper No. 2013-6). Retrieved 12/10/2015, from http://www.wcer.wisc.edu/publications/workingpapers/Working_Paper_No_2013_06.pdf
Hora, M. T., & Ferrare, J. J. (2013a). Instructional systems of practice: A multidimensional
analysis of math and science undergraduate course planning and classroom teaching.
Journal of the Learning Sciences , 22 (2), 212–257.
Hora, M. T., & Ferrare, J. J. (2013b). A review of classroom observation techniques
in postsecondary settings (WCER Working Paper No. 2013-01). Wisconsin Center
for Education Research. Retrieved 12/7/2015, from http://www.wcer.wisc.edu/publications/workingpapers/Working_Paper_No_2013_01.pdf
Hora, M. T., Oleson, A., & Ferrare, J. J. (2013). Teaching Dimensions Observation Pro-
tocol (TDOP) user’s manual. Madison, WI. Retrieved 12/7/2015, from http://
tdop.wceruw.org/
Hoyt, D. P., & Lee, E.-J. (2002). Basic data for the revised IDEA system (IDEA Tech-
nical Report No. 12). Retrieved 12/7/2015, from http://ideaedu.org/wp-content/
uploads/2014/11/techreport-12.pdf
Hu, L.-t., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity
to underparameterized model misspecification. Psychological Methods , 3 (4), 424.
Hu, L.-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure
analysis: Conventional criteria versus new alternatives. Structural Equation Modeling:
A Multidisciplinary Journal , 6 (1), 1–55.
Jackson, D. L., Teal, C. R., Raines, S. J., Nansel, T. R., Force, R. C., & Burdsal, C. A.
(1999). The dimensions of students’ perceptions of teaching effectiveness. Educational
and Psychological Measurement , 59 (4), 580-596.
Johnson, B., & Christensen, L. (2014). Educational research: Quantitative, qualitative, and
mixed approaches (5th ed.). Thousand Oaks, CA: Sage.
Keig, L., & Waggoner, M. D. (1994). Collaborative peer review: The role of faculty in
improving college teaching. Washington, D.C.: The George Washington University
School of Education and Human Development. (ASHE-ERIC Higher Education Report
No. 2.)
Kim, M. (2011). Differences in beliefs and teaching practices between international and US
domestic mathematics teaching assistants (Unpublished doctoral dissertation). The
University of Oklahoma.
Kline, R. (2011). Principles and practice of structural equation modeling (3rd ed.). New
York: Guilford Press.
Kohut, G. F., Burnap, C., & Yon, M. G. (2007). Peer observation of teaching: Perceptions
of the observer and the observed. College Teaching , 55 (1), 19-25.
Krautmann, A. C., & Sander, W. (1999). Grades and student evaluations of teachers.
Economics of Education Review , 18 (1), 59-63.
Kulik, J. A. (2001). Student ratings: Validity, utility, and controversy. New Directions for
Institutional Research, 2001 (109), 9-25.
Kung, D., & Speer, N. (2007). Mathematics teaching assistants learning to teach: Recast-
ing early teaching experiences as rich learning opportunities. In M. Oehrtman (Ed.),
Proceedings of the 10th annual Conference on Research in Undergraduate Mathematics
Education.
Laverie, D. A. (2002). Improving teaching through improving evaluation: A guide to course
portfolios. Journal of Marketing Education, 24 (2), 104-113.
Leung, D. Y., & Kember, D. (2005). Comparability of data gathered from evaluation
questionnaires on paper and through the internet. Research in Higher Education,
46 (5), 571-591.
Marsh, H. W. (1984). Students’ evaluations of university teaching: Dimensionality, relia-
bility, validity, potential biases, and utility. Journal of Educational Psychology , 76 (5),
707-754.
Marsh, H. W. (2001). Distinguishing between good (useful) and bad workloads on students’
evaluations of teaching. American Educational Research Journal , 38 (1), 183-212.
Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The
number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral
Research, 33 (2), 181–220.
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment
on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers
in overgeneralizing Hu and Bentler’s (1999) findings. Structural Equation Modeling ,
11 (3), 320–341.
Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on
students’ evaluations of teaching: Popular myth, bias, validity, or innocent bystanders?
Journal of Educational Psychology , 92 (1), 202.
Marsh, H. W., & Ware, J. E. (1982). Effects of expressiveness, content coverage, and
incentive on multidimensional student rating scales: New interpretations of the Dr.
Fox effect. Journal of Educational Psychology , 74 (1), 126.
Marshall, J. C., Smart, J., & Horton, R. M. (2010). The design and validation of EQUIP:
An instrument to assess inquiry-based instruction. International Journal of Science
and Mathematics Education, 8 (2), 299-321.
Mathews, P. G. (2005). Design of experiments with MINITAB. Milwaukee, WI: ASQ Quality
Press.
McKeachie, W. J. (1979). Student ratings of faculty: A reprise. Academe, 65 (6), 384-397.
Mehdizadeh, M. (1990). Loglinear models and student course evaluations. The Journal of
Economic Education, 21 (1), 7-21.
Merritt, D. J. (2008). Bias, the brain, and student evaluations of teaching. St. John’s Law
Review , 82 (1), 235–287.
Michael, J. (2006). Where’s the evidence that active learning works? Advances in Physiology
Education, 30 (4), 159–167.
Morrell, P. D., Wainwright, C., & Flick, L. (2004). Reform teaching strategies used by
student teachers. School Science and Mathematics , 104 (5), 199-213.
National Council of Teachers of Mathematics. (2000). Principles and standards for school
mathematics. Reston, VA: National Council of Teachers of Mathematics.
National Governors Association Center for Best Practices, Council of Chief State School
Officers. (2010). Common Core State Standards Mathematics. Washington D.C.: Na-
tional Governors Association Center for Best Practices, Council of Chief State School
Officers. Retrieved 12/7/2015, from http://www.corestandards.org/Math
National Research Council. (1996). From analysis to action: Undergraduate education
in science, mathematics, engineering, and technology. Washington, D.C.: National
Academies Press.
National Research Council. (1999). Transforming undergraduate education in science, mathe-
matics, engineering, and technology. Washington, D.C.: The National Academy Press.
National Research Council. (2002). Evaluating and improving undergraduate teaching in
science, technology, engineering, and mathematics (M. A. Fox & N. Hackerman, Eds.).
Washington, D.C.: National Academies Press.
National Research Council. (2012). Discipline-based education research: Understand-
ing and improving learning in undergraduate science and engineering (S. R. Singer,
N. R. Nielsen, H. A. Schweingruber, et al., Eds.). Washington, D.C.: National
Academies Press.
National Science Foundation. (1996). Shaping the future: New expectations for undergrad-
uate education in science, mathematics, engineering, and technology. Arlington, VA:
Author. (NSF 96-139)
National Science Foundation. (1998). Information technology: Its impact on undergradu-
ate education in science, mathematics, engineering, and technology. Arlington, VA:
Author. (NSF 98-82)
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement
of classroom processes: Standardized observation can leverage capacity. Educational
Researcher , 38 (2), 109–119.
Piburn, M., & Sawada, D. (2000). Reformed Teaching Observation Protocol (RTOP): Ref-
erence manual. Tempe, Arizona. Retrieved 12/7/2015, from http://files.eric.ed
.gov/fulltext/ED447205.pdf
President’s Council of Advisors on Science and Technology. (2012). Engage to excel: Pro-
ducing one million additional college graduates with degrees in science, technology,
engineering, and mathematics. Report to the President. Washington, D.C.: Executive
Office of the President.
R Core Team. (2016). R: A language and environment for statistical computing [Computer
software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Remmers, H. H. (1928). The relationship between students’ marks and student attitude toward instructors. School & Society, 28, 759-760.
Remmers, H. H. (1930). To what extent do grades influence student ratings of instructors? The Journal of Educational Research, 21, 314-316.
Remmers, H. H., & Brandenburg, G. C. (1927). Experimental data on the Purdue rating scale for instructors. Educational Administration and Supervision, 13, 519-527.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of
Statistical Software, 48 (2), 1–36. Retrieved from http://www.jstatsoft.org/v48/
i02/
Sawada, D., Piburn, M. D., Judson, E., Turley, J., Falconer, K., Benford, R., & Bloom,
I. (2002). Measuring reform practices in science and mathematics classrooms: The
Reformed Teaching Observation Protocol. School Science and Mathematics , 102 (6),
245-253.
Schermelleh-Engel, K., Moosbrugger, H., & Muller, H. (2003). Evaluating the fit of struc-
tural equation models: Tests of significance and descriptive goodness-of-fit measures.
Methods of Psychological Research Online, 8 (2), 23–74.
Schumacker, R., & Lomax, R. (2016). A beginner’s guide to structural equation modeling
(4th ed.). Taylor & Francis.
Seldin, P. (2000). Teaching portfolios: A positive appraisal. Academe, 86 (1).
Seldin, P., & Miller, J. E. (2009). The academic portfolio: A practical guide to documenting
teaching, research, and service (Vol. 132). John Wiley & Sons.
Seymour, E. (2002). Tracking the processes of change in US undergraduate education in
science, mathematics, engineering, and technology. Science Education, 86 (1), 79-105.
Shevlin, M., Banyard, P., Davies, M., & Griffiths, M. (2000). The validity of student
evaluation of teaching in higher education: Love me, love my lectures? Assessment &
Evaluation in Higher Education, 25 (4), 397-405.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher , 15 (2), 4-14.
Smith, M. K., Jones, F. H., Gilbert, S. L., & Wieman, C. E. (2013). The classroom obser-
vation protocol for undergraduate STEM (COPUS): A new instrument to characterize
university STEM classroom practices. CBE-Life Sciences Education, 12 (4), 618-627.
Socha, A. (2013). A hierarchical approach to students’ assessments of instruction. Assess-
ment & Evaluation in Higher Education, 38 (1), 94-113.
Sojka, J., Gupta, A. K., & Deeter-Schmelz, D. R. (2002). Student and faculty perceptions
of student evaluations of teaching: A study of similarities and differences. College
Teaching , 50 (2), 44-49.
Speer, N., & Hald, O. (2008). How do mathematicians learn to teach? Implications from
research on teachers and teaching for graduate student professional development. In
M. P. Carlson & C. Rasmussen (Eds.), Making the connection: Research and practice in
undergraduate mathematics education (p. 305-318). Washington, D.C.: Mathematical
Association of America.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation
of teaching: The state of the art. Review of Educational Research, 83 (4), 598-642.
Retrieved from http://dx.doi.org/10.3102/0034654313496870
Streiner, D. L. (2003). Starting at the beginning: An introduction to coefficient alpha and
internal consistency. Journal of Personality Assessment , 80 (1), 99–103.
Thomas, S., Chie, Q. T., Abraham, M., Raj, S. J., & Beh, L.-S. (2014). A qualitative
review of literature on peer review of teaching in higher education: An application of
the SWOT framework. Review of Educational Research, 84 (1), 112-159.
Tucker, B., Jones, S., Straker, L., & Cole, J. (2003). Course evaluation on the web: Facili-
tating student and teacher reflection to improve learning. New Directions for Teaching
and Learning , 2003 (96), 81-93.
Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review.
Assessment & Evaluation in Higher Education, 23 (2), 191-212.
Walkington, C., Arora, P., Ihorn, S., Gordon, J., Walker, M., Abraham, L., & Marder,
M. (2012). Development of the UTeach observation protocol: A classroom observation
instrument to evaluate mathematics and science teachers from the UTeach preparation
program (Tech. Rep.). Retrieved 12/7/2015, from http://uteach.utexas.edu
Ware Jr, J. E., & Williams, R. G. (1975). The Dr. Fox effect: A study of lecturer effectiveness
and ratings of instruction. Academic Medicine, 50 (2), 149-56.
Weisberg, S. (2005). Applied linear regression (3rd ed.). John Wiley & Sons.
Weston, R., & Gore, P. A. (2006). A brief guide to structural equation modeling. The
Counseling Psychologist , 34 (5), 719-751.
Wieman, C., & Gilbert, S. (2014). The teaching practices inventory: A new tool for charac-
terizing college and university teaching in mathematics and science. CBE-Life Sciences
Education, 13 (3), 552-569.
Wolf, E. J., Harrington, K. M., Clark, S. L., & Miller, M. W. (2013). Sample size requirements
for structural equation models: An evaluation of power, bias, and solution propriety.
Educational and Psychological Measurement , 73 (6), 913–934.
APPENDIX A
OVERVIEW OF OBSERVATION PROTOCOLS
Mathematics Classroom Observation Protocol for Practices (MCOP2)
Subject: Mathematics
Sample Size: 127 classroom observations
Validated Grades: K-16
Brief Description: MCOP2 contains 16 items intended to measure two primary constructs: student engagement and teacher facilitation. Each item contains a full description of the item with specific requirements for each rating level.
Documented Drawbacks: Does not produce a fine-grained analysis. The MCOP2 was not designed to evaluate a teacher on a single observation due to the nature and complexity of teaching.
(Gleason & Cofer, 2014; Gleason et al., 2017)
Reformed Teaching Observation Protocol (RTOP)
Subject: Mathematics and Science
Sample Size: 87 observations of 141 classrooms
Validated Grades: Secondary and Postsecondary (2-year and 4-year)
Brief Description: RTOP is a 25-item classroom observation protocol that is standards based, inquiry oriented, and student centered. It requires a trained observer to rate items on a Likert scale.
Documented Drawbacks: “Though a Likert scale may be helpful to a researcher in quantifying an observation, it is difficult for teachers to know what they need to do to improve from a 4 to 5.” (Marshall, Smart, & Horton, 2010) “Exploratory factor analysis showed that some but not all of the individual items within a given construct loaded together.” (Piburn & Sawada, 2000; Sawada et al., 2002) “RTOP places little emphasis on the accuracy and depth of the content being conveyed during a lesson.” (Walkington et al., 2012) “The observers must complete a multiday training program to achieve acceptable interrater reliability.” (Smith et al., 2013)
(Piburn & Sawada, 2000; Sawada et al., 2002)
Oregon Teacher Observation Protocol (OTOP)
Subject: Mathematics and Science
Sample Size: 123 observations of 41 classes and 50 classroom observations
Validated Grades: Postsecondary (public and private) and Secondary
Brief Description: OTOP is a 10-item protocol designed to generate a profile of what is happening across instructional settings rather than assigning a score to a particular lesson. Items are treated as nominal data.
Documented Drawbacks: “Despite its supposed reliability in Faculty Fellows mathematics classes, the OTOP’s scientific nature and lack of recent mathematical standards make it undesirable for use in college mathematics courses.” (Gleason & Cofer, 2014)
(Flick, Sadri, Morrell, Wainwright, & Schepige, 2009; Morrell, Wainwright, & Flick, 2004)
UTeach Observation Protocol (UTOP)
Subject: Mathematics and Science
Sample Size: 83 observations of 36 teachers
Validated Grades: Secondary
Brief Description: “The UTOP includes 32 classroom observation indicators organized into four sections: Classroom Environment, Lesson Structure, Implementation, and Math/Science Content. The indicators are rated by observers on a 7-point scale: 1 to 5 Likert with Don’t Know (DK) and Not Applicable (NA) options (for some items).” (Walkington et al., 2012)
Documented Drawbacks: “Besides the science-specific language, another drawback to the UTOP is it is solely based off of NCTM standards from 1991.” (Gleason & Cofer, 2014)
(Walkington et al., 2012)
Classroom Observation Protocol (COP)
Subject: Mathematics and Science
Sample Size: 1,610 lesson observations
Validated Grades: K-12
Brief Description: The COP contains several sections where observers describe and classify the major activities, materials, and purposes of a math or science lesson, and then it provides four sections where observers rate various aspects of classroom instruction using a Likert (1-5) scale.
Documented Drawbacks: “Due to the large number of evaluators, inter-rater reliability was an issue for classroom observation data. Also, this study was cross-sectional in nature so there are limitations in the design of this study.” (Bowes & Banilower, 2004)
(Bowes & Banilower, 2004)
Classroom Observation Protocol for Undergraduate STEM (COPUS)
Subject: Mathematics and Science (listed as STEM, but no mention of engineering or technology classroom testing)
Sample Size: 30 classroom observations
Validated Grades: Postsecondary
Brief Description: “COPUS documents classroom behaviors in 2-min intervals throughout the duration of the class session. It does not require observers to make judgments of teaching quality, and it produces clear graphical results. COPUS is limited to 25 codes in two categories (“What the students are doing” and “What the instructor is doing”) and can be reliably used by university faculty with only 1.5 hours of training.” (Smith et al., 2013)
Documented Drawbacks: “COPUS observations provided a measurement for only a single class period. From multiple COPUS observations of a single course, we know that it is not unusual to have substantial variations from one class to another.” (Wieman & Gilbert, 2014)
(Smith et al., 2013)
Teaching Dimensions Observation Protocol (TDOP)
Subject: Mathematics and Science
Sample Size: Inter-rater reliability results from TDOP training in the spring of 2012 do not include a sample size.
Validated Grades: Postsecondary (nonlaboratory courses)
Brief Description: Six dimensions of practice comprise the TDOP: Teaching methods, Pedagogical strategies, Cognitive demand, Student-teacher interactions, Student engagement, and Instructional technology. Observers document the classroom behaviors with 46 codes in 2-min intervals throughout the class session.
Documented Drawbacks: “Requires substantial training, as one might expect for a protocol that was designed to be a complex research instrument.” (Smith et al., 2013) “TDOP does not aim to measure latent variables such as instructional quality, and it is not tied to external criterion such as reform-based teaching standards.” (Hora, Oleson, & Ferrare, 2013)
(Hora et al., 2013)
Classroom Assessment Scoring System - Secondary (CLASS-S)
Subject: General
Sample Size: 1,482 lesson observations (video)
Validated Grades: 6-11
Brief Description: CLASS is a tool for observing and assessing the effectiveness of interactions among teachers and students in classrooms. It measures the emotional, organizational, and instructional supports provided by teachers that contribute to children’s social, developmental, and academic achievement.
Documented Drawbacks: “Does not take into account teaching behaviors specific to the disciplines of mathematics and science, such as placing content in the “big picture” of the domain, supporting sense-making about concepts through real world connections, and appropriately and powerfully making use of tools of abstraction.” (Walkington et al., 2012)
(Allen et al., 2013)
Mathematical Quality of Instruction (MQI)
Subject: Mathematics
Sample Size: 10 teacher observations
Validated Grades: 2-6
Brief Description: MQI is designed to provide scores for teachers on important dimensions of classroom mathematics instruction. These dimensions include the richness of the mathematics, student participation in mathematical reasoning and meaning-making, and the clarity and correctness of the mathematics covered in class.
Documented Drawbacks: “Although there is a significant, strong, and positive association between levels of MKT (mathematical knowledge for teaching) and the mathematical quality of instruction, we also find that there are a number of important factors that mediate this relationship, either supporting or hindering teachers’ use of knowledge in practice.” (Hill et al., 2008)
(Hill et al., 2008)
APPENDIX B
DEMOGRAPHIC CHARACTERISTICS OF THE SAMPLE
A total of 110 observations of 72 instructors were conducted. Only 86 of the 110 observations have instructor demographic data, because 15 instructors did not complete the demographics survey. (Gender was recorded by the observer on the background information form, so it is reported for all 110 observations in Table 8.)
Table 8
Demographic Characteristics of the Sample

                                                  Frequency    %
Gender
  Male                                                60      55
  Female                                              50      45
Age Range
  18-24 years old                                      2       2
  25-34 years old                                     41      48
  35-44 years old                                     15      17
  45-54 years old                                     10      12
  55-64 years old                                     14      16
  65 years and over                                    4       5
Race/Ethnicity
  American Indian or Alaskan Native                    0       0
  Asian / Pacific Islander                            11      13
  Black or African American                            4       5
  Hispanic American                                    2       2
  White / Caucasian                                   69      80
  Multiple ethnicity / Other (please specify)          0       0
Level of Education
  Bachelor's degree                                    2       2
  Master's degree                                     27      31
  PhD                                                 54      63
  Other advanced degree beyond a Master's degree       3       3
Job Title
  Graduate Teaching Assistant                         13      15
  Adjunct/Instructor                                  24      28
  Assistant Professor                                 24      28
  Associate Professor                                  8       9
  Full Professor                                      17      20
Number of Years Teaching at High School Level
  Less than one year                                  65      76
  1-5 years                                           10      12
  6-10 years                                           5       6
  11-15 years                                          1       1
  Over 15 years                                        5       6
Number of Years Teaching at College Level
  Less than one year                                   1       1
  1-5 years                                           23      27
  6-10 years                                          27      31
  11-15 years                                         11      13
  Over 15 years                                       24      28
APPENDIX C
INSTRUMENTS USED
Background Information
1. Institution:
2. Description of course (Calculus I, College Algebra, Analysis, etc.):
3. Gender of instructor:
4. Date of observation:
5. Time of observation:
Abbreviated Reformed Teaching Observation Protocol¹
Inquiry Orientation
1. The lesson was designed to engage students as members of a learning community.
Score 4: Lesson is designed to include both extensive teacher-student and student-student interactions.
Score 3: Lesson is designed for continual interaction between teacher and students.
Score 2: Classroom interactions are only teacher-student or student-student.
Score 1: Lesson has limited opportunities to engage students (e.g., rhetorical questions or shout-out opportunities).
Score 0: This lesson is completely teacher-centered, lecture only.
2. Intellectual rigor, constructive criticism, and the challenging of ideas were valued.
Score 4: Students debate ideas through a negotiation of meaning that results in strong use of evidence/arguments to support claims.
Score 3: Students engaged in a teacher-guided but student-driven discussion (“debate”) involving one or more of the following: a variety of ideas, alternative interpretations, or alternative lines of reasoning.
Score 2: Students participate in a teacher-directed whole-class discussion (debate) involving one or more of the following: a variety of ideas, alternative interpretations, or alternative lines of reasoning.
Score 1: At least once the students respond (perhaps by “shout out”) to the teacher’s queries regarding alternate ideas, alternative reasoning, or alternative interpretations.
Score 0: Students were not asked to demonstrate rigor, offer criticisms, or challenge ideas.
3. This lesson encouraged students to seek and value alternative modes of investigation or of problem solving.

Score 4: Lesson was designed for students to engage in alternative modes, and a clear discussion of these alternatives occurs.
Score 3: Lesson was designed for students to engage in alternative modes of investigation, but without subsequent discussion.
Score 2: Lesson was designed for students to ask divergent questions, but not investigate.
Score 1: Lesson was designed for the instructor to ask divergent questions (teacher directed).
Score 0: No alternative modes were explored during the lesson.
¹Adapted from (Dayton Regional STEM Center, 2011; Walkington et al., 2012)
4. Students made predictions, estimations, and/or hypotheses and devised means for testing them.

Score 4: The students explicitly make, write down or depict, and explain their prediction, estimation, and/or hypothesis. Students devise a means for testing their prediction, estimation, and/or hypothesis.
Score 3: Students discuss predictions. Means for testing is highly suggested.
Score 2: Teacher may ask students to predict and wait for input (class as a whole or as pairs, etc.). No means for testing.
Score 1: Teacher may ask the class to predict as a whole, but doesn’t wait for a response. No means for testing.
Score 0: No opportunities for any predictions (students explaining what happened does not mean predicting).
5. The teacher acted as a resource person, working to support and enhance student investigations.

Score 4: Students are actively engaged in the learning process; students determine what and how, and the teacher is available to help. The teacher uses student investigation or questions to direct the inquiry process.
Score 3: Students have freedom, but within the confines of teacher-directed boundaries. Student led. Teacher answers questions instead of directing inquiry.
Score 2: Primarily directed by the teacher, with occasional opportunities for students to guide the direction.
Score 1: Very teacher directed, limited student investigation, very rote.
Score 0: No investigations (activity that engages students to apply content through problem solving). Lecture based.
Content Propositional Knowledge
6. The lesson involved fundamental concepts of the subject.
Score 4: The content covered and/or tasks, examples, or activities chosen by the teacher were clearly and explicitly related to significant concepts to gain a deeper understanding and make worthwhile connections to the mathematical or scientific ideas.
Score 3: The content covered and/or tasks, examples, or activities chosen by the teacher were clearly related to the significant content of the course, and the tasks, examples, or activities that were used allowed for development of worthwhile connections to the mathematical or scientific ideas.
Score 2: The content covered was significant and relevant to the content of the course, but the presentation, tasks, examples, or activities chosen were prescriptive, superficial, or contrived and did not allow the students to make meaningful connections to mathematical or scientific ideas.
Score 1: The content covered and/or tasks, examples, or activities chosen by the teacher were distantly or only sometimes related to the content of the course. This item should also be rated a 1 if the content chosen was developmentally inappropriate: either too low-level or too advanced for the students.
Score 0: The content covered and/or tasks, examples, or activities chosen by the teacher were unrelated to the content of the course.
7. The lesson promoted strongly coherent conceptual understanding.
Score 4: Lesson is presented in a clear and logical manner; the relation of content to concepts is clear throughout, and it flows from beginning to end.
Score 3: Lesson is predominantly presented in a clear and logical fashion, but the relation of content to concepts is not always obvious.
Score 2: Lesson may be clear and/or logical, but the relation of content to concepts is very inconsistent (or vice versa).
Score 1: Lesson is disjointed and not consistently focused on the concepts.
Score 0: Not presented in any logical manner; lacks clarity and no connections between material.
8. The teacher had a solid grasp of the subject matter content inherent in the lesson.
Score 4: The teacher clearly understood the content and how to successfully communicate the content to the class. The teacher was able to present interesting and relevant examples, explain concepts in multiple ways, facilitate discussions, connect it to the big ideas of the discipline, use advanced questioning strategies to guide student learning, and identify and use common misconceptions or alternative ideas as learning tools.
Score 3: The teacher clearly understood the content and how to successfully communicate the content to the class. The teacher used multiple examples and strategies to engage students with the content.
Score 2: There were no issues with the teacher’s understanding of the content and its accuracy, but the teacher was not always fluid or did not try to present the content in multiple ways. When students appeared confused, the teacher was unable to re-teach the content in a completely clear, understandable, and/or transparent way such that most students understood.
Score 1: There were several smaller issues with the teacher’s understanding and/or communication of the content that sometimes had a negative impact on student learning.
Score 0: There was a significant issue with the teacher’s understanding and/or communication of the content that negatively impacted student learning during the class.
9. Elements of abstraction (i.e., symbolic representations, theory building) were encouraged when it was important to do so.

Score 4: Abstraction is being used for a relevant and useful purpose. A variety of representations were used to build the lesson and to support/develop the content. The abstractions are presented in a way such that they are understandable and accessible to the class.
Score 3: Teacher uses a variety of abstractions throughout the lesson and occasionally explains them in a manner that supports/develops the content. Perhaps there was a small missed opportunity with respect to facilitating students’ understanding of abstraction.
Score 2: The teacher’s use of abstraction was adequate. Teacher uses a variety of abstractions throughout the lesson but does not explain them in a manner that supports/develops the content.
Score 1: The teacher neglects important explanation and discussion of abstraction that is being used during the class, and this missed opportunity has a negative impact on student learning.
Score 0: There was a major issue with the teacher’s use of abstraction, or no abstraction was presented. This had a negative impact on student learning during the class.
10. Connections with other content disciplines and/or real-world phenomena were explored and valued.
Score Description
4 Throughout the class, the content was taught in the context of its use in other disciplines, other areas of mathematics/science, or in the real world, and the teacher clearly had deep knowledge about how the content is used in those areas.
3 The teacher included one or more connections between the content and another discipline/the real world, and the teacher engaged the students in an extended discussion or activity relating to these connections.
2 The teacher connected the content being learned to another discipline/the real world, and the teacher explicitly brought this connection to the students' attention.
1 A minor connection was made to another area of mathematics/science, to another discipline, or to real-world contexts, but it was generally abstract or not helpful for content comprehension. (For example, word problems that can be solved without the context of the problem.)
0 No connections were made to other areas of mathematics/science or to other disciplines, or connections were made that were inappropriate or incorrect.
Mathematics Classroom Observation Protocol for Practices Descriptors 2
1. Students engaged in exploration/investigation/problem solving.
Score Description
3 Students regularly engaged in exploration, investigation, or problem solving. Over the course of the lesson, the majority of the students engaged in exploration/investigation/problem solving.
2 Students sometimes engaged in exploration, investigation, or problem solving. Several students engaged in problem solving, but not the majority of the class.
1 Students seldom engaged in exploration, investigation, or problem solving. This tended to be limited to one or a few students engaged in problem solving while other students watched but did not actively participate.
0 Students did not engage in exploration, investigation, or problem solving. There were either no instances of investigation or problem solving, or the instances were carried out by the teacher without active participation by any students.
2. Students used a variety of means (models, drawings, graphs, concrete materials, manipulatives, etc.) to represent concepts.
Score Description
3 The students manipulated or generated two or more representations to represent the same concept, and the connections across the various representations, relationships of the representations to the underlying concept, and applicability or the efficiency of the representations were explicitly discussed by the teacher or students, as appropriate.
2 The students manipulated or generated two or more representations to represent the same concept, but the connections across the various representations, relationships of the representations to the underlying concept, and applicability or the efficiency of the representations were not explicitly discussed by the teacher or students.
1 The students manipulated or generated one representation of a concept.
0 There were either no representations included in the lesson, or representations were included but were exclusively manipulated and used by the teacher. If the students only watched the teacher manipulate the representation and did not interact with a representation themselves, it should be scored a 0.
2Reprinted by permission from (Gleason, Livers, & Zelkowski, 2015)
3. Students were engaged in mathematical activities.
Score Description
3 Most of the students spend two-thirds or more of the lesson engaged in mathematical activity at the appropriate level for the class. It does not matter if it is one prolonged activity or several shorter activities. (Note that listening and taking notes does not qualify as a mathematical activity unless the students are filling in the notes and interacting with the lesson mathematically.)
2 Most of the students spend more than one-quarter but less than two-thirds of the lesson engaged in appropriate-level mathematical activity. It does not matter if it is one prolonged activity or several shorter activities.
1 Most of the students spend less than one-quarter of the lesson engaged in appropriate-level mathematical activity. There is at least one instance of students' mathematical engagement.
0 Most of the students are not engaged in appropriate-level mathematical activity. This could be because they are never asked to engage in any activity and spend the lesson listening to the teacher and/or copying notes, or it could be because the activity they are engaged in is not mathematical, such as a coloring activity.
4. Students critically assessed mathematical strategies.
Score Description
3 More than half of the students critically assessed mathematical strategies. This could have happened in a variety of scenarios, including in the context of partner work, small group work, or a student making a comment during direct instruction or individually to the teacher.
2 At least two but less than half of the students critically assessed mathematical strategies. This could have happened in a variety of scenarios, including in the context of partner work, small group work, or a student making a comment during direct instruction or individually to the teacher.
1 An individual student critically assessed mathematical strategies. This could have happened in a variety of scenarios, including in the context of partner work, small group work, or a student making a comment during direct instruction or individually to the teacher. The critical assessment was limited to one student.
0 Students did not critically assess mathematical strategies. This could happen for one of three reasons: 1) No strategies were used during the lesson; 2) Strategies were used but were not discussed critically. For example, the strategy may have been discussed in terms of how it was used on the specific problem, but its use was not discussed more generally; 3) Strategies were discussed critically by the teacher, but this amounted to the teacher telling the students about the strategy(ies), and students did not actively participate.
5. Students persevered in problem solving.
Score Description
3 Students exhibited strong perseverance in problem solving. The majority of students looked for entry points and solution paths, monitored and evaluated progress, and changed course if necessary. When confronted with an obstacle (such as how to begin or what to do next), the majority of students continued to use resources (physical tools as well as mental reasoning) to continue to work on the problem.
2 Students exhibited some perseverance in problem solving. Half of the students looked for entry points and solution paths, monitored and evaluated progress, and changed course if necessary. When confronted with an obstacle (such as how to begin or what to do next), half of the students continued to use resources (physical tools as well as mental reasoning) to continue to work on the problem.
1 Students exhibited minimal perseverance in problem solving. At least one student but less than half of the students looked for entry points and solution paths, monitored and evaluated progress, and changed course if necessary. When confronted with an obstacle (such as how to begin or what to do next), at least one student but less than half of the students continued to use resources (physical tools as well as mental reasoning) to continue to work on the problem. There must be a roadblock to score above a 0.
0 Students did not persevere in problem solving. This could be because there was no student problem solving in the lesson, or because when presented with a problem-solving situation no students persevered. That is to say, all students either could not figure out how to get started on a problem, or when they confronted an obstacle in their strategy they stopped working.
6. The lesson involved fundamental concepts of the subject to promote relational/conceptual understanding.
Score Description
3 The lesson includes fundamental concepts or critical areas of the course, as described by the appropriate standards, and the teacher/lesson uses these concepts to build relational/conceptual understanding of the students with a focus on the “why” behind any procedures included.
2 The lesson includes fundamental concepts or critical areas of the course, as described by the appropriate standards, but the teacher/lesson misses several opportunities to use these concepts to build relational/conceptual understanding of the students with a focus on the “why” behind any procedures included.
1 The lesson mentions some fundamental concepts of mathematics, but does not use these concepts to develop the relational/conceptual understanding of the students. For example, in a lesson on the slope of a line, the teacher mentions that it is related to ratios, but does not help the students to understand how it is related and how that can help them to better understand the concept of slope.
0 The lesson consists of several mathematical problems with no guidance to make connections with any of the fundamental mathematical concepts. This usually occurs with a teacher focusing on the procedure for solving certain types of problems without the students understanding the “why” behind the procedures.
7. The lesson promoted modeling with mathematics.
Score Description
3 Modeling (using a mathematical model to describe a real-world situation) is an integral component of the lesson, with students engaged in the modeling cycle (as described in the Common Core State Standards).
2 Modeling is a major component, but the modeling has been turned into a procedure (i.e., a group of word problems that all follow the same form, and the teacher has guided the students to find the key pieces of information and how to plug them into a procedure); or modeling is not a major component, but the students engage in a modeling activity that fits within the corresponding standard of mathematical practice.
1 The teacher describes some type of mathematical model to describe real-world situations, but the students do not engage in activities related to using mathematical models.
0 The lesson does not include any modeling with mathematics.
8. The lesson provided opportunities to examine mathematical structure. (Symbolic notation, patterns, generalizations, conjectures, etc.)
Score Description
3 The students have a sufficient amount of time and opportunity to look for and make use of mathematical structure or patterns.
2 Students are given some time to examine mathematical structure, but are not allowed adequate time or are given too much scaffolding, so that they cannot fully understand the generalization.
1 Students are shown generalizations involving mathematical structure, but have little opportunity to discover these generalizations themselves or adequate time to understand the generalization.
0 Students are given no opportunities to explore or understand the mathematical structure of a situation.
9. The lesson included tasks that have multiple paths to a solution or multiple solutions.
Score Description
3 A lesson which includes several tasks throughout, or a single task that takes up a large portion of the lesson, with multiple solutions and/or multiple paths to a solution, and which increases the cognitive level of the task for different students.
2 Multiple solutions and/or multiple paths to a solution are a significant part of the lesson, but are not the primary focus, or are not explicitly encouraged; or more than one task has multiple solutions and/or multiple paths to a solution that are explicitly encouraged.
1 Multiple solutions and/or multiple paths minimally occur, and are not explicitly encouraged; or a single task has multiple solutions and/or multiple paths to a solution that are explicitly encouraged.
0 A lesson which focuses on a single procedure to solve certain types of problems and/or strongly discourages students from trying different techniques.
10. The lesson promoted precision of mathematical language.
Score Description
3 The teacher “attends to precision” with regard to communication during the lesson. The students also “attend to precision” in communication, or the teacher guides students to modify or adapt non-precise communication to improve precision.
2 The teacher “attends to precision” in all communication during the lesson, but the students are not always required to also do so.
1 The teacher makes a few incorrect statements or is sloppy about mathematical language, but generally uses correct mathematical terms.
0 The teacher makes repeated incorrect statements or uses incorrect names for mathematical objects instead of their accepted mathematical names.
11. The teacher’s talk encouraged student thinking.
Score Description
3 The teacher’s talk focused on high levels of mathematical thinking. The teacher may ask lower-level questions within the lesson, but this is not the focus of the practice. There are three possibilities for high levels of thinking: analysis, synthesis, and evaluation. Analysis: examines/interprets the pattern, order, or relationship of the mathematics; parts of the form of thinking. Synthesis: requires original, creative thinking. Evaluation: makes a judgment of good or bad, right or wrong, according to the standards he/she values.
2 The teacher’s talk focused on mid-levels of mathematical thinking. Interpretation: discovers relationships among facts, generalizations, definitions, values, and skills. Application: requires identification, selection, and use of appropriate generalizations and skills.
1 Teacher talk consists of “lower order” knowledge-based questions and responses focusing on recall of facts. Memory: recalls or memorizes information. Translation: changes information into a different symbolic form or situation.
0 Any questions/responses of the teacher related to mathematical ideas were rhetorical, in that there was no expectation of a response from the students.
12. There was a high proportion of students talking related to mathematics.
Score Description
3 More than three-quarters of the students were talking related to the mathematics of the lesson at some point during the lesson.
2 More than half, but less than three-quarters, of the students were talking related to the mathematics of the lesson at some point during the lesson.
1 Less than half of the students were talking related to the mathematics of the lesson.
0 No students talked related to the mathematics of the lesson.
13. There was a climate of respect for what others had to say.
Score Description
3 Many students are sharing, questioning, and commenting during the lesson, including their struggles. Students are also actively listening, clarifying, and recognizing the ideas of others.
2 The environment is such that some students are sharing, questioning, and commenting during the lesson, including their struggles. Most students listen.
1 Only a few students share, as called on by the teacher. The climate supports those who understand or who behave appropriately. Or some students are sharing, questioning, or commenting during the lesson, but most students are not actively listening to the communication.
0 No students shared ideas.
14. In general, the teacher provided wait-time.
Score Description
3 The teacher frequently provided an ample amount of “think time” for the depth and complexity of a task or question posed by either the teacher or a student.
2 The teacher sometimes provided an ample amount of “think time” for the depth and complexity of a task or question posed by either the teacher or a student.
1 The teacher rarely provided an ample amount of “think time” for the depth and complexity of a task or question posed by either the teacher or a student.
0 The teacher never provided an ample amount of “think time” for the depth and complexity of a task or question posed by either the teacher or a student.
15. Students were involved in the communication of their ideas to others (peer-to-peer).
Score Description
3 Considerable time (more than half) was spent in peer-to-peer dialog (pairs, groups, whole class) related to the communication of ideas, strategies, and solutions.
2 Some class time (less than half, but more than just a few minutes) was devoted to peer-to-peer (pairs, groups, whole class) conversations related to the mathematics.
1 A few instances of peer-to-peer (pairs, groups, whole class) conversations developed during the lesson, but each lasted less than 5 minutes.
0 No peer-to-peer (pairs, groups, whole class) conversations occurred during the lesson.
16. The teacher uses student questions/comments to enhance conceptual mathematical understanding.
Score Description
3 The teacher frequently uses student questions/comments to coach students, to facilitate conceptual understanding, and to boost the conversation. The teacher sequences the student responses that will be displayed in an intentional order, and/or connects different students’ responses to key mathematical ideas.
2 The teacher sometimes uses student questions/comments to enhance conceptual understanding.
1 The teacher rarely uses student questions/comments to enhance conceptual mathematical understanding. The focus is more on procedural knowledge of the task versus conceptual knowledge of the content.
0 The teacher never uses student questions/comments to enhance conceptual mathematical understanding.
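Since each of the sixteen descriptors above maps observed behavior to an integer score (0-3), a completed observation can be recorded as a simple item-to-score mapping and aggregated by subscale. The sketch below is purely illustrative: the item keys and the subscale grouping are placeholders, and the actual MCOP2 item-to-subscale assignments should be taken from Gleason, Livers, & Zelkowski (2015), not from this example.

```python
# Hypothetical sketch: recording one MCOP2 observation as item -> score.
# The item keys and the subscale grouping are illustrative placeholders,
# not the published MCOP2 subscale assignments.

VALID_SCORES = range(0, 4)  # each MCOP2 item is scored on a 0-3 scale


def validate(observation: dict) -> None:
    """Raise ValueError if any item score falls outside the 0-3 range."""
    for item, score in observation.items():
        if score not in VALID_SCORES:
            raise ValueError(f"{item}: score {score} outside 0-3")


def subscale_total(observation: dict, items: list) -> int:
    """Sum the scores of the items belonging to one subscale."""
    return sum(observation[item] for item in items)


if __name__ == "__main__":
    obs = {f"item_{i}": 2 for i in range(1, 17)}  # 16 items, all scored 2
    validate(obs)
    # Placeholder grouping: the first eight items treated as one subscale.
    first_half = [f"item_{i}" for i in range(1, 9)]
    print(subscale_total(obs, first_half))  # 8 items x 2 = 16
```

A structure like this makes it straightforward to compute the subscale totals that feed the reliability and regression analyses described elsewhere in the study.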
Note Taking Form
Observation number:
Random order:
Date and Time:
Class name/description:
Number of Students:
1. Are students engaged? How many students are actively participating in the lesson?
(a) Exploring and problem solving
(b) Using a variety of means (abstractions)
(c) Assessing mathematical strategy
(d) Overcoming road blocks
2. What is the interaction between student and teacher? Between student peers?
(a) Talking related to mathematics (How many?)
(b) Respecting others’ ideas (How many sharing and/or listening?)
3. How is the content presented?
(a) Lesson structure (Direct lecture, discussion/debate, student led)
(b) Alternative methods (Multiple paths to a solution or multiple solutions)
(c) Abstractions connected
(d) Wait time provided to reason, make sense, and articulate
4. What is the content covered or task, examples, and activities?
(a) Fundamental (What to do and why?)
(b) Added value and relevant
(c) Examined math structure (generalizations examined)
(d) Connected and flowed smoothly
(e) Connected with other areas of mathematics, other disciplines, or real world
5. Did the instructor have a solid grasp of the material?
(a) Used precision of mathematical language
(b) Enhanced content with student comments
(c) Talk encouraged student thinking (level)
APPENDIX D
REGRESSION MODELS AND RESIDUAL PLOTS
Figure 6: Regression Model 1: Student Engagement and Inquiry Orientation
Figure 7: Regression Model 2: Student Engagement and Inquiry Orientation
Figure 8: Regression Model 3: Teacher Facilitation and Inquiry Orientation
Figure 9: Regression Model 4: Teacher Facilitation and Content Propositional Knowledge
Figure 10: Regression Model 5: Inquiry Orientation and Content Propositional Knowledge
Figure 11: Regression Model 6: Student Engagement and Teacher Facilitation
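Each of the six models above regresses one protocol subscale on another. As a minimal sketch of how such a simple linear regression and its residuals could be computed, consider the following; the subscale scores here are made-up placeholders for illustration, not the study's observation data, and the variable names are hypothetical.

```python
# Illustrative sketch of one subscale-on-subscale regression
# (e.g., Student Engagement regressed on Inquiry Orientation).
# All data below are fabricated placeholders, not the study's observations.
import numpy as np


def fit_line(x: np.ndarray, y: np.ndarray):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residuals)."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest-degree term first
    residuals = y - (a + b * x)  # residuals feed the residual plots
    return a, b, residuals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inquiry = rng.uniform(0, 24, size=30)              # hypothetical subscale totals
    engagement = 0.8 * inquiry + rng.normal(0, 2, 30)  # hypothetical response
    a, b, res = fit_line(inquiry, engagement)
    print(f"intercept={a:.2f}, slope={b:.2f}")
```

Plotting `res` against the fitted values would reproduce the kind of residual plot shown in the figures above.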
APPENDIX E
IRB CERTIFICATIONS
See following pages for copies of IRB Certifications.