Running Head: MODES OF ASSESSMENT

Modes of Assessment:

Will the Mode of Assessment (Paper-and-Pencil vs. Computer-Based Assessments)

Affect Student Performance on a Summative Assessment?

Holly Read, Dalia Chavez & Karmin Ramirez

California State University, San Bernardino

Introduction:

A frequent question in the mind of a classroom teacher is whether to use the traditional paper-and-pencil method for assessments or to attempt to use computer-based assessment tools. As technology becomes more and more integrated into the classroom, one may think that students’ performance is affected by the way they take an assessment. Both methods have been used and have provided results that showcase students’ learning, but the question that we, as researchers, wanted to explore was whether the mode of assessment would influence, shape, or affect the scores that the students receive.

Our plan was to focus on a group of 24 kindergarten students and assess their performance on two separate assessments with the same number of questions in each. Half of the students would take the first assessment using the paper-and-pencil method (Appendix A), while the other half would take the same assessment on the computer (Appendix B). For the second assessment, the groups would be reversed: the original paper-based assessment group would use a computer for their assessment (Appendix D), while the other group would use the paper-and-pencil method (Appendix C).

Each assessment consisted of six questions, with some of the questions being read aloud to the students due to their young age. We used the scores from both assessments and their comparisons as our quantitative data. We also collected qualitative data by asking the students whether they preferred the paper-and-pencil assessment or the computer-based assessment; the teacher recorded their feedback (Appendix E). Our goal in this research project was to see whether the mode of assessment affects the students’ scores, which will help shape our future decisions on whether to use the traditional paper-and-pencil assessment (PPA) or the more modern computer-based assessment (CBA) when conducting summative assessments in our classrooms.

Our data show the gender and age of each student, the scores from each assessment, an analysis of which assessment they performed better on, and which mode they preferred (Table 1). The preferred mode also appears in a separate table along with each student’s reason. Ultimately, this will help us determine whether the mode of assessment has any effect on whether a student performed better or worse than previously.

This study includes insights into our research project, a review of related literature that helped shape our research question, a description of our methodology, the results of our data with a discussion of those results, and our conclusions and recommendations for further action. The literature review presents the findings from our initial research and the questions and decisions we faced as researchers in determining what our question should focus on and which factors we should exclude from our research. The methodology section details how we conducted our research. Finally, we present our results and findings and address whether the students performed better on the paper-and-pencil assessments or on the computer-based assessments.

Literature Review:

After conducting a thorough literature review, we found many resources that became useful in this study. While working on this study, we had to decide exactly which data to focus on. Reading and reviewing previous studies helped us determine what was necessary to note and what was not. For example, many of the studies that we found focused on multiple aspects of the participants, including gender, class size, computer familiarity, anxiety, and age. Some of these studies address many of these factors, such as the study conducted by Clariana and Wallace, in which the researchers noted gender, competitiveness, and computer familiarity. Other studies had a narrower focus, such as Guimarães, Ribeiro, Cruz, and Ferreira, who looked only at the performance increase and student attitudes. When conducting our research, it was imperative that we knew what specific information to include in our data collection and what was unnecessary.

The literature that focused on gender was particularly interesting because several studies reported that gender was a determining factor in whether students performed well. Csapó, Molnár, and Nagy attempted to reproduce a paper-based test on a computer-based system and found that the females scored higher than the males on both tests. Although this was not their main focus, the possibility that gender played a role in their responses and data was worth noting. Additionally, Anakwe used gender and class as independent variables in her research, but neither showed a significant effect in her data. When reviewing this literature, we, as a team, had to decide whether gender was something we needed to examine. Ultimately, we included the gender of the students in our table but decided not to make gender the primary focus of our study.

Another factor that we were interested in exploring was the idea of a pre-test, post-test, or survey at the end of our study in order to get more information from the students about how they felt during both tests. As a group, we found an abundance of research to help us make this decision, with approaches ranging from a pre-test to a survey after the initial research was conducted. Johnson and Green interviewed students after an assessment to ask how they felt about both the paper-and-pencil version and the computer-based version. Nikou and Economides, on the other hand, used multiple instruments for collecting data, such as questionnaires, pre- and post-tests, and surveys. This number of instruments proved slightly overwhelming, but overall the study was useful. These researchers and their insights helped our team decide to collect student feedback in the form of a survey after both assessments were given. This was helpful to our research because it gave more insight into how the students felt about both modes.

Since computers and technology are rapidly making their way not only into classrooms but also into homes, we also considered whether computer familiarity and computer anxiety should be factors in our research. Many students have access to the internet at home, but the inherent anxiety of taking an assessment could still exist for our students, in addition to differences in the students’ knowledge of how to use computers and technology. For instance, the study that McDonald conducted looked into the inconsistencies that arise in assessment data when individual differences and personal preferences among students are not taken into account. However, Wang, Jiao, Young, Brooks, and Olson suggest that students may be more comfortable with computer-based assessments due to their familiarity with technology. This literature drew our attention to individual needs: one of our students needed a translation, which the teacher provided during the paper-and-pencil assessment and which would not have been possible on the computer-based assessment.


Additionally, the way in which we conducted our research depended on the “how” and “when.” There was an abundance of literature on paper-and-pencil versus computer-based assessment, and we found that Nissen, Jariwala, Close, and Van Dusen provided a setup similar to the one that we wanted to emulate. In their study, they split their participants into two sections, a detail that we also used in our study. This made our groups more manageable and the data easier to understand.

Lastly, it proved very difficult to find information or previous research studies on the particular age group with which we were conducting our research. For instance, the study conducted by Prisacari and Danielson dealt with chemistry students at a higher grade level. Because limited information was available for the kindergarten age group, we had to make do with the resources and insights available on the broader topic. Although this was a challenge, it gave us more motivation to use our class as subjects for this study, and ultimately the literature that we found helped our team make tough decisions that led to a successful research study.

Methodology:

Participants

The participants in this study were 24 kindergarteners, all in the same full-day kindergarten classroom. The participants ranged from 5 to 6 years old. The classroom had an equal number of female and male students, 12 each. The students were numbered off randomly to create the two groups. Group 1 was made up of numbers 1 through 12 and consisted of 7 females and 5 males, all at different academic levels (high, medium, and low). Group 2 was made up of numbers 13 through 24 and consisted of 7 males and 5 females, all at different academic levels. For the second assessment, 1 student in Group 2 was unable to complete the assessment due to absence.
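The grouping and the way the two groups swap modes between topics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster labels, the `assign_groups` helper, and the fixed seed are hypothetical, and the sketch is not the procedure the teacher actually used to number the students. The schedule mirrors the order described under Procedures.

```python
import random


def assign_groups(student_ids, seed=None):
    """Shuffle the roster and split it into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster labels; the study's real roster is not reproduced here.
roster = [f"student_{i:02d}" for i in range(1, 25)]
group_1, group_2 = assign_groups(roster, seed=0)

# Crossover schedule: each group takes each mode exactly once.
schedule = {
    "Topic 4": {"PPA": "Group 2", "CBA": "Group 1"},
    "Topic 5": {"PPA": "Group 1", "CBA": "Group 2"},
}

for topic, modes in schedule.items():
    print(topic, modes)
```

Because each group takes one assessment in each mode, every student contributes one PPA score and one CBA score, although the two scores come from different topics.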

Instruments

The instruments were math summative assessments for Topics 4 and 5 in the curriculum provided by the school site. Both assessments were premade by Pearson and tailored to the textbook used during instruction. All assessments done in the classroom prior to this research were paper-and-pencil versions.

Topic 4 focuses on comparing numbers 0 to 10. The topic was taught over a week and a half, with the last day consisting of the summative assessment. The paper-and-pencil version of the assessment consisted of 6 questions. Questions 1 through 4 were worth 1 point each; each was marked correct only if all of its components were correct. Questions 5 and 6 were worth 2 points each and required the student to draw and answer. The computer-based version also consisted of 6 questions in the same format, with the same point values as the paper-and-pencil version.

Topic 5 focuses on classifying and counting data and was taught in the same amount of time as Topic 4, a week and a half. The paper-and-pencil version and the computer-based version both consisted of 6 questions. For the PPA version, questions 1, 2, and 6 were worth 1 point each and questions 3, 4, and 5 were worth 2 points each. For the CBA version, questions 1 through 3 were worth 1 point each and questions 4 through 6 were worth 2 points each.
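Since the two topics carry different point totals, comparing them requires converting raw points to a percentage. The sketch below shows one way to do that from the point values listed above. The `POINTS` table and the `percent_score` helper are illustrative names, and the example answer patterns are made up rather than taken from any student's record.

```python
# Point value of each question (1 through 6), from the Instruments description.
POINTS = {
    ("Topic 4", "PPA"): [1, 1, 1, 1, 2, 2],
    ("Topic 4", "CBA"): [1, 1, 1, 1, 2, 2],
    ("Topic 5", "PPA"): [1, 1, 2, 2, 2, 1],
    ("Topic 5", "CBA"): [1, 1, 1, 2, 2, 2],
}


def max_points(topic, mode):
    """Total points available on one version of an assessment."""
    return sum(POINTS[(topic, mode)])


def percent_score(earned, topic, mode):
    """Convert the points earned per question into a percentage."""
    values = POINTS[(topic, mode)]
    if len(earned) != len(values):
        raise ValueError("expected one entry per question")
    for got, available in zip(earned, values):
        if not 0 <= got <= available:
            raise ValueError("earned points exceed the question's value")
    return 100 * sum(earned) / max_points(topic, mode)


# Hypothetical example: full marks except one point lost on question 6.
example = percent_score([1, 1, 1, 1, 2, 1], "Topic 4", "PPA")
print(max_points("Topic 4", "PPA"), max_points("Topic 5", "CBA"), example)
```

Under these point values Topic 4 totals 8 points and Topic 5 totals 9 points, so a single missed point shifts the percentage by slightly different amounts on the two topics.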

A survey was also conducted after both assessments were completed. The survey was completed one on one between the student and the teacher. It consisted of oral questions asked with both assessments, paper-and-pencil and computer, laid out in front of the student. The first question was “We have taken our test on paper and on the iPad, which one do you like to take your test on more?” After the student chose, a follow-up question was asked: “Why did you like that one?” The student’s choice was recorded on a survey form along with the student’s reason, written verbatim.

Procedures

After teaching the content of Topic 4, the teacher conducted the assessment during small-group “center” time. Students 13 through 18 were called first to the back table and were given the paper-and-pencil version of the Topic 4 assessment (Appendix A). The teacher read each question and waited for every student to show that they were done answering before moving on to the next. When all the students were done, they returned to their centers and the next group (students 19 through 24) was called. These students received the same paper version, and the teacher conducted the test in the same manner as before. When those students were done, it was time for students 1 through 12 to complete the computer-based version (Appendix B). The teacher called two students to the back table at a time to complete the computer-based assessment. Students were given headphones, and the teacher manually logged each student into their own account on an iPad. The students were shown how to press play to hear each answer and how to click on the answer that they picked. The teacher then showed the students how to select “Next” when they were ready to move to the next question. When a student reached the end of the test, the teacher made sure that every question showed an “Answered” status. If there were unanswered questions, the teacher instructed the student to go back to the question and answer it. Once all questions were answered, the teacher clicked “Submit Test” and then told the student their score. This continued until all students completed the CBA.

The teacher then took another week and a half to teach Topic 5 and conducted the assessments during center time. For this assessment, students 1 through 12 completed the paper-and-pencil version first (Appendix C), six students at a time. Student 7 also received the instructions in Spanish due to a language barrier. Students 13 through 24 completed the computer-based assessment in the same manner as the other students did for Topic 4. Student 19 did not complete the assessment due to absence.

The next day, during center time, the teacher called the students to the back table one by one to complete the survey (Appendix E). The teacher laid out a paper version of Topic 5 and an iPad. The teacher then asked, “We have taken our test on paper and on the iPad, which one do you like to take your test on more? Why did you like that one?” The teacher circled the student’s answer on the survey form, wrote down their reasoning, and read it back to the student to make sure that it was what they said.

Data Analysis:

Quantitative: Assessment

Once all of the data were collected, recorded, and organized by age, gender, assessment type, assessment score, and student performance, a few patterns emerged. Among the students who completed both assessments, performance was evenly split between the modes: 11 students had higher scores on the CBA, while another 11 performed better on the PPA; the remaining student (student 5) scored 0 on both. It should be noted that the CBA required students to select an answer for every question, so a guess could result in a correct answer, whereas on the paper-and-pencil assessment students could leave a question blank. The students’ assessment scores were collected and reported in Table 1. The results of the statistical analysis of the students’ scores appear in Table 2.

Table 2 Summary Statistics

Column  n   Mean       Variance   Std. dev.  Std. err.  Median  Range  Min  Max
CBA     23  37.73913   462.65613  21.509443  4.4850288  33      88     0    88
PPA     24  43.333333  1552.3188  39.399478  8.0423847  38      100    0    100
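The columns of Table 2 are related to one another: the standard deviation is the square root of the variance, and the standard error is the standard deviation divided by the square root of n. The short sketch below recomputes both from the reported n and variance as a consistency check. The `std_dev_and_error` helper and the `summary` dictionary are illustrative names; only the numbers come from Table 2.

```python
import math

# n and variance as reported in Table 2.
summary = {
    "CBA": {"n": 23, "variance": 462.65613},
    "PPA": {"n": 24, "variance": 1552.3188},
}


def std_dev_and_error(n, variance):
    """Return (standard deviation, standard error of the mean)."""
    std_dev = math.sqrt(variance)
    return std_dev, std_dev / math.sqrt(n)


for mode, row in summary.items():
    std_dev, std_err = std_dev_and_error(row["n"], row["variance"])
    print(f"{mode}: std. dev. = {std_dev:.4f}, std. err. = {std_err:.4f}")
```

The PPA scores are spread far more widely than the CBA scores, which is worth keeping in mind when comparing the two means.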

The students had a mean assessment score of 37.74% on the CBA, while the PPA mean score was 43.33%. If we remove the outlier (student 5, who scored 0 on both assessments), the mean score increases by 1.88 percentage points for the PPA and 1.72 percentage points for the CBA.
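The effect of dropping the zero score can be worked out from the summary statistics alone: removing a score of 0 leaves the total unchanged and reduces n by one. The sketch below applies that reasoning to the means and sample sizes in Table 2. The helper name `mean_without_zero_score` is an illustrative choice, and the calculation assumes the removed score is exactly 0.

```python
def mean_without_zero_score(mean, n):
    """Mean after removing one score of exactly 0 from a sample of size n."""
    total = mean * n          # a zero score contributes nothing to the total
    return total / (n - 1)


# Means and sample sizes as reported in Table 2.
cba_mean, cba_n = 37.73913, 23
ppa_mean, ppa_n = 43.333333, 24

cba_new = mean_without_zero_score(cba_mean, cba_n)
ppa_new = mean_without_zero_score(ppa_mean, ppa_n)

print(f"CBA: {cba_new:.2f} (increase of {cba_new - cba_mean:.2f} points)")
print(f"PPA: {ppa_new:.2f} (increase of {ppa_new - ppa_mean:.2f} points)")
```

Both means rise by less than two percentage points, so the gap between the PPA and CBA means stays close to what it was with the outlier included.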

Qualitative: Survey

Overall, 15 students (62.5%) reported preferring the CBA over the PPA. This could be due to the novelty of the mode, as this was a new form of assessment for all of the students. Once the student responses were coded, there were seven categories that the responses fell under: Fun (F), Easy (E), Tactile (T), Draw (D), Like It (LI), Learn (L), and Have To (H). Of the 20 students who were able to provide responses to the survey, the most common reason for selecting one mode over the other fell under Tactile (T), with 7 students, followed by Fun (F) and Like It (LI) with 3, Draw (D) with 2, and finally Learn (L) and Have To (H) with 1. In looking at the students’ preferred mode of assessment and their actual scores, 13 of the 19 students performed better on the assessment mode they indicated that they preferred. Their reasons were both similar and different. For the 5 students who performed better on the PPA, the reasons were H, T, and D, while for the 8 students who performed better on the CBA, the reasons were LI, T, L, and F. The students’ responses support the idea that assessment novelty had an effect on their choice of preferred assessment type.
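The proportions in this section can be reproduced with simple arithmetic. The sketch below assumes that the 62.5% figure is taken out of all 24 students, which is the denominator that makes 15 students equal 62.5%; the `percent` helper is an illustrative name.

```python
def percent(part, whole):
    """Express part out of whole as a percentage."""
    return 100 * part / whole


# 15 students preferred the CBA; 24 students is the assumed denominator.
preferred_cba = percent(15, 24)

# 13 of the 19 students scored higher on the mode they said they preferred.
scored_higher_on_preferred = percent(13, 19)

print(f"Preferred CBA: {preferred_cba:.1f}%")
print(f"Scored higher on preferred mode: {scored_higher_on_preferred:.1f}%")
```

Roughly two thirds of the students scored higher on the mode they preferred, although with so few students this share should be read with caution.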

Results:

As a result, we as a group concluded that the results of our data analysis are inconclusive with respect to our original research question. Our question asked whether the mode of assessment had any effect on the success of the student, and based on the average scores, it is not conclusive whether the mode of assessment allowed students to perform better or worse. However, we can theorize that the novelty of the CBA and the tactile aspects of the PPA could have some effect on the students’ preferred method. Overall, our group infers that there is no definitive answer to our original research question; however, a majority of students (62.5%) preferred the CBA to the PPA, 12.5 percentage points above an even split.

Conclusion and Recommendation:

In reviewing the results and data, it was determined that, at this time, mode of assessment had no conclusive effect on performance. Although there was a difference in mean scores, it was not sufficient to state that there had been a significant difference in student performance based on the mode of assessment. This was similar to the results of the study “Comparability of Computer-Based and Paper-and-Pencil Testing in K–12 Reading Assessments” by Wang, Jiao, Young, Brooks, and Olson (2007), who similarly found “that the administration mode had no statistically significant effect.”
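One rough way to gauge whether the difference in means is large relative to its uncertainty is a Welch two-sample t statistic computed from the summary statistics in Table 2. The sketch below is an illustration only and is not an analysis performed in this study. It treats the PPA and CBA scores as independent samples, which is an assumption: the same students took both modes, on different topics, so a paired analysis of the per-student scores would be more appropriate if those scores were used directly. The `welch_t` helper is an illustrative name.

```python
import math


def welch_t(mean_a, se_a, n_a, mean_b, se_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    se_a and se_b are standard errors of the means (std. dev. / sqrt(n)).
    """
    var_sum = se_a ** 2 + se_b ** 2
    t_stat = (mean_a - mean_b) / math.sqrt(var_sum)
    df = var_sum ** 2 / (se_a ** 4 / (n_a - 1) + se_b ** 4 / (n_b - 1))
    return t_stat, df


# Means, standard errors, and sample sizes as reported in Table 2.
t_stat, df = welch_t(43.333333, 8.0423847, 24, 37.73913, 4.4850288, 23)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

With roughly 36 degrees of freedom, the two-sided critical value at the 0.05 level is about 2.03, and the statistic computed here falls well below it. Under the stated assumptions, this agrees with the conclusion that the difference in means is not sufficient to claim an effect of assessment mode.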

Limitations:

A number of factors need to be considered in future studies. One is the small sample size, along with the students’ age and the limits these factors placed on the study. Some of the children reported difficulty with the PPA simply because they have not yet developed sufficient hand dexterity to fill in the answers; writing the response was an additional step for students. The computer assessment was provided by the publisher and was not an exact duplicate of the paper-and-pencil assessment, which could also have had an effect on responses. Students who completed the computer assessment did not readily review previous questions, since once a question is answered and submitted, the screen changes to the next question. A “back” button was available, but students at this age do not yet understand going back and checking their answers. An unanticipated factor was also noted in the study: one student needed a translation of the assessment before being able to respond to the questions.

Further Study:


For future studies, gender and age may be factors that need to be explored further. Students’ familiarity with the assessment tools could also be examined, including the novelty of taking an assessment in a new format.


References

Anakwe, B. (2008). Comparison of student performance in paper-based versus computer-based testing. Journal of Education for Business, 84(1), 13-17. doi:10.3200/joeb.84.1.13-17

Clariana, R., & Wallace, P. (2002). Paper-based versus computer-based assessment: Key factors associated with the test mode effect. British Journal of Educational Technology, 33(5), 593-602. doi:10.1111/1467-8535.00294

Csapó, B., Molnár, G., & Nagy, J. (2014). Computer-based assessment of school readiness and early reasoning. Journal of Educational Psychology, 106(3), 639-650. doi:10.1037/a0035756

Guimarães, B., Ribeiro, J., Cruz, B., Ferreira, A., Alves, H., Cruz-Correia, R., … Ferreira, M. A. (2017). Performance equivalency between computer-based and traditional pen-and-paper assessment: A case study in clinical anatomy. Anatomical Sciences Education, 11(2), 124-136. doi:10.1002/ase.1720

Johnson, M., & Green, S. (2006). On-line mathematics assessment: The impact of mode on performance and question answering strategies. Journal of Technology, Learning, and Assessment, 4(5). Available from http://www.jtla.org

McDonald, A. S. (2002). The impact of individual differences on the equivalence of computer-based and paper-and-pencil educational assessments. Computers & Education, 39(3), 299-312. doi:10.1016/s0360-1315(02)00032-5

Nikou, S. A., & Economides, A. A. (2016). The impact of paper-based, computer-based and mobile-based self-assessment on students' science motivation and achievement. Computers in Human Behavior, 55, 1241-1248. doi:10.1016/j.chb.2015.09.025

Nissen, J. M., Jariwala, M., Close, E. W., & Van Dusen, B. (2018). Participation and performance on paper- and computer-based low-stakes assessments. International Journal of STEM Education, 5(1). doi:10.1186/s40594-018-0117-4

Oz, H., & Ozturan, T. (2018). Computer-based and paper-based testing: Does the test administration mode influence the reliability and validity of achievement tests? Journal of Language and Linguistic Studies, 14(1), 67-85.

Prisacari, A. A., & Danielson, J. (2017). Computer-based versus paper-based testing: Investigating testing mode with cognitive load and scratch paper use. Computers in Human Behavior, 77, 1-10. doi:10.1016/j.chb.2017.07.044

Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2007). Comparability of computer-based and paper-and-pencil testing in K–12 reading assessments. Educational and Psychological Measurement, 68(1), 5-24. doi:10.1177/0013164407305592


Appendix A

First Assessment: Topic 4 Paper-Pencil Version: Completed in a small group, 6 at a time, with instructions read by the teacher. Questions 1-4 are worth 1 point each and questions 5 and 6 are worth 2 points each.


Appendix B

Computer-Based Online Assessment: Completed in a small group, 2 at a time, with instructions read by the program. Questions 1-4 are worth 1 point each and questions 5 and 6 are worth 2 points each.


Appendix C

Second Assessment: Topic 5 Paper-Pencil Version: Completed in a small group, 6 at a time, with instructions read by the teacher. Questions 1-2 and 6 are worth 1 point each and questions 3-5 are worth 2 points each.


Appendix D

Computer-Based Online Assessment: Completed in a small group, 2 at a time, with instructions read by the program. Questions 1-3 are worth 1 point each and questions 4-6 are worth 2 points each.


Appendix E

Survey: The teacher will call students up individually and will have each student identify the assessment he or she likes better. The teacher will then write down the student’s response and check with the student for clarification.

Name: ________________________________________________

I like


Student Survey Responses:

Note: CBT refers to computer-based testing and PPA refers to paper-and-pencil assessment.
