Running Head: MODES OF ASSESSMENT
Modes of Assessment:
Will the Mode of Assessment (Paper-and-Pencil vs. Computer-Based Assessment)
Affect Student Performance on a Summative Assessment?
Holly Read, Dalia Chavez & Karmin Ramirez
California State University, San Bernardino
Introduction:
A frequent question in the mind of a classroom teacher is whether to use the traditional
paper-and-pencil method for assessments or to attempt to use computer-based assessment tools.
As technology becomes more integrated into classrooms, one may wonder whether students’
performance is affected by the way they take an assessment. Both methods have been used and
have produced results that showcase students’ learning, but the question that we, as
researchers, wanted to explore was whether the mode of assessment would influence the scores
that the students receive.
Our plan was to focus on a group of 24 kindergarten students and assess their
performance on two separate assessments with the same number of questions in each. Half of the
students will take the first assessment using the paper-and-pencil method (Appendix A),
while the other half will take the same assessment on the computer (Appendix B). For the
second assessment, the groups will be reversed, so the original paper-based assessment group
will use a computer for their assessment (Appendix D), while the other group will use
the paper-and-pencil method (Appendix C).
Each assessment consists of six questions, with some of the questions being read aloud
to the students due to their young age. We will use the scores from both assessments and their
comparison as our quantitative data. We will also collect qualitative data by asking the
students whether they preferred the paper-and-pencil assessment or the computer-based
assessment; their answers will be recorded by the teacher (Appendix E). Our goal in this
research project is to see whether the mode of assessment affects the students’ scores, which
will help shape our future decisions on whether to use the traditional paper-and-pencil
assessment (PPA) or
the more modern method of computer-based assessment (CBA) when conducting summative
assessments in our classrooms.
Our data will show the gender and age of each student, the scores from each assessment,
an analysis of which assessment each student performed better on, and which mode they preferred
(Table 1). The preferred mode is reported in a separate table along with each student’s reason.
Ultimately this will help us determine whether the mode of assessment has any effect on whether
the student performed better or worse.
This study includes insights into our research project, a review of the related
literature that helped shape our research question, a description of our methodology, the
results of our data analysis with a discussion of those results, and our conclusions and
recommendations for further action. The literature review presents the findings from our
initial research and the questions and decisions that we faced as researchers
in determining what our question should focus on and what factors we should eliminate from our
research. The methodology section details how we conducted our research. Finally, we present
our results and findings and discuss whether the students performed better on the
paper-and-pencil tests or on the computer-based assessments.
Literature Review:
After conducting a thorough literature review, we found many resources that proved
useful in this study. While working on this study, we had to decide what exactly we were
going to focus on in terms of data. Reading and reviewing previous studies
helped us to determine what was necessary to note and what was not. For example,
many of the studies that we found focused on multiple aspects of the participants, including
gender, class size, computer familiarity, anxiety, and age. Some of these studies
address several of these factors, such as the study conducted by Clariana and Wallace (2002),
in which the researchers noted gender, competitiveness, and computer familiarity. Other
studies had a narrower focus, such as Guimarães et al. (2017), who looked simply at the
performance increase and student attitudes. When conducting our research, it was imperative
that we knew what specific information to include in our data collection and what information
was unnecessary.
The literature that focused on gender was particularly interesting because
many studies reported that gender was a determining factor in whether the students
performed well. Researchers such as Csapó, Molnár, and Nagy (2014) attempted to duplicate a
paper-based test on a computer-based system and found that the females scored higher
than the males on both tests. Although this was not their main focus, the possibility that gender
played a role in their responses and data was interesting and worth noting. Additionally, Anakwe
(2008) used gender and class as independent variables in her research, but ultimately neither
showed any significant effect in her data. When reviewing this
literature, we, as a team, had to decide whether gender was something we thought was necessary
to look into. Ultimately, we included the gender of the students in our table but decided that we
would not use gender as our primary focus of study.
Another factor that we were interested in exploring was the idea of a pre-test, post-test, or
even a survey at the end of our study in order to get more information from the students about
how they felt during their experience of both tests. As a group, we found an abundance of
research to help us make this decision, with approaches ranging from a pre-test to a survey
after the initial research was conducted. We found a study from Johnson and Green (2006) wherein
they interviewed students after an assessment to ask how they felt about both the
paper-and-pencil version and the computer-based version. Nikou and Economides (2016), on the
other hand, provide multiple avenues for qualitative data, such as questionnaires, pre- and
post-tests, and surveys. This number of data
instruments proved to be slightly overwhelming, but overall the study was useful. These
researchers and their insights helped our team decide that we wanted to get student feedback in
the form of a survey after both of the assessments were given. This was helpful to our research
because it gave more insight into how the students felt about both modes.
Since we are living in an age when computers and technology are making their way
into both classrooms and homes at an alarming rate, we also considered whether
computer familiarity and computer anxiety should be factors in our research. Many
students have access to the internet at home, but the inherent anxiety of taking an assessment
could still exist for our students, in addition to the students’ knowledge, or lack of
knowledge, of how to use computers and technology. For instance, the study that McDonald (2002)
conducted looked into the inconsistencies in assessment data that arise from individual
differences and personal preferences among students. However, Wang, Jiao, Young, Brooks, and
Olson (2007) suggest that students may be more comfortable with computer-based assessments due
to their familiarity with technology. This gave us useful insight because one of our students
needed translation, which we provided through the teacher and which would not have been
possible on the computer-based assessment.
Additionally, the way in which we conducted our research depended on the “how”
and “when.” There proved to be an abundance of literature available on the subject of
paper-and-pencil versus computer-based assessment, and we found that Nissen, Jariwala, Close,
and Van Dusen (2018) provided a setup similar to the one that we wanted to emulate. In their
study, they split their groups into two sections, a detail that we ended up using in our study
as well. This made our groups more manageable and the data easier to understand.
Lastly, it proved very difficult to find information or previous research studies
on the particular age group with which we were conducting our research. For instance, the study
conducted by Prisacari and Danielson (2017) dealt with chemistry students at a higher grade
level. Limited information was available for the kindergarten age group, so we had to make do
with the resources and insights available on our topic. Although this was a
challenge, it gave us more motivation to use our group as subjects for this study, and ultimately
the literature that we found helped our team make tough decisions that led to a successful
research study.
Methodology:
Participants
The participants in this study were 24 kindergarteners, all in the same full-day
kindergarten classroom. The participants ranged from 5 to 6 years old. The classroom had an
equal number of female and male students, 12 each. The students
were numbered off randomly to create the two groups. Group 1 was made up of numbers 1
through 12 and consisted of 7 females and 5 males at different academic levels (high, medium,
and low). Group 2 was made up of numbers 13 through 24 and consisted of 7 males and 5
females, also at different academic levels. For the second assessment, 1 student in Group 2 was
unable to complete the assessment due to absence.
Instruments
The instruments were math summative assessments for Topics 4 and 5 in the curriculum
provided by the school site. Both assessments were premade by Pearson and tailored to the
textbook used during instruction. All assessments that were given in the classroom prior to this
research were done solely on paper.
Topic 4 focuses on comparing numbers 0 to 10. The topic was taught over a week and a half,
with the last day consisting of the summative assessment. The paper-pencil version
of the assessment consisted of 6 questions. Questions 1 through 4 were worth 1 point each. Each
question was graded on whether all of its components were correct. Questions 5 and 6
were worth 2 points each and required the student to draw and
answer. The computer-based version also consisted of 6 questions in the same format, with the
same point values as the paper-pencil version.
Topic 5 focuses on classifying and counting data and was taught in the same amount of
time as Topic 4, a week and a half. The paper-pencil version and the computer-based version both
consisted of 6 questions. For the PPA version, questions 1, 2, and 6 were worth 1 point each and
questions 3, 4, and 5 were worth 2 points each. For the CBA version, questions 1 through 3
were worth 1 point each and questions 4 through 6 were worth 2 points each.
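The point values described for the two topics can be turned into a percentage score with a short calculation. The following Python sketch is illustrative only: it assumes that a percentage score is points earned divided by points possible, and the names POINT_VALUES, max_points, and percent_score are hypothetical and not part of the publisher's materials.

```python
# Point values for questions 1-6, as listed in the Instruments section.
POINT_VALUES = {
    ("Topic 4", "PPA"): [1, 1, 1, 1, 2, 2],
    ("Topic 4", "CBA"): [1, 1, 1, 1, 2, 2],
    ("Topic 5", "PPA"): [1, 1, 2, 2, 2, 1],
    ("Topic 5", "CBA"): [1, 1, 1, 2, 2, 2],
}


def max_points(topic, mode):
    """Total points possible on one version of an assessment."""
    return sum(POINT_VALUES[(topic, mode)])


def percent_score(topic, mode, points_earned):
    """Assumed scoring rule: points earned over points possible, as a percentage."""
    possible = max_points(topic, mode)
    if not 0 <= points_earned <= possible:
        raise ValueError("points_earned must be between 0 and the points possible")
    return 100 * points_earned / possible


if __name__ == "__main__":
    for topic, mode in POINT_VALUES:
        print(topic, mode, "points possible:", max_points(topic, mode))
```

Under this assumption, each Topic 4 version totals 8 points and each Topic 5 version totals 9 points, even though the 2-point questions sit in different positions on the two Topic 5 versions.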
A survey was also conducted after both assessments were completed. The survey was
completed one-on-one between the student and the teacher. It consisted of an
oral question asked with both assessments, paper-pencil and computer, laid out in front of the
student. The question asked was “We have taken our test on paper and on the iPad, which one do
you like to take your test on more?” After the student chose, a follow-up question was
asked: “Why did you like that one?” The answer was recorded on a survey form along
with the student’s reason, written verbatim.
Procedures
After teaching the content of Topic 4, the teacher conducted the assessment during small
group “center” time. Students 13 through 18 were called first to the back table. Students were
given the paper-pencil version of the Topic 4 assessment, Appendix A. The teacher read each
question and waited for every student to show that they were done answering before
moving on to the next. When all the students were done, they returned to their centers and the
next group (students 19 through 24) was called. These students received the same paper version and
the teacher conducted the test in the same manner as before. When those students were done, it
was then time for students 1 through 12 to complete their computer-based version, Appendix B.
The teacher called two students to the back table at a time to complete the computer-based
assessment. Students were given headphones and the teacher manually logged each student into
their own account on an iPad. The students were shown how to press play to hear each answer
and how to click on the answer that they picked. The teacher then showed the students how to
select next when they were ready to move to the next question. When the student reached the end
of the test, the teacher made sure that every question received an “Answered” status. If there
were unanswered questions the teacher instructed the student to go back to the question and
answer it. If all questions were answered then the teacher would click “Submit Test.” The
teacher then told the student their score when they were done. This continued until all students
completed the CBA.
The teacher then took another week and a half to teach Topic 5 and again
conducted the assessments during center time. For this assessment, students 1 through 12
completed the paper-pencil version first (Appendix C), called six at a time.
Student 7 also received the instructions in Spanish due to a language barrier.
Students 13 through 24 completed the computer-based assessment (Appendix D) in the same manner
as the other students did for Topic 4. Student 19 did not complete the assessment due to absence.
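The way the two groups switched modes between Topic 4 and Topic 5 can be summarized in a few lines of Python. This is only a sketch of the design as described in the procedures; the function name assigned_mode is hypothetical.

```python
def assigned_mode(student_number, topic):
    """Return the mode ("PPA" or "CBA") a student was assigned for a topic.

    Students 1-12 (Group 1) took Topic 4 on the computer and Topic 5 on paper;
    students 13-24 (Group 2) took Topic 4 on paper and Topic 5 on the computer.
    """
    if not 1 <= student_number <= 24:
        raise ValueError("student numbers run from 1 to 24")
    if topic not in (4, 5):
        raise ValueError("only Topics 4 and 5 were assessed")
    in_group_1 = student_number <= 12
    if topic == 4:
        return "CBA" if in_group_1 else "PPA"
    return "PPA" if in_group_1 else "CBA"


if __name__ == "__main__":
    for student in (7, 19):
        print(student, assigned_mode(student, 4), assigned_mode(student, 5))
```

Every student is therefore assigned one assessment in each mode.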
The next day, during center time, the teacher called the students to the back table
one by one to complete the survey (Appendix E). The teacher laid out a paper version of
Topic 5 and an iPad. The teacher then asked “We have taken our test on paper and on the iPad,
which one do you like to take your test on more? Why did you like that one?” The teacher
circled the student’s answer on the survey form, wrote down the student’s reasoning, and read it
back to the student to make sure that it was what the student had said.
Data Analysis:
Quantitative: Assessment
Once all of the data was collected, recorded, and organized by age, gender, assessment
type, assessment score, and student performance, a few patterns emerged. Among the students who
completed both assessments, performance was evenly split between the two modes: 11
students had higher scores on the CBA, while 11 performed better on the PPA. It
should be noted that the CBA required students to choose an answer for every question, so a
guess could result in a correct answer. On the paper-pencil assessment, students could leave a
question blank. The students’ assessment scores were collected and reported in Table 1. The
results of the statistical analysis of the students’ scores appear in Table 2.
Table 2 Summary Statistics
Column  n   Mean       Variance   Std. dev.  Std. err.  Median  Range  Min  Max
CBA     23  37.73913   462.65613  21.509443  4.4850288  33      88     0    88
PPA     24  43.333333  1552.3188  39.399478  8.0423847  38      100    0    100
The students had a mean assessment score of 37.74% on the CBA, while the PPA mean
score was 43.33%. If we remove the outlier (student 5, who scored 0 on both assessments), the
mean score increases by 1.88 percentage points for PPA and 1.72 percentage points for CBA.
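The standard errors in Table 2 and the effect of removing the outlier can be rechecked from the summary statistics alone. The Python sketch below is a minimal check, not the software used for our analysis; it assumes the standard error is the square root of the variance divided by n, and the helper names std_error and mean_without_zeros are hypothetical.

```python
import math


def std_error(variance, n):
    """Standard error of the mean from a sample variance and sample size."""
    return math.sqrt(variance / n)


def mean_without_zeros(mean, n, zero_scores=1):
    """Mean after dropping scores of 0: the total is unchanged while n shrinks."""
    total = mean * n
    return total / (n - zero_scores)


# Values copied from Table 2.
TABLE_2 = {
    "CBA": {"n": 23, "mean": 37.73913, "variance": 462.65613},
    "PPA": {"n": 24, "mean": 43.333333, "variance": 1552.3188},
}

if __name__ == "__main__":
    for mode, row in TABLE_2.items():
        adjusted = mean_without_zeros(row["mean"], row["n"])
        print(
            mode,
            "std. err.:", round(std_error(row["variance"], row["n"]), 4),
            "mean without student 5:", round(adjusted, 2),
            "increase:", round(adjusted - row["mean"], 2),
        )
```

Because a score of 0 adds nothing to the total, dropping it changes only the number of scores the total is divided by, which is why both means rise.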
Qualitative: Survey
Overall, 15 students (62.5%) reported preferring the CBA over the PPA. This
could be due to the novelty of the mode, since this was a new form of assessment for all of the
students. Once the student responses were coded, there were seven categories that students’
responses fell under: Fun (F), Easy (E), Tactile (T), Draw (D), Like It (LI), Learn (L), and Have
To (H). Of the 20 students who were able to provide responses to the survey, the most common
reason for selecting one mode over the other fell under Tactile (T), with 7 students, followed by
Fun (F) & Like It (LI) with 3, Draw (D) with 2, and finally Learn (L) & Have To (H) with 1. In
comparing the students’ preferred mode of assessment with their actual scores, 13 of the 19
students performed better on the assessment mode that they indicated they preferred.
Their reasons were both similar and different. For the 5 students who performed better on the
PPA, the reasons were H, T, and D, while for the 8 students who performed better on the CBA,
the reasons were LI, T, L, and F. The students’ responses support the idea that
assessment novelty had an effect on their choice of preferred assessment type.
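The proportions reported in this section follow from simple counts. As a small illustration in Python (the function name share is hypothetical), the preference rate and the rate at which students scored higher on their preferred mode can be computed as follows:

```python
def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # 15 of the 24 students preferred the CBA.
    print("Preferred CBA:", share(15, 24), "%")
    # 13 of the 19 students scored higher on the mode they preferred.
    print("Scored higher on preferred mode:", share(13, 19), "%")
```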
Results:
We as a group concluded that the results of our data analysis are inconclusive
with respect to our original research question. Our question asked whether the mode of assessment
had any effect on the success of the student, and based on the average scores, we cannot say
conclusively whether the mode of assessment allowed students to perform better or worse.
However, we can theorize that the novelty of the CBA and the tactile aspects of the PPA could
have some effect on the students’ preferred method. Overall, our group can infer that there is no
definitive answer to our original research question; however, students preferred the CBA to the
PPA, with 62.5% choosing the CBA, 12.5 percentage points above an even split.
Conclusion and Recommendation:
In reviewing the results and data, we determined that, at this time, mode of assessment
had no conclusive effect on performance. Although there was a difference in mean scores, it
was not sufficient to state that there had been a significant difference in student performance
based on the mode of assessment. This was similar to the results gathered in the study
“Comparability of Computer-Based and Paper-and-Pencil Testing in K–12 Reading Assessments,”
conducted by Wang, Jiao, Young, Brooks, and Olson (2007), who similarly found “that the
administration mode had no statistically significant effect.”
Limitations:
A number of factors need to be looked at in future studies. One thing to
note is the small sample size, as well as the students’ age, and the limits placed on the study
by these factors. Some of the children reported difficulty with the PPA simply because
they have not yet developed sufficient hand dexterity to fill in the answers; writing the
response was an additional step for students. The computer assessment was provided by the
publisher and was not an exact duplicate of the paper-and-pencil assessment, which
could also have had an effect on responses. Students who
completed the computer assessment did not see previous questions while working, since once
a question is answered and submitted, the screen changes to the next question. Students did
have a “back” button with which to review previous questions, but at this age they may not
understand going back and checking. An unanticipated factor was also noted in the study:
one student needed a translation of the assessment before being able to respond to
the questions.
Further Study:
For future studies, gender and age may be factors that need to be
explored further. Students’ familiarity with the assessment tools could also be examined,
and in doing so researchers could explore the novelty of the experience of taking an assessment
in a new format.
References
Anakwe, B. (2008). Comparison of student performance in paper-based versus computer-based testing. Journal of Education for Business, 84(1), 13-17. doi:10.3200/joeb.84.1.13-17
Clariana, R., & Wallace, P. (2002). Paper-based versus computer-based assessment: key factors
associated with the test mode effect. British Journal of Educational Technology, 33(5), 593-602. doi:10.1111/1467-8535.00294
Csapó, B., Molnár, G., & Nagy, J. (2014). Computer-based assessment of school readiness and
early reasoning. Journal of Educational Psychology, 106(3), 639-650. doi:10.1037/a0035756
Guimarães, B., Ribeiro, J., Cruz, B., Ferreira, A., Alves, H., Cruz-Correia, R., … Ferreira, M. A.
(2017). Performance equivalency between computer-based and traditional pen-and-paper assessment: A case study in clinical anatomy. Anatomical Sciences Education, 11(2), 124-136. doi:10.1002/ase.1720
Johnson, M. & Green, S. (2006). On-Line Mathematics Assessment: The Impact of Mode on
Performance and Question Answering Strategies. Journal of Technology, Learning, and Assessment, 4(5). Available from http://www.jtla.org
McDonald, A. S. (2002). The impact of individual differences on the equivalence of computer-based and paper-and-pencil educational assessments. Computers & Education, 39(3), 299-312. doi:10.1016/s0360-1315(02)00032-5
Nikou, S. A., & Economides, A. A. (2016). The impact of paper-based, computer-based and
mobile-based self-assessment on students' science motivation and achievement. Computers in Human Behavior, 55, 1241-1248. doi:10.1016/j.chb.2015.09.025
Nissen, J. M., Jariwala, M., Close, E. W., & Van Dusen, B. (2018). Participation and performance
on paper- and computer-based low-stakes assessments. International Journal of STEM Education, 5(1). doi:10.1186/s40594-018-0117-4
Oz, H., & Ozturan, T. (2018). Computer-based and paper-based testing: Does the test administration mode influence the reliability and validity of achievement tests? Journal of Language and Linguistic Studies, 14(1), 67-85.
Prisacari, A. A., & Danielson, J. (2017). Computer-based versus paper-based testing:
Investigating testing mode with cognitive load and scratch paper use. Computers in Human Behavior, 77, 1-10. doi:10.1016/j.chb.2017.07.044
Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2007). Comparability of computer-based and paper-and-pencil testing in K–12 reading assessments. Educational and Psychological Measurement, 68(1), 5-24. doi:10.1177/0013164407305592
Appendix A
First Assessment: Topic 4 Paper-Pencil Version: Completed in a small group, 6 at a time, and instructions are read by the teacher. Questions 1-4 are worth 1 point each and questions 5 and 6 are worth 2 points each.
Appendix B
Computer-Based Online Assessment: Completed in a small group, 2 at a time, and instructions are read by the program. Questions 1-4 are worth 1 point each and questions 5 and 6 are worth 2 points each.
Appendix C
Second Assessment: Topic 5 Paper-Pencil Version: Completed in a small group, 6 at a time, and instructions are read by the teacher. Questions 1-2 and 6 are worth 1 point each and questions 3-5 are worth 2 points each.
Appendix D
Computer-Based Online Assessment: Completed in a small group, 2 at a time, and instructions are read by the program. Questions 1-3 are worth 1 point each and questions 4-6 are worth 2 points each.
Appendix E
Survey: The teacher will call students up individually and will have each student identify the assessment he or she likes better. The teacher will then write down the student’s response and check with the student for clarification.
Name: ________________________________________________
I like
Student Survey Responses:
Note: CBT refers to computer-based testing and PPA refers to paper-pencil assessment.