NBT VERSUS DIAGNOSTIC TEST
PRESENTER: RADLEY MAHLOBO
INSTITUTION: VAAL UNIVERSITY OF TECHNOLOGY, VANDERBIJLPARK, SOUTH AFRICA
EMAIL: [email protected]
Background information
There is a high failure rate of first-year mathematics students in South African tertiary institutions. The failure can be attributed, among others, to the following reasons:
– Some students come to the tertiary institutions having been exposed only to mathematics topics their grade 12 teachers were comfortable with.
– Some students are just unable to cope with the demands of tertiary institutions.
Inadequate mathematical background
• To complement the grade 12 National Senior Certificate (NSC) examination results, a national benchmark test (NBT) was introduced in South Africa.
• The NBT is administered by the University of Cape Town (UCT). Prospective students at South African tertiary institutions register with UCT in order to write the NBTs.
• Not all institutions currently use the NBT for its intended purpose.
• Some of the institutions use diagnostic tests in order to identify gaps in first-year students' mathematical background.
Objectives of this presentation
In this presentation I intend to address the following:
• Description of NBT and its intended purpose.
• Description of the diagnostic test (DT).
• Perceived inadequacies of the NBT.
• NBT versus DT as predictors of performance in first-year mathematics examination.
Introduction: National Benchmark Test
The National Benchmark Tests (NBTs) were first introduced in 2005 by Higher Education South Africa (HESA). According to HESA, these tests had three purposes:
• To assess entry-level academic literacy and mathematics skills.
• To assess the relationship between entry-level skills and school-level exit results.
• To provide institutions with additional information in the admission and placement of entry-level students.
Introduction: National Benchmark Test
Two tests, namely the Academic and Quantitative Literacy Test (AQL) and the Mathematics Test (MAT), were taken by prospective tertiary students.
• The AQL is multiple-choice and has a total of 3 hours of writing time. This particular test, a combination of academic literacy and quantitative literacy, needs to be taken by all applicants regardless of what they plan to study.
• MAT is written by applicants to programmes for which maths is a requirement. It is also multiple-choice and the student has 3 hours to complete it.
Diagnostic Test (DT)
Here we focus on a Diagnostic Test (DT) that was set at a University of Technology (UoT) in order to
• identify students' pre-tertiary areas of mathematical weakness or strength, in order to facilitate better prospects of success for first-year mathematics students at the UoT;
• assess the students' prerequisite knowledge by testing the grade 12 mathematics relevant to the students' predominantly first-year Engineering courses.
The Mathematics Centre (MC) is the unit at the UoT responsible for setting the diagnostic tests. It only intervened once the students were already busy with their first-year Engineering mathematics, because by the time the students finished writing the Diagnostic Test, the first-year mathematics classes had already started.
Common points: NBT versus DT
Among the goals of the MAT: "The MATs require writers to demonstrate sufficient understanding of concepts to enable them to apply those concepts in a variety of contexts" (NBT, 2013).
• Also: "The MAT focuses more on the knowledge and skills taught at school level, but are explicitly designed to measure the preparedness of candidates for higher education. These higher-order skills underlie success in Mathematics in Higher Education" (NBT, 2013).
The DT was also designed to measure the preparedness of candidates for higher education.
• Assessing students' first-year prerequisite knowledge by testing grade 12 mathematics proficiency is in itself measuring the students' preparation for first-year mathematics. The difference between the MAT and the DT is the purpose for which their measurements are used: the MAT for assisting in tertiary placement, the DT for diagnostic intervention.
• Neither test cues the writer in any way.
– In both tests no indication is given as to whether a question should be dealt with using geometrical or algebraic reasoning. The fact that mathematics often requires learners to integrate many different skills and concepts in any given problem means that individual questions will assess across a range of mathematical competencies.
– For example, a question dealing with the graphical representation of a function may also assess spatial and algebraic competencies. This means that writers must have a deep understanding of mathematics and know what reasoning is appropriate in a given context; they will need these skills in Higher Education.
Problem statement
• Not all South African tertiary institutions urge their prospective students to write NBTs.
• Not all South African tertiary institutions use NBT results when placing students.
• Some tertiary institutions set their own entry tests for prospective students.
• In other words, the status of NBT as a national benchmark test is not nationally embraced.
Observations leading to research questions
It is the contention of this paper that a national benchmark test that addresses the concerns of those not currently embracing it is a good idea. However, there seems to be a reason prompting some institutions not to embrace the NBT:
• Concern has been expressed at the discrepancy between the National Senior Certificate (NSC) grade 12 examinations, which most students passed, and the National Benchmark Tests (NBTs), which the majority did not pass.
This concern has prompted the present investigation: is the discrepancy attributable to the MAT being unnecessarily difficult, or is the MAT in fact a better reflection of pre-tertiary student performance than the NSC examination results? This paper compares MAT results with DT results in terms of predicting performance in first-year mathematics, since both are intended to establish readiness for tertiary mathematics, albeit for different reasons. The paper also examines the MAT as a multiple-choice test, with a view to determining whether this format can affect MAT results.
Research question
The research question is:
How do MAT results compare to DT results in terms of predicting the performance of students in their first-year mathematics studies?
• This paper hypothesises that there is no difference between the two tests in terms of their predictive role. The question was prompted by doubt about whether the MAT result is a dependable benchmark for such prediction.
Literature Survey
To give a context within which the tests are set, we briefly discuss the type of learner the South African educational system envisages. Education and Training in South Africa has seven Critical Outcomes (COs), which derive from the Constitution (DoE, 2008:10).
• Each of the seven critical outcomes describes an essential characteristic of the type of South African citizen the education sector hopes to produce.
• The mathematics policy document (DoE, 2008:10) states that these critical outcomes should be reflected in the teaching approaches and methodologies that mathematics teachers use.
In South Africa, “Curriculum and Assessment Policy Statements” (CAPS) refers to the policy documents stipulating the aim, scope, content and assessment for each subject listed in the National Curriculum Statement Grades R – 12 (NCS).
Envisaged national educational product
According to CAPS (2011:6), the NCS aims to produce learners that are able to fulfil the following seven COs:
• identify and solve problems and make decisions using critical and creative thinking;
• work effectively as individuals and with others as members of a team;
• organise and manage themselves and their activities responsibly and effectively;
• collect, analyse, organise and critically evaluate information;
• communicate effectively using visual, symbolic and/or language skills in various modes;
• use science and technology effectively and critically, showing responsibility towards the environment and the health of others; and
• demonstrate an understanding of the world as a set of related systems by recognising that problem-solving contexts do not exist in isolation.
Focus of this investigation
• In this paper focus will be on the first critical outcome: identify and solve problems and make decisions using critical and creative thinking.
• The paper will focus specifically on the mathematical “critical thinking” aspect of this Critical Outcome.
• This is one of the aspects of the South African context of mathematical skills of the envisaged prospective tertiary students.
• Firstly, we describe critical thinking.
Critical Thinking
Among the definitions of critical thinking from current frameworks of learning outcomes is Binkley et al.'s (2012).
• They define critical thinking as ways of thinking that incorporate problem solving and decision making.
• They claim that critical thinking can be categorised into knowledge, skills and attitudes/values/ethics.
• Knowledge includes a) reason effectively, use systems thinking, and evaluate evidence; b) solve problems; and c) clearly articulate.
• Skills include a) reason effectively and b) use systems thinking.
• Attitudes/values/ethics include a) make reasoned judgments and decisions, b) solve problems, and attitudinal disposition.
• Halpern (2003:6) defined critical thinking as “the use of those cognitive skills or strategies that increase the probability of a desired outcome. It is used to describe thinking that is purposeful, reasoned, and goal directed – the kind of thinking involved in problem solving, formulating inferences, calculating likelihoods, and making decisions, when the thinker is using skills that are thoughtful and effective for the particular context and type of thinking tasks”.
• Despite the widespread attention given to critical thinking, no clear-cut definition has been identified (Ou Lydia et al., 2014).
Theoretical framework
• While acknowledging that there is no clear-cut definition of critical thinking, this paper notes that critical thinking can be determined through assessment of the skills used in problem solving and decision making (Binkley et al., 2012; Butler, 2012; Halpern, 2003).
• This notion is consistent with some of the objectives of MAT: “The MATs require writers to demonstrate sufficient understanding of concepts to enable them to apply those concepts in a variety of contexts” (NBT, 2013).
• The focus of the DT was also on measuring application of concepts in a variety of contexts.
• It is the contention of this paper that the ‘problem solving – decision making’ notion of critical thinking can be demonstrated by the achievement of the Critical Outcome: “learners must identify and solve problems and make decisions using critical and creative thinking”. The elements identified in the definition of critical thinking, namely knowledge, skills and attitudes (Binkley et al., 2012), can be incorporated in the checklist for assessment of the mathematical skills of prospective tertiary students.
Research methodology
1. Research design
The approach needed to answer the research question was quantitative. The 2013 MAT and 2014 DT results were analysed.
• Means and levels of significance were used to compare the 2013 MAT results with the 2014 first-year examination results on the one hand, and
• the 2014 DT results with the 2014 first-year examination results on the other hand,
in order to establish which of the two better predicts the performance of the students in first-year mathematics for that year.
Research methodology
2. Population and samples
• The sample was a purposeful sample. Answering the research question involved 200 prospective tertiary students, belonging to a certain University of Technology in South Africa, who wrote the 2013 NBTs.
– 193 of them went on to write the 2014 first-year mathematics examination.
– 7 did not qualify to write the examination.
– 110 of the 193 students wrote the Diagnostic Test.
The population was the first-year mathematics students at the UoT.
Research methodology
3. Validity/reliability of the instruments
• The tests (MAT and DT) all adhered to the national education policy specifications, and were as equivalent as possible before they were written.
• The questions in both the DT and the MAT were embedded in the concepts set out in the CAPS, but the tests were not constrained to testing everything covered by the CAPS.
• While neither the DT nor the MAT could test anything outside the school curriculum, they were not constrained to include all school mathematics topics, and thus elected to focus on those aspects of the school curriculum that had greater bearing on performance in first-year mathematics courses.
• Both the DT and the MAT were differentiated in terms of cognitive levels, starting with lower-order questions in order to facilitate an easy introduction into the test, and then progressing to questions with greater cognitive demand.
Research methodology
4. Data Analysis
• Quantitative data analysis was used. As mentioned above, means and levels of significance were used to compare the 2013 MAT results with the 2014 first-year examination results on the one hand, and the 2014 DT results with the 2014 first-year examination results on the other hand.
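As an illustration of the kind of comparison used here, the sketch below computes the two-sample t statistic assuming unequal variances (Welch's test, the form produced by the spreadsheet output shown in the Results section). The marks used are invented for illustration only.

```python
import math

def welch_t(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (Welch's test),
    together with the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    # Sample variances (n - 1 in the denominator)
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical percentage marks for two small groups of students
dt_marks = [35, 42, 48, 51, 39, 44]
exam_marks = [50, 55, 47, 60, 52, 49]
t, df = welch_t(dt_marks, exam_marks)
```

The resulting t statistic is compared against the critical value for the computed degrees of freedom to decide whether the difference in means is statistically significant.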
Results
MAT as a multiple-choice test
• Multiple-choice items consist of a question or incomplete statement (called a stem) followed by 3 to 5 response options.
– The correct response is called the key, while the incorrect response options are called distracters.
– Multiple-choice testing became popular in the 1900s because of the efficiency that it provided (Swartz, 2006). “As the influence of psychometricians grew in 1926, the psychometric test became a multiple-choice test” (Matzen and Hoyt, 2006).
– Advantages of multiple-choice tests include how quickly they can be graded compared to others. It is much more cost-effective than having to read over written answers, which takes time and possibly training, depending on who is employed to grade them (Holtzman, 2008).
– The large number of questions makes it possible to test a broad range of content and provides a good sample of the test taker’s knowledge, reducing the effect of “the luck of the draw” (Livingstone, 2009).
– Questions that require the test taker to produce the answer, rather than simply choosing it from a list, are referred to as constructed-response questions.
– Although constructed-response items have great face validity and have the potential to offer authentic contexts in assessment, they tend to have lower levels of reliability than multiple-choice items for the same amount of time (Lee, Liu, & Linn, 2011).
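The grading-speed advantage of multiple-choice items noted above comes from the fact that scoring reduces to matching responses against the key. A minimal sketch (the item key and responses are invented for illustration):

```python
def score_mcq(responses, key):
    """Score a multiple-choice script against the answer key:
    one mark per item whose chosen option matches the key."""
    return sum(1 for item, answer in key.items()
               if responses.get(item) == answer)

# Hypothetical five-item key and one student's responses
key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}
responses = {1: "B", 2: "D", 3: "C", 4: "C", 5: "A"}
print(score_mcq(responses, key))  # 3 of the 5 items match the key
```

A constructed-response script, by contrast, needs a trained marker (or an automated scoring engine) to judge each answer, which is where the time and cost arise.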
Results
MAT as a multiple-choice test
• Studies show high correlations of multiple-choice items and constructed-response items of the same constructs (Klein et al., 2009).
• Given that there may be situations where constructed-response items are more expensive to score, and that multiple-choice items can measure the same constructs equally well in some cases, one might argue that it makes more sense to use all multiple-choice items and disregard constructed-response items.
• There is, however, a downside to assessment using multiple-choice questions.
– One is the limited feedback from student responses with which to correct errors in student understanding.
– Many skills that schools teach are too complex to be measured effectively with multiple-choice questions.
– A multiple-choice test for mathematics students can determine whether they can solve many kinds of questions, but it cannot determine whether they can construct a mathematical proof (Livingstone, 2009).
– A practical example is the current student culture in our tertiary institutions. Students are not interested in understanding how certain formulae are derived. They are more interested in how the formula can be used, the intention being to memorise the formula.
Results
MAT as a multiple-choice test
• Students who cannot state the general scientific principle illustrated by a specific process in nature may have no trouble recognising that principle when they see it stated along with three or four others.
• In academic subjects, there is usually a strong tendency for the students who are stronger in the skills measured by multiple-choice questions to be stronger in the skills measured by constructed-response test questions. But if all the students improve in the skills tested by constructed-response test questions, their performance on the multiple-choice questions may not reflect that improvement (Livingstone, 2009).
• When multiple-choice tests are used as the basis for important decisions, teachers have a strong incentive to emphasise the skills tested by the questions on those tests. With a limited amount of class time available, they have to give a lower priority to the kinds of skills that would be tested by constructed-response test questions (Livingstone, 2009).
• An example is the advice given to NBT teachers: “It might be helpful to give learners some guidelines regarding how to deal [with] multiple choice tests” (NBT, 2013:7).
Results
• With constructed response, it is possible to create more authentic contexts and assess students’ ability to generate rather than select responses. In real-life situations where critical thinking skills need to be exercised, there will be no choices provided. In the case of critical thinking, constructed-response items could be a better proxy for real-life scenarios than multiple-choice items (Livingstone, 2009).
• One of the greatest problems in constructed-response testing is the time and expense involved in scoring. In recent years, researchers have made a great deal of progress in using computers to score the responses. Three scoring engines – computer programmes for automated scoring – have been developed at ETS in the past few years: e-rater (for “essay rater”), c-rater (for “content rater”), and m-rater (for “math rater”). The m-rater engine scores responses that consist of an algebraic expression (e.g. a formula), a plotted line or a curve on a graph, or a geometric figure. To score an algebraic expression, the m-rater engine determines whether the formula written by the test taker is algebraically equivalent to the correct answer. The m-rater engine scores a straight-line graph by transforming the line into an algebraic expression; it scores a curved-line graph by testing the curve for correctness at several points (Livingstone, 2009).
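As a rough illustration of the idea behind automated formula scoring (a sketch of the concept only, not ETS's actual m-rater implementation), one can treat a student's expression as equivalent to the key if the two agree at many sample points:

```python
import random

def numerically_equivalent(f, g, trials=50, lo=-10.0, hi=10.0, tol=1e-9):
    """Heuristic equivalence check: two single-variable expressions are
    treated as equivalent if they agree at many random sample points.
    A production scorer would use symbolic algebra instead."""
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        if abs(f(x) - g(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True

# The key is (x + 1)^2; a student who wrote x^2 + 2x + 1 should get credit.
key = lambda x: (x + 1) ** 2
student_ok = lambda x: x ** 2 + 2 * x + 1
student_wrong = lambda x: x ** 2 + 1
```

The same point-sampling idea is what the quoted passage describes for scoring curved-line graphs: the curve is tested for correctness at several points.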
• There is a specific context within which this paper was conceived. It is the context of producing a South African citizen who is able to identify and solve problems and make decisions using critical thinking. The checklist as provided by NBT’s achievement levels seems consistent with this envisaged citizen, but can be improved. Murphy (2001) makes some generic points about what constitutes good assessment of key skills: “If they are transferable they must be tested in various contexts to demonstrate this and indeed particularly in new contexts”. In engineering mathematics modules, where a frequent complaint is that “students can do the maths in Maths but not in Engineering” one possible integrating activity would be to set a modelling case study, and award marks for both “doing the maths” and for writing up a report on the case study (Challis et al, 2002).
Results
DT Exam
Mean 42.24545455 51.02590674
Variance 444.9208507 293.7337003
Observations 110 193
Hypothesized Mean Difference 0
df 191
t Stat -3.72150918
P(T<=t) one-tail 0.000130195
t Critical one-tail 1.652870548
P(T<=t) two-tail 0.00026039
t Critical two-tail 1.972461946
DT versus Mathematics 1 examination results
The mean mark of the DT was 42.2%, while that of the first-year mathematics examination was 51.0%.
The difference is statistically significant (p = 0.00026 < 0.005).
Results
Exam MAT
Mean 51.02590674 34.645
Variance 293.7337003 55.40600503
Observations 193 200
Hypothesized Mean Difference 0
df 260
t Stat 12.21311437
P(T<=t) one-tail 1.02255E-27
t Critical one-tail 1.650735343
P(T<=t) two-tail 2.0451E-27
t Critical two-tail 1.969129946
Mathematics 1 examination result (Exam) versus MAT
Results
DT MAT
Mean 42.24545455 34.645
Variance 444.9208507 55.40600503
Observations 110 200
Hypothesized Mean Difference 0
df 124
t Stat 3.6560236
P(T<=t) one-tail 0.000188451
t Critical one-tail 1.657234971
P(T<=t) two-tail 0.000376901
t Critical two-tail 1.979280091
DT versus MAT
The difference is, once again, statistically significant (p = 0.000377 < 0.005).
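The reported t statistics can be reproduced from the summary statistics in the tables above using the Welch (unequal-variances) formula, which is what the spreadsheet "two-sample assuming unequal variances" output corresponds to:

```python
import math

def welch_t_from_stats(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (means, variances, sample sizes)."""
    se2 = var1 / n1 + var2 / n2
    t = (mean1 - mean2) / math.sqrt(se2)
    df = se2 ** 2 / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
    return t, df

# DT versus Exam, using the summary statistics reported above
t1, df1 = welch_t_from_stats(42.24545455, 444.9208507, 110,
                             51.02590674, 293.7337003, 193)
# Exam versus MAT, using the summary statistics reported above
t2, df2 = welch_t_from_stats(51.02590674, 293.7337003, 193,
                             34.645, 55.40600503, 200)
print(round(t1, 4), round(t2, 4))  # -3.7215 12.2131
```

Both values match the tabulated t statistics, and the rounded degrees of freedom (191 and 260) match as well.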
Discussion, conclusion and recommendation
• The poor performance of the students in the MAT raises a question about its benchmark status. The question this paper asks is whether the MAT can be relied upon as a true indicator of students’ potential performance in first-year mathematics, or whether tertiary institutions can use other tests in its place. In other words, are institutions of higher learning not vulnerable in placing their students on the basis of NBT results? Alternatively, does reliance on NBT results not disadvantage some of the students whose future is determined by the NBT recommendations?
• In this paper we considered the DT as one such possible test, and examined its feasibility as an indicator of student tertiary potential. Its showing as a better estimator of first-year mathematics performance than the NBT result deserves to be taken seriously. More research needs to be done to establish the reproducibility of the results of this study.
• It is then recommended that common decisions be taken on the basis of DT results. Alternatively, there should be a national prerequisite test written for all institutions. It would differ from the current NBT in that it would not be multiple-choice and would focus on those aspects relevant to the different tertiary disciplines: some sections of the DT would identify a student as suitable to study engineering, others as suitable for the pure sciences, and so on. In other words, the suggested DT should have different components in order to help in the placement of students.
References
• BINKLEY, M., ERSTAD, O., HERMAN, J., RAISEN, S., RIPLEY, M., & RUMBLE, M. 2012. Defining 21st century skills. In P. Griffiths, B. McGaw, & E. Care (Eds.), Assessment and teaching of 21st century skills (pp. 17-66). New York, NY: Springer Science and Business Media B.V.
• BUTLER, H.A. 2012. Halpern Critical Thinking Assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology, 25(5):721-729.
• CAPS. 2011. Curriculum and Assessment Policy Statements.
• CHALLIS, D. 2002. Integrating the conceptual and practical worlds: A case study from architecture. In A. Goody, J. Herrington & M. Northcote (Eds.), Quality conversation: Research and Development in Higher Education, 25:106-113.
• DoE. 2008. National Curriculum Statement Grades 10 – 12 (General): Learning Programme Guidelines Mathematics.
• DoE. 2005. National Curriculum Statement Grades 10 – 12 (General): Learning Programme Guidelines Mathematics.
• EDUCATIONAL TESTING SERVICE. 2013. ETS Proficiency Profile user’s guide. Princeton, NJ.
• HALPERN, D.F. 2003. Thought and knowledge: An introduction to critical thinking. Mahwah, NJ: Erlbaum.
• LIU, R., QIAO, X., & LIU, Y. 2010. Paradigm shift of learner-centered teaching style: Reality or illusion? Arizona Working Papers in SLAT, 13:78.
• LIVINGSTONE, S.A. 2009. Constructed-response test questions: Why we use them; how we score them. ETS R&D Connections, No. 11.
• MATZEN, R.N. Jr., & HOYT, J.E. 2004. Basic writing placement with holistically scored essays: Research evidence. Journal of Developmental Education, 28(1):2-4, 6, 8, 20, 23, 34.
• SWARTZ, S.M. 2006. Acceptance and accuracy of multiple choice, confidence-level and essay question formats for graduate students. Journal of Education for Business, 81(4):215-220.