NBT VERSUS DIAGNOSTIC TEST
PRESENTER: RADLEY MAHLOBO
INSTITUTION: VAAL UNIVERSITY OF TECHNOLOGY, VANDERBIJLPARK, SOUTH AFRICA
EMAIL: [email protected]
Background information
There is a high failure rate of first-year mathematics students in South African tertiary institutions. The failure can be attributed, among others, to the following reasons:
– Some students come to the tertiary institutions having been exposed only to mathematics topics their grade 12 teachers were comfortable with.
– Some students are just unable to cope with the demands of tertiary institutions.
Inadequate mathematical background
• To complement the grade 12 National Senior Certificate (NSC) examination results, a national benchmark test (NBT) was introduced in South Africa.
• The NBT is administered by the University of Cape Town (UCT). Prospective students at South African tertiary institutions register with UCT in order to write the NBTs.
• Not all institutions currently use the NBT for its intended purpose.
• Some of the institutions use diagnostic tests in order to identify gaps in first-year students' mathematical background.
Objectives of this presentation
In this presentation I intend to address the following:
• Description of NBT and its intended purpose.
• Description of the diagnostic test (DT).
• Perceived inadequacies of the NBT.
• NBT versus DT as predictors of performance in first-year mathematics examination.
Introduction: National Benchmark Test
The National Benchmark Tests (NBTs) were first introduced in 2005 by Higher Education South Africa (HESA). According to HESA, these tests had three purposes:
• To assess entry-level academic literacy and mathematics skills.
• To assess the relationship between entry-level skills and school-level exit results.
• To provide institutions with additional information in the admission and placement of entry-level students.
Introduction: National Benchmark Test
Two tests, namely the Academic and Quantitative Literacy Test (AQL) and the Mathematics Test (MAT), were taken by prospective tertiary students.
• The AQL is multiple-choice and has a total of 3 hours of writing time. This particular test, a combination of academic literacy and quantitative literacy, needs to be taken by all applicants regardless of what they plan to study.
• MAT is written by applicants to programmes for which maths is a requirement. It is also multiple-choice and the student has 3 hours to complete it.
Diagnostic Test (DT)
Here we focus on a Diagnostic Test (DT) that was set at a University of Technology (UoT) in order to
• identify students' pre-tertiary areas of mathematical weakness or strength, in order to facilitate better prospects of success for first-year mathematics students at the UoT;
• assess the students' prerequisite knowledge by testing the grade 12 mathematics relevant to the students' predominantly first-year Engineering courses.
The Mathematics Centre (MC) is the unit at the UoT responsible for setting the diagnostic tests. It only intervened once the students were already busy with their first-year Engineering mathematics, because by the time the students finished writing the Diagnostic Test, the first-year mathematics classes had already started.
Common points: NBT versus DT
Among the goals of the MAT: "The MATs require writers to demonstrate sufficient understanding of concepts to enable them to apply those concepts in a variety of contexts" (NBT, 2013).
• Also: "The MAT focuses more on the knowledge and skills taught at school level, but are explicitly designed to measure the preparedness of candidates for higher education. These higher-order skills underlie success in Mathematics in Higher Education" (NBT, 2013).
The DT was also designed to measure the preparedness of candidates for higher education.
• Assessing students' first-year prerequisite knowledge by testing grade 12 mathematics proficiency is in itself measuring the students' preparation for first-year mathematics. The difference between the MAT and the DT is the purpose for which their measurements are used: the MAT for assisting in tertiary placement, the DT for diagnostic intervention.
• Neither test cues the writer in any way.
– In both tests no indication is given as to whether a question should be dealt with using geometrical or algebraic reasoning. The fact that mathematics often requires learners to integrate many different skills and concepts in any given problem means that individual questions will assess across a range of mathematical competencies.
– For example, a question dealing with the graphical representation of a function may also assess spatial and algebraic competencies. This means that writers must have a deep understanding of mathematics and know what reasoning is appropriate in a given context; they will need these skills in Higher Education.
Problem statement
• Not all South African tertiary institutions urge their prospective students to write NBTs.
• Not all South African tertiary institutions use NBT results when placing students.
• Some tertiary institutions set their own entry tests for prospective students.
• In other words, the status of NBT as a national benchmark test is not nationally embraced.
Observations leading to research questions
It is the contention of this paper that a national benchmark test that addresses the concerns of those not currently embracing it is a good idea. However, there seems to be a reason prompting some institutions not to embrace the NBT:
• Concern has been expressed at the discrepancy between the National Senior Certificate (NSC) grade 12 examinations, which most students passed, and the National Benchmark Tests (NBTs), which the majority did not pass.
This concern has prompted the present investigation: is the discrepancy attributable to the MAT being unnecessarily difficult, or is the MAT in fact a better reflection of pre-tertiary student performance than the NSC examination results? This paper compares MAT results with DT results in terms of predicting performance in first-year mathematics, since both are intended to establish readiness for tertiary mathematics, albeit for different reasons. The paper also examines the MAT as a multiple-choice test, with a view to determining whether this format can affect MAT results.
Research question
The research question is:
How do MAT results compare to DT results in terms of predicting the performance of students in their first-year mathematics studies?
• This paper hypothesises that there is no difference between the two tests in terms of their predictive role. The question was prompted by doubt about whether the MAT result is a dependable benchmark for such prediction.
Literature Survey
To give a context within which the tests are set, we briefly discuss the type of learner the South African educational system envisages. Education and Training in South Africa has seven Critical Outcomes (COs), which derive from the Constitution (DoE, 2008:10).
• Each of the seven critical outcomes describes an essential characteristic of the type of South African citizen the education sector hopes to produce.
• The mathematics policy document (DoE, 2008:10) states that these critical outcomes should be reflected in the teaching approaches and methodologies that mathematics teachers use.
In South Africa, “Curriculum and Assessment Policy Statements” (CAPS) refers to the policy documents stipulating the aim, scope, content and assessment for each subject listed in the National Curriculum Statement Grades R – 12 (NCS).
Envisaged national educational product
According to CAPS (2011:6), the NCS aims to produce learners that are able to fulfil the following seven COs:
• identify and solve problems and make decisions using critical and creative thinking;
• work effectively as individuals and with others as members of a team;
• organise and manage themselves and their activities responsibly and effectively;
• collect, analyse, organise and critically evaluate information;
• communicate effectively using visual, symbolic and/or language skills in various modes;
• use science and technology effectively and critically, showing responsibility towards the environment and the health of others; and
• demonstrate an understanding of the world as a set of related systems by recognising that problem-solving contexts do not exist in isolation.
Focus of this investigation
• In this paper focus will be on the first critical outcome: identify and solve problems and make decisions using critical and creative thinking.
• The paper will focus specifically on the mathematical “critical thinking” aspect of this Critical Outcome.
• This is one of the aspects of the South African context of mathematical skills of the envisaged prospective tertiary students.
• Firstly, we describe critical thinking.
Critical Thinking
Among the definitions of critical thinking from current frameworks of learning outcomes is Binkley et al.'s (2012).
• They define critical thinking as ways of thinking that incorporate problem solving and decision making.
• They claim that critical thinking can be categorised into knowledge, skills and attitudes/values/ethics.
• Knowledge includes a) reason effectively, use systems thinking, and evaluate evidence; b) solve problems; and c) clearly articulate.
• Skills include a) reason effectively and b) use systems thinking.
• Attitudes/values/ethics include a) make reasoned judgments and decisions, b) solve problems, and attitudinal disposition.
• Halpern (2003:6) defined critical thinking as “the use of those cognitive skills or strategies that increase the probability of a desired outcome. It is used to describe thinking that is purposeful, reasoned, and goal directed – the kind of thinking involved in problem solving, formulating inferences, calculating likelihoods, and making decisions, when the thinker is using skills that are thoughtful and effective for the particular context and type of thinking tasks”.
• Despite the widespread attention given to critical thinking, no clear-cut definition has been identified (Ou Lydia et al., 2014).
Theoretical framework
• While acknowledging that there is no clear-cut definition of critical thinking, this paper notes that critical thinking can be determined through assessment of the skills used in problem solving and decision making (Binkley et al., 2012; Butler, 2012; Halpern, 2003).
• This notion is consistent with some of the objectives of MAT: “The MATs require writers to demonstrate sufficient understanding of concepts to enable them to apply those concepts in a variety of contexts” (NBT, 2013).
• The focus of the DT was also on measuring application of concepts in a variety of contexts.
• It is the contention of this paper that the ‘problem solving – decision making’ notion of critical thinking can be demonstrated by the achievement of the Critical Outcome: “learners must identify and solve problems and make decisions using critical and creative thinking”. The elements identified in the definition of critical thinking, namely knowledge, skills and attitudes (Binkley et al., 2012), can be incorporated in the checklist for assessment of the mathematical skills of prospective tertiary students.
Research methodology
1. Research design
The approach needed to answer the research question was quantitative. The 2013 MAT and 2014 DT results were analysed.
• Means and levels of significance were used to compare the 2013 MAT results with the 2014 first-year examination results on the one hand, and
• the 2014 DT results with the 2014 first-year examination results on the other hand,
in order to establish which of the two better predicts the performance of the students in first-year mathematics for that year.
Research methodology
2. Population and samples
• The sample was a purposeful sample. Answering the research question involved 200 prospective tertiary students, belonging to a certain University of Technology in South Africa, who wrote the 2013 NBTs.
– 193 of them went on to write the 2014 first-year mathematics examination.
– 7 did not qualify to write the examination.
– 110 of the 193 students wrote the Diagnostic Test.
The population was the first-year mathematics students at the UoT.
Research methodology
3. Validity/reliability of the instruments
• The tests (MAT and DT) all adhered to the national education policy specifications, and were as equivalent as possible before they were written.
• The questions in both the DT and the MAT were embedded in the concepts set out in the CAPS, but the tests were not constrained to testing everything covered by the CAPS.
• While neither the DT nor the MAT could test anything outside the school curriculum, they were not constrained to include all school mathematics topics, and thus elected to focus on those aspects of the school curriculum that had greater bearing on performance in first-year mathematics courses.
• Both the DT and the MAT were differentiated in terms of cognitive levels, starting with lower-order questions in order to facilitate an easy introduction into the test, and then progressing to questions with greater cognitive demand.
Research methodology
4. Data Analysis
• Quantitative data analysis was used. As mentioned above, means and levels of significance were used to compare the 2013 MAT results with the 2014 first-year examination results on the one hand, and the 2014 DT results with the 2014 first-year examination results on the other hand.
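As an illustration of the kind of comparison used here, the sketch below computes the two-sample t statistic assuming unequal variances (Welch's test, the form produced by the spreadsheet output shown in the Results section). The marks used are invented for illustration only.

```python
import math

def welch_t(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (Welch's test),
    together with the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    # Sample variances (n - 1 in the denominator)
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical percentage marks for two small groups of students
dt_marks = [35, 42, 48, 51, 39, 44]
exam_marks = [50, 55, 47, 60, 52, 49]
t, df = welch_t(dt_marks, exam_marks)
```

The resulting t statistic is compared against the critical value for the computed degrees of freedom to decide whether the difference in means is statistically significant.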
Results
MAT as a multiple-choice test
• Multiple-choice items consist of a question or incomplete statement (called a stem) followed by 3 to 5 response options.
– The correct response is called the key, while the incorrect response options are called distracters.
– Multiple-choice testing became popular in the 1900s because of the efficiency that it provided (Swartz, 2006). “As the influence of psychometricians grew in 1926, the psychometric test became a multiple-choice test” (Matzen and Hoyt, 2006).
– Advantages of multiple-choice tests include how quickly they can be graded compared to others. It is much more cost-effective than having to read over written answers, which takes time and possibly training, depending on who is employed to grade them (Holtzman, 2008).
– The large number of questions makes it possible to test a broad range of content and provides a good sample of the test taker’s knowledge, reducing the effect of “the luck of the draw” (Livingstone, 2009).
– Questions that require the test taker to produce the answer, rather than simply choosing it from a list, are referred to as constructed-response questions.
– Although constructed-response items have great face validity and have the potential to offer authentic contexts in assessment, they tend to have lower levels of reliability than multiple-choice items for the same amount of time (Lee, Liu, & Linn, 2011).
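The grading-speed advantage of multiple-choice items noted above comes from the fact that scoring reduces to matching responses against the key. A minimal sketch (the item key and responses are invented for illustration):

```python
def score_mcq(responses, key):
    """Score a multiple-choice script against the answer key:
    one mark per item whose chosen option matches the key."""
    return sum(1 for item, answer in key.items()
               if responses.get(item) == answer)

# Hypothetical five-item key and one student's responses
key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}
responses = {1: "B", 2: "D", 3: "C", 4: "C", 5: "A"}
print(score_mcq(responses, key))  # 3 of the 5 items match the key
```

A constructed-response script, by contrast, needs a trained marker (or an automated scoring engine) to judge each answer, which is where the time and cost arise.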
Results
MAT as a multiple-choice test
• Studies show high correlations of multiple-choice items and constructed-response items of the same constructs (Klein et al., 2009).
• Given that there may be situations where constructed-response items are more expensive to score, and that multiple-choice items can measure the same constructs equally well in some cases, one might argue that it makes more sense to use all multiple-choice items and disregard constructed-response items.
• There is, however, a downside to assessment using multiple-choice questions.
– One is the limited feedback from student responses with which to correct errors in student understanding.
– Many skills that schools teach are too complex to be measured effectively with multiple-choice questions.
– A multiple-choice test for mathematics students can determine whether they can solve many kinds of questions, but it cannot determine whether they can construct a mathematical proof (Livingstone, 2009).
– A practical example is the current student culture in our tertiary institutions. Students are not interested in understanding how certain formulae are derived. They are more interested in how the formula can be used, the intention being to memorise the formula.
Results
MAT as a multiple-choice test
• Students who cannot state the general scientific principle illustrated by a specific process in nature may have no trouble recognising that principle when they see it stated along with three or four others.
• In academic subjects, there is usually a strong tendency for the students who are stronger in the skills measured by multiple-choice questions to be stronger in the skills measured by constructed-response test questions. But if all the students improve in the skills tested by constructed-response test questions, their performance on the multiple-choice questions may not reflect that improvement (Livingstone, 2009).
• When multiple-choice tests are used as the basis for important decisions, teachers have a strong incentive to emphasise the skills tested by the questions on those tests. With a limited amount of class time available, they have to give a lower priority to the kinds of skills that would be tested by constructed-response test questions (Livingstone, 2009).
• An example is the advice given to NBT teachers: “It might be helpful to give learners some guidelines regarding how to deal [with] multiple choice tests” (NBT, 2013:7).
Results
• With constructed response, it is possible to create more authentic contexts and assess students’ ability to generate rather than select responses. In real-life situations where critical thinking skills need to be exercised, there will be no choices provided. In the case of critical thinking, constructed-response items could be a better proxy for real-life scenarios than multiple-choice items (Livingstone, 2009).
• One of the greatest problems in constructed-response testing is the time and expense involved in scoring. In recent years, researchers have made a great deal of progress in using computers to score the responses. Three scoring engines – computer programmes for automated scoring – have been developed at ETS in the past few years: e-rater (for “essay rater”), c-rater (for “content rater”), and m-rater (for “math rater”). The m-rater engine scores responses that consist of an algebraic expression (e.g. a formula), a plotted line or a curve on a graph, or a geometric figure. To score an algebraic expression, the m-rater engine determines whether the formula written by the test taker is algebraically equivalent to the correct answer. The m-rater engine scores a straight-line graph by transforming the line into an algebraic expression; it scores a curved-line graph by testing the curve for correctness at several points (Livingstone, 2009).
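As a rough illustration of the idea behind automated formula scoring (a sketch of the concept only, not ETS's actual m-rater implementation), one can treat a student's expression as equivalent to the key if the two agree at many sample points:

```python
import random

def numerically_equivalent(f, g, trials=50, lo=-10.0, hi=10.0, tol=1e-9):
    """Heuristic equivalence check: two single-variable expressions are
    treated as equivalent if they agree at many random sample points.
    A production scorer would use symbolic algebra instead."""
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        if abs(f(x) - g(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True

# The key is (x + 1)^2; a student who wrote x^2 + 2x + 1 should get credit.
key = lambda x: (x + 1) ** 2
student_ok = lambda x: x ** 2 + 2 * x + 1
student_wrong = lambda x: x ** 2 + 1
```

The same point-sampling idea is what the quoted passage describes for scoring curved-line graphs: the curve is tested for correctness at several points.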
• There is a specific context within which this paper was conceived. It is the context of producing a South African citizen who is able to identify and solve problems and make decisions using critical thinking. The checklist as provided by NBT’s achievement levels seems consistent with this envisaged citizen, but can be improved. Murphy (2001) makes some generic points about what constitutes good assessment of key skills: “If they are transferable they must be tested in various contexts to demonstrate this and indeed particularly in new contexts”. In engineering mathematics modules, where a frequent complaint is that “students can do the maths in Maths but not in Engineering” one possible integrating activity would be to set a modelling case study, and award marks for both “doing the maths” and for writing up a report on the case study (Challis et al, 2002).
Results
DT Exam
Mean 42.24545455 51.02590674
Variance 444.9208507 293.7337003
Observations 110 193
Hypothesized Mean Difference 0
df 191
t Stat -3.72150918
P(T<=t) one-tail 0.000130195
t Critical one-tail 1.652870548
P(T<=t) two-tail 0.00026039
t Critical two-tail 1.972461946
DT versus Mathematics 1 examination results
The mean mark of the DT was 42.2%, while that of the first-year mathematics examination was 51.0%.
The difference is statistically significant (p = 0.00026 < 0.005).
Results
Exam MAT
Mean 51.02590674 34.645
Variance 293.7337003 55.40600503
Observations 193 200
Hypothesized Mean Difference 0
df 260
t Stat 12.21311437
P(T<=t) one-tail 1.02255E-27
t Critical one-tail 1.650735343
P(T<=t) two-tail 2.0451E-27
t Critical two-tail 1.969129946
Mathematics 1 examination result (Exam) versus MAT
Results
DT MAT
Mean 42.24545455 34.645
Variance 444.9208507 55.40600503
Observations 110 200
Hypothesized Mean Difference 0
df 124
t Stat 3.6560236
P(T<=t) one-tail 0.000188451
t Critical one-tail 1.657234971
P(T<=t) two-tail 0.000376901
t Critical two-tail 1.979280091
DT versus MAT
The difference is, once again, statistically significant (p = 0.000377 < 0.005).
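The reported t statistics can be reproduced from the summary statistics in the tables above using the Welch (unequal-variances) formula, which is what the spreadsheet "two-sample assuming unequal variances" output corresponds to:

```python
import math

def welch_t_from_stats(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    computed from summary statistics (means, variances, sample sizes)."""
    se2 = var1 / n1 + var2 / n2
    t = (mean1 - mean2) / math.sqrt(se2)
    df = se2 ** 2 / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
    return t, df

# DT versus Exam, using the summary statistics reported above
t1, df1 = welch_t_from_stats(42.24545455, 444.9208507, 110,
                             51.02590674, 293.7337003, 193)
# Exam versus MAT, using the summary statistics reported above
t2, df2 = welch_t_from_stats(51.02590674, 293.7337003, 193,
                             34.645, 55.40600503, 200)
print(round(t1, 4), round(t2, 4))  # -3.7215 12.2131
```

Both values match the tabulated t statistics, and the rounded degrees of freedom (191 and 260) match as well.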
Discussion, conclusion and recommendation
• The poor performance of the students in the MAT raises a question about its benchmark status. The question this paper asks is whether the MAT can be relied upon as a true indicator of students’ potential performance in first-year mathematics, or whether tertiary institutions can use other tests in its place. In other words, are institutions of higher learning not vulnerable in placing their students on the basis of NBT results? Alternatively, does reliance on NBT results not disadvantage some of the students whose future is determined by the NBT recommendations?
• In this paper we considered the DT as one such possible test, and examined its feasibility as an indicator of student tertiary potential. Its showing as a better estimator of first-year mathematics performance than the NBT result deserves to be taken seriously. More research needs to be done to establish the reproducibility of the results of this study.
• It is then recommended that common decisions be taken on the basis of DT results. Alternatively, there should be a national prerequisite test written for all institutions. It would differ from the current NBT in that it would not be multiple-choice and would focus on those aspects relevant to the different tertiary disciplines: some sections of the DT would identify a student as suitable to study engineering, others as suitable for the pure sciences, and so on. In other words, the suggested DT should have different components in order to help in the placement of students.
References
• BINKLEY, M., ERSTAD, O., HERMAN, J., RAISEN, S., RIPLEY, M., & RUMBLE, M. 2012. Defining 21st century skills. In P. Griffiths, B. McGaw, & E. Care (Eds.), Assessment and teaching of 21st century skills (pp. 17-66). New York, NY: Springer Science and Business Media B.V.
• BUTLER, H.A. 2012. Halpern Critical Thinking Assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology, 25(5):721-729.
• CAPS. 2011. Curriculum and Assessment Policy Statements.
• CHALLIS, D. 2002. Integrating the conceptual and practical worlds: A case study from architecture. In A. Goody, J. Herrington & M. Northcote (Eds.), Quality conversation: Research and Development in Higher Education, 25:106-113.
• DoE. 2008. National Curriculum Statement Grades 10 – 12 (General): Learning Programme Guidelines Mathematics.
• DoE. 2005. National Curriculum Statement Grades 10 – 12 (General): Learning Programme Guidelines Mathematics.
• EDUCATIONAL TESTING SERVICE. 2013. ETS Proficiency Profile user’s guide. Princeton, NJ.
• HALPERN, D.F. 2003. Thought and knowledge: An introduction to critical thinking. Mahwah, NJ: Erlbaum.
• LIU, R., QIAO, X., & LIU, Y. 2010. Paradigm shift of learner-centered teaching style: Reality or illusion? Arizona Working Papers in SLAT, 13:78.
• LIVINGSTONE, S.A. 2009. Constructed-response test questions: Why we use them; how we score them. ETS R&D Connections, No. 11.
• MATZEN, R.N. Jr., & HOYT, J.E. 2004. Basic writing placement with holistically scored essays: Research evidence. Journal of Developmental Education, 28(1):2-4, 6, 8, 20, 23, 34.
• SWARTZ, S.M. 2006. Acceptance and accuracy of multiple choice, confidence-level and essay question formats for graduate students. Journal of Education for Business, 81(4):215-220.