
COMBINING RELIABILITY AND VALIDITY IN ASSESSMENT: CAN AUDITING CREATE THE GOLDEN GOOSE IN ASSESSMENT QUALITY?

Institute of Education Doctoral School Summer Conference 2011

Mr. Niall Douglas BSc MA MBS MCollT

Contents

1. Reliability and Validity in HE assessment

2. Consequences of assessment unreliability on plagiarism

3. My proposed research

4. Current market situation and likely future demands in HE assessment (if we have time)

5. Ethical consequences

Reliability and Validity in HE assessment

While I was reading for my second undergraduate degree at St. Andrews (2004-2008), one of the most common complaints I heard from students was about inconsistency in grading. In other words, grades varied a LOT in qualitative assessment.

Most students also took issue with how quantitative assessment grades up to 100%, whereas qualitative assessment only grades up to 75-80%.

This was seen – and quite correctly in my opinion – as a hidden subsidy of quantitative study

Reliability and Validity in HE assessment

It turns out that St. Andrews undergraduates are hardly alone in taking issue with this: Yorke (2000) found that this "unfairness" features prominently in student dissatisfaction surveys.

So why does UK Higher Education persist in these marking inconsistencies, i.e. unreliability in assessment?

Yorke, M. (2000) 'The quality of the student experience: what can institutions learn from data relating to non-completion?', Quality in Higher Education, vol. 6, pp. 61-75.

Reliability and Validity in HE assessment

One area in which UK HE diverges from US HE is in choosing validity over reliability in assessment. Validity of assessment is measured by how well the assessment achieves learning outcomes.

For example, one might choose a suboptimal form of assessment (in terms of reliability, marking overheads or even the prevention of cheating) if that form has superior effects on student learning outcomes.

In particular, UK HE tries to allow for more creativity and originality in student writing – something inherently subjective to assess – even if grades consequently vary a lot, even with the same marker and the same student.

Reliability and Validity in HE assessment

This is why UK HE marks qualitative assessments up to just 75-80%: it has been repeatedly found that grades in qualitative assessment have a variance of up to ±20% (Hartog, Rhodes and Burt, 1936; Cox, 1967; Branthwaite, Trueman and Berrisford, 1981; Elton and Johnston, 2002). Marking to 75-80% heavily compresses that variance into a ±12.5-15% range, a fifth of which is ±2.5-3% – hence the historical UK HE choice of 10% bands between 1H, 2H1, 2H2 etc.
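As a rough sanity check of this compression (my arithmetic, not the slides': assume marks are simply rescaled linearly from a nominal 0-100 ceiling down to a ceiling c):

```latex
\[
\sigma' = \frac{c}{100}\,\sigma
\qquad\text{so with } \sigma = \pm 20\%:\quad
c = 75 \;\Rightarrow\; \sigma' = \pm 15\%,
\qquad
c = 62.5 \;\Rightarrow\; \sigma' = \pm 12.5\%
\]
```

The quoted ±12.5% lower bound would therefore imply an effective ceiling somewhat below the nominal 75%, presumably because marks near the ceiling are rarely awarded in practice.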

Of course, this is not widely realised by students, nor even staff! But it is done as a “least worst” outcome despite penalising qualitative study

If you want to know a LOT more on this, see Elton, L. and Johnston, B. (2002) 'Assessment in universities: a critical review of research'.

Hartog, P., Rhodes, E.C. and Burt, C. (1936) The marks of examiners: being a comparison of marks allotted to examination scripts by independent examiners and boards of examiners, together with a section on a viva voce examination, Macmillan.

Cox, R. (1967) 'Examinations and higher education: a survey of the literature', Higher Education Quarterly, vol. 21, pp. 292-340.

Branthwaite, A., Trueman, M. and Berrisford, T. (1981) 'Unreliability of Marking: further evidence and a possible explanation', Educational Review, vol. 33, pp. 41-46.

Consequences of assessment unreliability on plagiarism

Of course, because creativity and originality are enabled by this choice, UK HE generates the highest grade volatility for precisely its most creative and original students!

Therefore, while this preference for validity is admirable, does it have unintended consequences on the incentives to plagiarise?

Here are some facts about plagiarism ...

Consequences of assessment unreliability on plagiarism

Firstly, cheating detection software is extremely easy to fool. In fact, there are sites on the internet which will pre-check your work for you to make sure you won’t get caught

Secondly, there is a vibrant market in most universities of Masters students being paid by undergraduates to write essays for them – these, being original, don't get detected.

As an example, the going rate in University College Cork was about €100 per thousand lines, with a refund if the student didn't get a 2H2 or higher.

Consequences of assessment unreliability on plagiarism

A study by de Lambert, Ellen and Taylor (2002) found that 80% of New Zealand HE students plagiarise.

In my own experience at Hull, St. Andrews and University College Cork, it is even more common than that. I would actually say it is ubiquitous – as in, I have never met a recent student who hasn’t!

Additionally, for some subjects like Computer Science, the problem is over-detection of plagiarism, because coursework answers tend to be very similar to one another.

Being falsely accused of plagiarism when you did not in fact plagiarise is a highly unpleasant experience, I can assure you!

de Lambert, K., Ellen, N. and Taylor, L. (2002) 'Prevalence of academic dishonesty in tertiary institutions: The New Zealand story', New Zealand Journal of Applied Business Research, vol. 1.

Consequences of assessment unreliability on plagiarism

From my own experience, I reckon that the top students selectively plagiarise, whereas the bottom students randomly plagiarise. I have a feeling that, if true, this might be an excellent proxy for measuring the understanding of study itself.

This would support the hypothesis that the most creative and original students deliberately plagiarise not out of laziness or incapability, but rather as an expert method for reducing assessment volatility

And surely such adaptability is a valuable capability in a student?

Consequences of assessment unreliability on plagiarism

This raises a very interesting thought ...

Perhaps, instead of looking at all plagiarism as inherently negative, the patterns in plagiarism might offer a superior indicator of the ability of the student?

Taking it a step further, perhaps the overall patterns of gestating an assessed work might be an even better indicator? In other words, instead of assessing final works as standalone items, we also assess how those works were generated.

My proposed research

My research intends to develop a software tool which records a person’s activities as they gestate an assessed work

This record is then formed into an audit trail which can be embedded into the Microsoft Office document submitted for assessment, as proof of originality of work.

Note that the student can delete items from the trail before submission; however, we retain what they delete.
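A minimal sketch of how such a trail might behave, assuming an append-only event log in which student "deletions" merely hide an entry from the submitted copy while the full log is retained (all class and method names here are hypothetical; the submitted copy could, for instance, be stored as a custom part inside the .docx zip container):

```python
import json
import time


class AuditTrail:
    """Append-only log of a student's activity while gestating a work."""

    def __init__(self):
        self._events = []      # full log, never truncated
        self._hidden = set()   # indices the student chose to "delete"

    def record(self, kind, detail):
        # e.g. trail.record("search", "essay mills UK")
        self._events.append({"ts": time.time(), "kind": kind, "detail": detail})

    def hide(self, index):
        # Student-visible deletion: omitted from the submission,
        # but kept in the retained server-side log.
        self._hidden.add(index)

    def submission_copy(self):
        # The trail embedded in the submitted document.
        return json.dumps([e for i, e in enumerate(self._events)
                           if i not in self._hidden])

    def retained_copy(self):
        # The full trail kept on the cloud, including hidden events.
        return json.dumps(self._events)
```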

My proposed research

These audit trails are accumulated onto an internet cloud over time along with ALL material seen by each student during the gestation of their works

This allows large scale statistical analyses to be performed looking for correlations between awarded grade and (for example) how a student goes about writing an essay, such as what they search for, where they look and in what sequence, and of course what they delete from their submitted audit trails

You can also run statistical analyses of many other things such as typical patterns in word usage or impact of citations on students
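Purely as an illustration of the kind of analysis this enables (the column names and figures below are invented; real features would be extracted from the accumulated audit trails):

```python
import pandas as pd

# One row per submitted work, features extracted from its audit trail.
df = pd.DataFrame({
    "searches_before_writing":  [12, 3, 25, 7],
    "sources_consulted":        [9, 2, 18, 5],
    "items_deleted_from_trail": [0, 4, 1, 6],
    "awarded_grade":            [68, 52, 71, 58],
})

# Which gestation behaviours move together with the awarded grade?
print(df.corr()["awarded_grade"].sort_values())
```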

This lets you do a number of very cool things ...

My proposed research

For example, students can now compare how they write an essay to how other people write theirs. This lets you turn summative assessment into formative assessment – good!

Universities can compare their grading with the average grading of their past (or of other universities) to discover if there are any anomalies in their current processes.

In particular, the statistics can flag those students in a cohort most likely to have the highest variance in grade, as in the sketch below. This lets the assessing department gauge risk and deploy double or even external marking to those students most likely to need it – a vast improvement over the random double marking currently employed.
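A sketch of that triage step, under the assumption that a per-student grade-variance estimate is already available from the statistics above (all data invented):

```python
import pandas as pd

predicted = pd.DataFrame({
    "student":            ["A", "B", "C", "D", "E"],
    "predicted_variance": [3.1, 11.8, 5.4, 14.2, 2.0],   # in grade points
})

BUDGET = 2  # scripts the department can afford to double-mark

# Send only the highest-variance scripts to a second (or external) marker.
to_double_mark = predicted.nlargest(BUDGET, "predicted_variance")
print(to_double_mark["student"].tolist())   # ['D', 'B']
```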

It also provides an independent external check of the assessor which considerably exceeds the QAA’s present gold standard because it checks each and every assessment rather than just a random sample

My proposed research

And there are obvious major benefits for the detection and elimination of plagiarism by increasing the effort to plagiarise and decreasing the rewards for doing so

There are some further benefits:

The software can also analyse what you are writing while you write it, and suggest further reading on the basis of previously written essays stored in the cloud.
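One plausible way to implement that suggestion step (a sketch only: the corpus, the draft and the use of TF-IDF similarity are my assumptions, not a description of the actual tool):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stored_essays = {
    "essay_101": "reliability and validity in higher education assessment",
    "essay_202": "total factor productivity of british universities",
    "essay_303": "plagiarism detection and academic dishonesty",
}
draft = "why do grades vary so much in qualitative assessment?"

# Rank the stored essays by similarity to the draft in progress.
vec = TfidfVectorizer()
matrix = vec.fit_transform(list(stored_essays.values()) + [draft])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for title, score in sorted(zip(stored_essays, scores), key=lambda t: -t[1]):
    print(f"{title}: {score:.2f}")
```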

It's great for reflective practice, because I am not entirely sure how I gestate my own academic outputs. Watching myself write these slides, I wonder what it looks like from outside my head!

It could also be a boon to distance assessment, the importance of which is only just beginning ...

My proposed research

On the advantages for distance (actually de-campused) education:

Most people have a statistically fingerprintable “style” to their research and writing. This lets you ensure that a submitted work came from the same person as before

Moreover, that style slowly evolves over time. If a person used my software from the very beginning of their academic career you’d get a graph of how their style evolves over time

This lets you calculate teaching practice quality, as well as letting you compare students when they enter the workplace, i.e. as a recruitment selection tool.
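A toy sketch of such a fingerprint, assuming only crude surface features (function-word rates and average sentence length); real stylometry would use far richer features, but the comparison logic is the same:

```python
import math

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]

def fingerprint(text):
    words = text.lower().split()
    total = max(len(words), 1)
    rates = [words.count(w) / total for w in FUNCTION_WORDS]
    sentences = max(sum(text.count(c) for c in ".!?"), 1)
    return rates + [(total / sentences) / 40.0]  # crude normalisation

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

earlier = fingerprint("The essay argues that assessment in the UK is unreliable.")
latest = fingerprint("It is argued that the reliability of assessment varies a lot.")

# A low similarity flags the work for a closer look, not an accusation.
print("same-author similarity:", round(cosine(earlier, latest), 3))
```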

The possibilities only begin there ... I don’t have time to mention more here, but even if it only partially works there are a multitude of uses for this data

Current market situation and likely future demands in HE assessment

This research is not just about doing cool things, it is also about solving a major pressing need for UK Higher Education

UK HE has already adapted to a substantial per-student funding drop.

For example, expenditure per student in 1984/85 was £8,600 in 2009 pounds. In 2008/09, it was just £7,389!

That's a drop of 14% per student! UK HE has already substantially improved its productivity ...

Current market situation and likely future demands in HE assessment

From an economic perspective, UK HE has experienced tremendous productivity improvements

Total Factor Productivity (TFP) analysis of UK HE finds a 51.5% improvement in TFP between 1980/81 and 1992/93 (Flegg et al., 2003).

As a comparison, UK HE storms past Chinese HE in both TFP and TFP growth (Ng & Li, 2009); in China, TFP growth was actually negative over 1998-2002.

Flegg, A.T. et al. (2003) 'Measuring the Efficiency and Productivity of British Universities: An Application of DEA and the Malmquist Approach'.

Ng, Y.C. and Li, S. (2009) 'Efficiency and productivity growth in Chinese universities during the post-reform period'.

Current market situation and likely future demands in HE assessment

UK HE faces some large structural shifts in its primary market. According to the British Council (2004), 'Vision 2020: Forecasting international student mobility – a UK perspective':

Global transnational HE demand is forecast to rise from 2.1m places to 5.8m by 2020 (a 176% increase).

UK HE currently holds about 13% of the global transnational HE market. While this is projected to rise to only 15% by 2020, it still means a trebling of international students for UK HE, or a 34% rise in total students attending UK HE when you include projected domestic growth.
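As a quick check of the trebling claim (my arithmetic, applied to the British Council's forecast figures):

```latex
\[
\underbrace{0.13 \times 2.1\text{m}}_{\approx\, 0.27\text{m today}}
\quad\longrightarrow\quad
\underbrace{0.15 \times 5.8\text{m}}_{\approx\, 0.87\text{m by 2020}},
\qquad
\frac{0.87}{0.27} \approx 3.2\times
\]
```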

Current market situation and likely future demands in HE assessment

If you look at the costs of providing a HE course, they break down as follows:

Fixed costs:
- Planning and developing course design to fit other courses and quality standards
- Developing course materials

Running costs:
- Providing infrastructure, e.g. buildings, computers, libraries
- Marketing, advertising and non-per-student administration
- Teaching

Per-student costs:
- Tutorials
- Assessment
- Per-student administration

Current market situation and likely future demands in HE assessment

As you gain students (especially distance students):

1. Fixed costs per student drop
2. Running costs per student drop
3. Per-student costs do NOT drop

Productivity (in both the economic and accounting senses) is strongly correlated with economies of scale.

Therefore further economies of scale inevitably come down to reducing per-student costs.

And that implies reducing the cost of assessing students ...
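Put as a simple cost model (my formulation, not the slides'): with fixed costs F, approximately fixed running costs R, and a marginal per-student cost v, the cost per student for n students is

```latex
\[
\frac{C(n)}{n} = \frac{F + R}{n} + v
\;\xrightarrow{\;n \to \infty\;}\; v
\]
```

so once n is large, the only lever left for further savings is v: tutorials, assessment and per-student administration.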

Current market situation and likely future demands in HE assessment

Given that UK HE has already plucked the "low hanging fruit" in productivity gains, how is it supposed to improve still further with a 34% increase in student numbers by 2020?

Especially given that new low cost HE providers are becoming increasingly cost competitive on the global stage by using much cheaper labour to reduce tutorial and per student admin costs?

In short: either UK HE innovates or it will get pushed out of its global market

Current market situation and likely future demands in HE assessment

Lastly, the Higher Education Achievement Report (HEAR) wants to phase out overall degree classifications in favour of academic transcripts and even more detailed per-topic assessments.

My worry is that this may penalise the more creative and original students even further, by showing up the volatility in their component grades.

And that’s where my research comes in! If it works it could be a great tool for simultaneously reducing assessment costs whilst improving its quality and being more meritocratic. A win-win-win!

Ethical consequences

Just because it appears to be a win-win-win does not necessarily mean that there isn’t a big potential for problems ...

After all, for every "good" use of such personally intimate data there are many "bad" uses.

For example, it certainly would be a gold mine for marketing. Or one might even use this data to identify early those most likely to become subversive, engage in criminal behaviour, or even fail to repay loans.

Ethical consequences

Some of these problems can be inhibited by deliberately designing the data storage in a form which makes it hard to regress such indicators

However, many of these problems also affect all social networking users in general

We definitely comply with the EU Data Protection Directive, so we are currently legal and meeting best current practice

And while we can identify individuals, anyone we give access to our data for research purposes cannot (without the individual user's explicit permission).
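One standard way to deliver that property (a sketch, not our actual design: the salt name and record layout are invented) is to key released records by a salted keyed hash, where only we hold the secret:

```python
import hashlib
import hmac

SECRET_SALT = b"held-by-us-never-released"   # hypothetical server-side secret

def pseudonym(user_id):
    # Stable pseudonym: the same user always maps to the same token, but a
    # researcher without the salt cannot invert it back to the user.
    return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"user": pseudonym("some.student"), "grade": 68, "searches": 12}
print(record)   # identifiable only by whoever holds SECRET_SALT
```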

Ethical consequences

In the end, it is probably too early to say much more without knowing what will work and what won't, i.e. we have to build it and see what is possible first.

We are, however, very mindful of these issues.

Thank you for listening!

Questions?