
VALIDITY AND TEST VALIDATION
Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova
Based on materials by Anthony Green

ABC Test of English
Results: Ivana 45%, Irina 78%

Which student is better at English?

Validity


Some aspects of language ability may not be tested by the assessment tasks. This is construct under-representation.

[Diagram: assessment tasks, language ability, construct under-representation]

Validity

Some abilities that are important to success in a test may not be connected to real-world language abilities:
• ability to cope with exam stress;
• awareness of how multiple-choice questions are written;
• willingness to guess, etc.

These are construct-irrelevant factors.

What is validity?

Tests are tools for helping us to make good decisions.

Construct relevance:
• a test of maths (even if it’s very reliable) can’t tell us about someone’s ability to sing;
• a test of written grammar can’t tell us much about someone’s ability to hold a conversation.

Construct representation:
• does the test cover all aspects of the relevant abilities?

What is validity?

‘validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests’

(American Educational Research Association et al., 1999)

This means that interpretations of test results can be valid for one purpose and for one particular population of test takers, but not for others. A test may be valid for placement purposes on a general language course, but not for employment selection.

What do we want the results to mean? What evidence can we collect to find out if scores really support this interpretation?
• evaluation – the test taker’s performance is a fair reflection of his/her abilities;
• generalization – similar scores would be obtained if the test taker were given a different form of the test, or if the raters scoring his/her performance were different;
• explanation – the test reflects a coherent theory of language ability;
• utilisation – the tested abilities are relevant to the decision being made about the test taker.

Building a validation argument

• at different stages in the cycle, different questions need to be answered;

• different types of validity may be more relevant at each stage;

• tests made for different purposes raise different issues.

Validation in the assessment cycle:

• Evaluation – the test taker’s performance is a fair reflection of his/her abilities. Stage: test form and administration.

• Generalization – similar scores would be obtained if the raters scoring his/her performance were different. Stage: test score and rating scales.

• Explanation – the test reflects a coherent theory of language ability. Stage: specification.

• Utilisation – the tested abilities are relevant to the decision being made about the test taker. Stage: test purpose and target language use domain.


Validity in test design

“Tests for the measurement of language abilities must be constructed according to a coherent validity framework based on the latest developments in theory and practice.”

(Weir, 2005)

[Diagram: Socio-cognitive approach (O’Sullivan & Weir, 2010). Components: test task, performance, context validity, cognitive validity, scoring validity, consequential validity, criterion-related validity.]

Content (context) validity

Content validity is based on subject experts' judgments of test content.

Does the content of the test adequately cover all the aspects of language ability we are interested in for making this decision?

Content (context) validity

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.

(Hughes, 2005)

The term content validity was traditionally used to refer to the content coverage of the task. Context validity is preferred as a more inclusive superordinate which signals the need to consider the discoursal, social and cultural context as well as the linguistic parameters under which the task is performed (its operations and conditions).

(Weir and Shaw, 2005)

Cognitive (or theory-based) validity

Do test takers go through the same mental processes when responding to test tasks as when they use language in the real world in the situations we are interested in?

Theory-based validity involves collecting a priori evidence through piloting and trialling before the test event, for example through verbal reports from test takers on the cognitive processing activated by the test task, and a posteriori evidence involving statistical analysis of scores following test administration.

Scoring validity

Scoring validity accounts for the extent to which test scores:
• are based on appropriate criteria;
• exhibit consensual agreement in their marking;
• are as free as possible from measurement error;
• are stable over time;
• engender confidence as reliable decision-making indicators.


Scoring validity = reliability

Are the test scores consistent enough for us to have confidence in the results?
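One way to check the "consensual agreement in marking" named in the definition of scoring validity is to compute an agreement statistic for two raters who score the same performances. The sketch below uses Cohen's kappa, a statistic chosen here for illustration (the slides do not prescribe a method); the band scores are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical (band) scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of performances")
    n = len(rater_a)
    # Observed agreement: proportion of performances given the same band by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters had assigned bands independently
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical band for every performance
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) from two raters marking the same ten speaking tests
rater_1 = [3, 4, 2, 5, 3, 3, 4, 2, 1, 4]
rater_2 = [3, 4, 2, 4, 3, 2, 4, 2, 1, 5]
kappa = cohens_kappa(rater_1, rater_2)
print(f"{kappa:.2f}")
```

Here the raters agree on 7 of 10 performances, and the script prints 0.61 once chance agreement is removed. A kappa of 1 means perfect agreement; 0 means agreement no better than chance.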


Criterion-related validity

Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated.

(Hughes, 2003)

Are the test results consistent with other evidence we have about test takers’ abilities?

Criterion-related validity takes two forms:
• concurrent validity;
• predictive validity.

Concurrent validity

“involves the comparison of the test scores with some other measures of the same candidates taken at roughly the same time as the test.”

(Alderson et al., 1995:177)

Do scores on our test agree with the results of other tests of the same abilities?
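Agreement between two tests is usually summarised as a correlation coefficient. The sketch below computes a Pearson correlation between scores on "our" test and on another test of the same abilities taken at roughly the same time. The choice of Pearson's r and all scores are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length score lists with at least two candidates")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when all scores in a list are equal")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical percentage scores for the same six candidates on two tests taken in the same week
our_test = [45, 78, 62, 55, 90, 70]
other_test = [50, 74, 60, 58, 85, 72]
r = pearson_r(our_test, other_test)
print(f"concurrent validity coefficient r = {r:.2f}")
```

The same calculation could serve for predictive validity if `other_test` were replaced by a measure collected some time after the test, such as end-of-course results.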

Predictive validity

Predictive validity entails the comparison of test scores with some other measure for the same candidates taken some time after the test has been given.

(Alderson et al., 1995)

The degree to which a test can predict candidates' future performance.

(Hughes, 2003)

Did the test accurately predict which test takers were going to perform best in their jobs, in class, etc.?

Consequential validity (impact)

Does the introduction and use of the test have the intended social consequences?

Is there any:
• bias in scoring and interpretation of results?
• unfairness in test use?
• positive or negative effect on teaching and learning?

Face validity

Face validity refers to the test's “surface credibility or public acceptability” (Alderson et al., 1995:172). Bachman (1990:307) states that “face validity is the appearance of real life.”

Do test takers/teachers/politicians/the public generally believe in the value of the test?

The assessment is credible to users: it looks as though it measures the skills or abilities of interest.

For example, a multiple-choice grammar test does not look as though it really tests the ability to speak English in real-world situations. All kinds of evidence could be used to show that people who pass the test are actually able to communicate effectively, but users may not be convinced because test takers are not actually required to speak. If the test does not have face validity, it is unlikely to be successful.


Construct validity

In recent years the term construct validity has been used to refer to the general, overarching notion of validity. It is not enough to assert that a test has construct validity; empirical evidence is needed.

(Hughes, 2003)

The arguments for using the test as a reasonable justification for taking any decision must be presented and examined: this is validation.

Round-up: suitable data for test validity

Face validity
Questionnaires to and interviews with candidates, administrators and other users.

Context validity
a) Compare test content with specifications/syllabus.
b) Questionnaires to and interviews with 'experts' such as teachers, subject specialists, applied linguists.
c) Expert judges rate test items and texts according to a precise list of criteria.

Cognitive validity
Students introspect on their test-taking procedures, either concurrently or retrospectively. Keystroke logs. Eye-tracking.

Concurrent validity
a) Compare students' test scores with their scores on another test.
b) Compare students' test scores with teachers' rankings.
c) Compare students' test scores with other measures of ability, such as teachers' ratings of the students.
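For item b), a teacher's ranking is ordinal data, so a rank-based coefficient fits more naturally than one computed on raw scores. The sketch below computes Spearman's rank correlation (our choice of statistic; the slides name none) between test scores and a hypothetical teacher ranking in which 1 means the strongest student.

```python
import math


def average_ranks(values):
    """Rank values from 1 (lowest) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Correlation is undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical data for six students
test_scores = [45, 78, 62, 55, 90, 70]
teacher_rank = [6, 3, 4, 5, 1, 2]  # 1 = strongest student in the teacher's view
# Negate the ranks so that larger numbers mean stronger students, matching the score direction
rho = spearman_rho(test_scores, [-r for r in teacher_rank])
print(f"{rho:.2f}")
```

The teacher swaps only the second- and third-strongest students relative to the test, so the script prints 0.94.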


Predictive validity
a) Compare students' test scores with their scores on tests taken some time later.
b) Compare students' test scores with their success in a final exam.
c) Compare students' test scores with other measures of their ability taken some time later, such as employers' assessments of their ability.

Construct validity
a) Compare performance on each subtest with other subtests.
b) Compare performance on each subtest with the total of all other subtests.
c) Compare students' test scores with students' biodata and psychological characteristics.
d) Multitrait-multimethod studies.
e) Factor analysis.
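Item b) can be sketched as a "subtest versus rest of test" correlation: each subtest is correlated with the sum of all the other subtests, so that it is not correlated with its own contribution to the total. The subtest names and scores below are hypothetical, and Pearson's r is an assumed choice of coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when all scores in a list are equal")
    return cov / math.sqrt(ss_x * ss_y)


def subtest_rest_correlations(scores):
    """Correlate each subtest with the total of all the *other* subtests.

    `scores` maps a subtest name to a list of candidate scores,
    with candidates in the same order in every list.
    """
    names = list(scores)
    n = len(scores[names[0]])
    totals = [sum(scores[name][i] for name in names) for i in range(n)]
    result = {}
    for name in names:
        # Remove this subtest's own contribution from each candidate's total
        rest = [totals[i] - scores[name][i] for i in range(n)]
        result[name] = pearson_r(scores[name], rest)
    return result


# Hypothetical subtest scores for five candidates
scores = {
    "reading": [12, 18, 15, 9, 20],
    "listening": [14, 17, 13, 10, 19],
    "grammar": [20, 22, 25, 15, 24],
    "writing": [8, 15, 11, 7, 16],
}
results = subtest_rest_correlations(scores)
for subtest, r in results.items():
    print(f"{subtest:<10} r with rest of test = {r:.2f}")
```

How to read the coefficients depends on the construct the test claims to measure. Factor analysis and multitrait-multimethod studies (items d and e) go well beyond this sketch and are normally run in dedicated statistical software.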

Roles

• Designers

• Producers

• Organisers

• Administrators

• Assessees

• Scorers

• Users

Example validity questions

• Does the design of the test reflect an adequate theory of language?

• Is an appropriate balance of abilities required for success on the test?

• Do the test items reflect the designers’ intentions?

• Is the test organised and administered in a way that will ensure fairness?

• Do assessees respond to the test tasks in a way that reflects realistic language processing?

• Do scorers consistently and accurately capture the qualities of test takers’ performance?

• Are decisions taken by users justified by the test?

Who is a validator?

Assessment developers (teachers, testing agencies):
• to check the quality of their own work;
• to showcase the quality of their tests.

Assessment users:
• to check that tests are giving them accurate and relevant information.

Independent agencies:
• to enforce/encourage good-quality assessment.


Conclusion

• Test validation, according to Alderson et al. (1995:193), is 'time-consuming and difficult'.

• However, it is essential: a test without validity cannot be useful as a decision-making tool.

• Applied linguists and teachers should focus more of their efforts on practical research in this field.
