Developing a multi-method design for carrying out comparability studies of tests aligned with the CEFR

Jamie Dunlea, Richard Spiby

British Council

Quynh Thi Ngoc Nguyen, Yen Thi Quyn Nguyen

University of Languages and International Studies,

Vietnam National University, Hanoi

14th EALTA Conference

Sèvres, France

June 1-3, 2017

Assessment Research Group

Overview of the study

The objectives of the comparison study

• To investigate the relationship between the performance of university students on the VSTEP and their performance on the Aptis test, and the relationship of both tests to the CEFR

• To investigate the comparability of the VSTEP and Aptis from the perspective of the constructs targeted and test design

• To investigate local university students' and educators' attitudes to the Aptis test through the collection of qualitative questionnaire feedback

• To strengthen the methodology used for comparability studies of tests linked to the CEFR, particularly in relation to defining the constructs

What is Aptis?

Vietnamese Standardized Test of English Proficiency

http://vstep.vn/

VSTEP Test Description

• Target test takers: Vietnamese adult learners of English, aged 18 and over, who want to test their language proficiency for different purposes

• Proficiency scales:

Under 4.0: no level reported

4.0 – 5.5: Level 3 (B1)

6.0 – 8.0: Level 4 (B2)

8.5 – 10: Level 5 (C1)
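
The band boundaries above can be expressed as a simple lookup. The sketch below is illustrative only: the function name `vstep_level` is hypothetical, and it assumes that scores are reported in half-point steps, so each band's lowest score can serve as a cut-off.

```python
from typing import Optional

# Band boundaries as listed on the slide: (lowest score in the band, label).
# Ordered from the highest band down so the first match wins.
VSTEP_BANDS = [
    (8.5, "Level 5 (C1)"),
    (6.0, "Level 4 (B2)"),
    (4.0, "Level 3 (B1)"),
]


def vstep_level(score: float) -> Optional[str]:
    """Return the band label for an overall score on the 0-10 scale.

    Returns None for scores under 4.0, the band for which the slide
    lists no level. Assumes half-point score steps (an assumption).
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    for lower_bound, label in VSTEP_BANDS:
        if score >= lower_bound:
            return label
    return None
```

For example, `vstep_level(6.0)` falls in the Level 4 (B2) band, while `vstep_level(3.5)` returns `None`.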

Socio-cognitive model


What is validity?

• Does the test measure what we want it to measure?

• Are the scores from the test accurate, reliable, meaningful?

• Are the scores useful for test users to make decisions?

CONTEXT VALIDITY

COGNITIVE VALIDITY

SCORING VALIDITY

CONSEQUENTIAL VALIDITY

CRITERION-RELATED VALIDITY

Main data collection and analysis

Date: Main data collection / analysis

Dec 2015 - Feb: Planning / preparing instruments

May 2016: Content analysis of VSTEP / Aptis. Data collection: content review of both tests by trained panels of expert reviewers

May 2016: Pilot test at Vietnam National University. Data collection: test scores from both tests, questionnaires & interviews for test takers

May 2016 - Oct: Analysis of content review and pilot testing data. Revision of instruments for main data collection

Jan 2017: Main testing. Data collection: test scores from both tests and questionnaires for test takers

Jan 2017 - Mar: Main data analysis and preparation of technical report

Date: Main data collection / analysis

Dec 2015 - Feb: Planning / preparing instruments

May 2016: Content analysis of VSTEP / Aptis. Data collection: content review of both tests by trained panels of expert reviewers

May 2016: Pilot test at Vietnam National University. Data collection: test scores from both tests, questionnaires & interviews for test takers

May 2016 - Oct 2016 / March 2017: Analysis of content review and pilot testing data. Revision of instruments for main data collection

Jan 2017 / March – April: Main testing. Data collection: test scores from both tests and questionnaires for test takers from 3 universities

July - Sep 2017: Main data analysis and preparation of technical report

Results

• Defining the constructs: contextual and cognitive parameters

• Scoring: descriptive statistics, correlations, exploratory factor analysis

• Questionnaires: attitudes of test takers

Aptis – VSTEP comparison study

Defining the constructs

• The analysis template builds on the model used in the Aptis-GEPT study.

• The analysis categories reflect the contextual and cognitive parameters used in the Aptis test specifications.

• The categories have also been used in an extensive, large-scale validation study (Dunlea, 2016).

• The aim is to refine these categories to create a standardized analysis format capable of capturing a snapshot of the contextual and cognitive profile of the tasks and items in a test.

Defining the constructs

• Two teams of researchers: a team from the Language Testing Research Group at Innsbruck University (LTRGI) and a team from ULIS.

• Each team receives a 1-day training workshop on the analysis forms.

• Team members then individually evaluate the test content using the analysis forms. Each team then discusses its judgments and reaches a consensus view.

• Judgments from LTRGI have been analyzed (judgments from the ULIS team were completed in March 2017).
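
One way to prepare for the consensus discussion is to separate the categories on which individual judgments already agree from those that still need discussion. The sketch below is illustrative only: the rater names and the example judgments are hypothetical, and the function `split_by_agreement` is not part of the study's instruments.

```python
def split_by_agreement(judgments):
    """Split categories into those already agreed and those needing discussion.

    judgments maps each analysis category to a dict of {rater: judgment}.
    Returns (agreed, to_discuss): agreed maps category -> shared judgment,
    to_discuss maps category -> the individual judgments that differ.
    """
    agreed, to_discuss = {}, {}
    for category, by_rater in judgments.items():
        distinct = set(by_rater.values())
        if len(distinct) == 1:
            agreed[category] = distinct.pop()
        else:
            to_discuss[category] = dict(by_rater)
    return agreed, to_discuss


# Hypothetical individual judgments for one reading task (not study data).
example_judgments = {
    "Task level (CEFR)": {"rater_a": "B2", "rater_b": "B2", "rater_c": "C1"},
    "Domain": {"rater_a": "Public", "rater_b": "Public", "rater_c": "Public"},
    "Discourse mode": {"rater_a": "Expository", "rater_b": "Expository", "rater_c": "Expository"},
}

agreed, to_discuss = split_by_agreement(example_judgments)
```

In this hypothetical example only the task level would go forward to the consensus discussion.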

Test: Aptis General
Component: Reading
Task: Matching headings to text

Features of the Task

Skill focus: Expeditious global reading of longer text, integrating propositions across a longer text into a discourse-level representation.
Task level: A1 / A2 / B1 / B2 / C1 / C2
Task description: Matching headings to paragraphs within a longer text. Candidates read through a longer text consisting of 7 paragraphs, identifying the best heading for each paragraph from a bank of 8 options.

Cognitive processing setting:
• Expeditious reading: local (scan/search for specifics)
• Careful reading: local (understanding sentence)
• Expeditious reading: global (skim for gist/search for key ideas/detail)
• Careful reading: global (comprehend main idea(s)/overall text(s))

Cognitive processing (levels of reading):
• Word recognition
• Lexical access
• Syntactic parsing
• Establishing propositional meaning (cl./sent. level)
• Inferencing
• Building a mental model
• Creating a text level representation (disc. structure)
• Creating an intertextual representation (multi-text)

Task specs: an example

Features of the Input Text

Words: 700-750 words
Domain: Public / Occupational / Educational / Personal
Discourse mode: Descriptive / Narrative / Expository / Argumentative / Instructive
Content knowledge: General / Specific
Cultural specificity: Neutral / Specific
Nature of information: Only concrete / Mostly concrete / Fairly abstract / Mainly abstract
Lexical level: K1 / K2 / K3 / K4 / K5 / K6 / K7 / K8 / K9 / K10
Readability: Flesch-Kincaid Grade Level 9-12
Grammar: A1-B2 exponents; average sentence length 18-20 words
Text genre: Magazines, newspapers, instructional materials (such as extracts from undergraduate textbooks describing important events and ideas, etc.).

Features of the Response

Target: Length up to 10 words; Lexical K1-K5; Grammar A1 –
Distractors: Length up to 10 words; Lexical K1-K5; Grammar
Key information: Within sentence / Across sentences / Across paragraphs
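
The readability target in the task specification refers to the Flesch-Kincaid Grade Level, which is computed from average sentence length and average syllables per word. The sketch below applies the standard formula to pre-counted totals; the function name and the example counts are hypothetical, and counting syllables automatically would need an additional heuristic not shown here.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from total word, sentence and syllable counts.

    Standard formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a text of roughly the specified length
# (about 19 words per sentence); not taken from an actual Aptis text.
grade = flesch_kincaid_grade(words=720, sentences=38, syllables=1100)
within_spec = 9 <= grade <= 12  # target range from the task specification
```

With these hypothetical counts the grade level falls inside the 9-12 range given in the specification.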

Categories: Reading Task 1 (CONSENSUS)

Features of the Task

Skill focus: sentence comprehension, lexis
Task level (CEFR): A1
Response format: Multiple choice gap fill
Items per task: 5
Cognitive processing 1: Careful reading: local
Cognitive processing 2: Establishing propositional meaning (cl./sent. level)
Content knowledge: 1 (General)
Cultural specificity: 1 (Neutral)

Features of the Input Text

Domain: Personal
Discourse mode: Descriptive
Nature of information: Only concrete
Topic: Daily life
Text genre: Personal letters / e-mail
Presentation: Verbal (written)

Features of the Response (Task 1, Items 1-5, CONSENSUS)

Key information: Within sentences (all five items)
Operation: Main idea / conclusions
Question presentation: Verbal (written) (all five items)
Option presentation: Verbal (written) (all five items)

Aptis – Reading Task 4

Categories: Reading Aptis Task 4 (CONSENSUS)

Features of the Task

Skill focus: paragraph comprehension, reading for gist, understanding main ideas of longer complex text
Task level (CEFR): B2
Response format: Matching headings to text
Items per task: 7
Cognitive processing 1: Expeditious reading: global
Cognitive processing 2: Building a mental model
Content knowledge: 2
Cultural specificity: 1 (Neutral)

Features of the Input Text

Domain: Public
Discourse mode: Expository
Nature of information: Fairly abstract
Topic: Food and drink / Environmental issues
Text genre: Magazines

Features of the Response (Task 4, Items 1-7)

Key information: across sentences
Operation:
Question presentation: Verbal (written)
Option presentation: Verbal (written)

VSTEP Reading Task 4

Categories: Reading Task 4 (CONSENSUS)

Features of the Task

Skill focus: identifying main ideas, finer details and implied relationships, understanding longer complex texts
Task level (CEFR): C1
Response format: MCQ
Items per task: 10
Cognitive processing 1: Careful reading: global
Cognitive processing 2: Creating a text level representation (disc. structure)
Content knowledge: 2
Cultural specificity: 2

VSTEP Reading Task

Features of the Input Text

Domain: Public
Discourse mode: Expository
Nature of information: Fairly abstract
Topic: Health & medicine / social topic / Science and technology
Text genre: Newspapers

VSTEP Reading Task

Categories: Reading, Features of the Response (Task 4, Items 1-10)

Key information (in slide order): within sentences; across sentences; within sentences; across sentences; within sentences; across sentences; across paragraphs; across sentences
Operation (in slide order): specific information; opinion; main idea / conclusions; text structure / connections between the parts; opinion
Question presentation: Verbal (written)
Option presentation: Verbal (written)
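
The two Task 4 profiles can be compared category by category to see where the contextual and cognitive parameters coincide and where they diverge. The sketch below is illustrative only: the values are the consensus judgments shown on the slides, while the dictionary layout and the function `compare_profiles` are hypothetical and not part of the study's analysis template.

```python
# Consensus judgments for the two reading tasks, as shown on the slides.
APTIS_TASK_4 = {
    "Task level (CEFR)": "B2",
    "Response format": "Matching headings to text",
    "Items per task": 7,
    "Cognitive processing 1": "Expeditious reading: global",
    "Cognitive processing 2": "Building a mental model",
    "Content knowledge": 2,
    "Cultural specificity": 1,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Magazines",
}

VSTEP_TASK_4 = {
    "Task level (CEFR)": "C1",
    "Response format": "MCQ",
    "Items per task": 10,
    "Cognitive processing 1": "Careful reading: global",
    "Cognitive processing 2": "Creating a text level representation (disc. structure)",
    "Content knowledge": 2,
    "Cultural specificity": 2,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Newspapers",
}


def compare_profiles(first, second):
    """Return (shared, differing) categories for two task profiles.

    shared maps category -> common value; differing maps
    category -> (value in first, value in second).
    """
    shared, differing = {}, {}
    for category in sorted(first.keys() | second.keys()):
        a, b = first.get(category), second.get(category)
        if a == b:
            shared[category] = a
        else:
            differing[category] = (a, b)
    return shared, differing


shared, differing = compare_profiles(APTIS_TASK_4, VSTEP_TASK_4)
```

On these values the two tasks share domain, discourse mode, nature of information and content knowledge, and differ in level, response format, number of items, cognitive processing, cultural specificity and text genre.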

Scoring analysis: Aptis CEFR level by VSTEP CEFR level (overall)

Exact agreement: 62%   Adjacent agreement: 38%

Rows: Aptis CEFR; columns: VSTEP CEFR

            Under B1   B1   B2   Total
Under B1        5      11    0     16
B1              3      41    7     51
B2              0      15   35     50
C               0       0   13     13
Total           8      67   55    130

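
The agreement figures can be reproduced from the cross-tabulation. The sketch below is illustrative only: it assumes the levels form a single ordered scale (Under B1 < B1 < B2 < C), so that "exact" means the same level on both tests and "adjacent" means one level apart. The function name is hypothetical.

```python
# Cell counts from the cross-tabulation: rows are Aptis CEFR levels,
# columns are VSTEP CEFR levels. List position is treated as the
# ordinal rank of the level (an assumption of this sketch).
APTIS_LEVELS = ["Under B1", "B1", "B2", "C"]
VSTEP_LEVELS = ["Under B1", "B1", "B2"]
COUNTS = [
    [5, 11, 0],   # Aptis Under B1
    [3, 41, 7],   # Aptis B1
    [0, 15, 35],  # Aptis B2
    [0, 0, 13],   # Aptis C
]


def agreement_rates(counts):
    """Return (exact, adjacent) agreement as proportions of all test takers."""
    total = sum(sum(row) for row in counts)
    exact = sum(n for i, row in enumerate(counts)
                for j, n in enumerate(row) if i == j)
    adjacent = sum(n for i, row in enumerate(counts)
                   for j, n in enumerate(row) if abs(i - j) == 1)
    return exact / total, adjacent / total


exact_rate, adjacent_rate = agreement_rates(COUNTS)
```

Here 81 of the 130 test takers sit on the diagonal and the remaining 49 are one level apart, which rounds to the 62% and 38% reported on the slide.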

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930
Bartlett's Test of Sphericity: Approx. Chi-Square 956.778; Sig. .000

Component Matrix (Component 1)

AptisGVScore   .921
AptisLScore    .829
AptisRScore    .816
AptisSScore    .881
AptisWScore    .853
VSTEPRScore    .803
VSTEPLScore    .669
VSTEPWScore    .870
VSTEPSScore    .827

Extraction Method: Principal Component Analysis.
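
Loadings on a first principal component, like those in the component matrix, are the leading eigenvector of the correlation matrix scaled by the square root of its eigenvalue. The sketch below shows one way to compute them with power iteration in plain Python. The 3 x 3 correlation matrix is hypothetical (the study's correlation matrix is not shown on the slides), and this is not necessarily the software procedure the authors used.

```python
import math


def first_component_loadings(corr, iterations=500):
    """Return (eigenvalue, loadings) for the first principal component.

    Uses power iteration on a correlation matrix given as a list of
    lists. Assumes positive correlations, so the leading eigenvector
    has all-positive entries.
    """
    n = len(corr)
    vector = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector.
    product = [sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    loadings = [v * math.sqrt(eigenvalue) for v in vector]
    return eigenvalue, loadings


# Hypothetical correlations among three skill scores (not study data).
example_corr = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
eigenvalue, loadings = first_component_loadings(example_corr)
```

The squared loadings sum to the eigenvalue, so dividing the eigenvalue by the number of variables gives the proportion of variance explained by the first component.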

Questionnaires: attitudes of test takers (charts)

• I prefer computer-based writing tests to paper-and-pencil-based writing tests. (Response options: strongly agree, agree, disagree, strongly disagree, no selection)

• I prefer face-to-face speaking tests to machine-recorded speaking tests. (Response options: strongly agree, agree, disagree, strongly disagree, no selection)

• I have taken other computer-based English tests before taking today's Aptis test. (Response options: No, Yes, no selection)

• I often use computers. (Response options: strongly agree, agree, disagree, strongly disagree, no selection)

Surviving chart data labels (not attributable to specific responses): 33%, 35%, 1%, 1%, 1%

Some tentative conclusions

1. The use of mixed methods, both qualitative and quantitative, provides multiple perspectives and aids interpretation

2. The socio-cognitive model has provided a coherent

structure for identifying the sources of evidence useful

for creating a detailed picture of the tests

3. The pilot data, on a small but robust scale, have demonstrated that the two tests measure similar constructs around general English proficiency.

4. Preliminary results, including statistical analysis, gave us confidence that it would be useful to proceed to the main data collection
