Post on 19-May-2018
Developing a multi-method design for
carrying out comparability studies of tests aligned
with the CEFR
Jamie Dunlea, Richard Spiby
British Council
Quynh Thi Ngoc Nguyen, Yen Thi Quyn Nguyen,
University of Languages and International Studies, Vietnam
National University Hanoi
14th EALTA
Conference
CIEP
Sevres, France
June 1-3, 2017
Assessment Research Group
Overview of the study
The objectives of the comparison study
• To investigate the relationship between performance
of university students on the VSTEP and
performance on the Aptis test and the relationship of
both tests to the CEFR
• To investigate the comparability of the VSTEP and
Aptis from the perspective of constructs targeted and
test design
• To investigate local university students' and
educators' attitudes to the Aptis test through the
collection of qualitative questionnaire feedback
• To strengthen the methodology used for
comparability studies of tests linked to the CEFR,
particularly in relation to defining the constructs
What is Aptis?
Vietnamese Standardized Test
of English Proficiency
http://vstep.vn/
VSTEP Test Description
• Target test takers: Vietnamese adult
learners of English, aged 18 and over, who
want to test their language proficiency for
different purposes
• Proficiency scales:
Under 4.0: under reported
4.0 – 5.5: Level 3 (B1)
6.0 – 8.0: Level 4 (B2)
8.5 – 10: Level 5 (C1)
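The bands above amount to a simple lookup. The sketch below is illustrative only: the function name `vstep_level` is hypothetical, the 0 to 10 score range and the 0.5-point reporting steps are assumptions inferred from the band edges, and the label returned for scores under 4.0 is a placeholder rather than official VSTEP wording.

```python
def vstep_level(score: float) -> str:
    """Map a VSTEP score to the level bands listed on the slide.

    Assumptions (not stated in the source): scores run from 0 to 10 and
    are reported in 0.5-point steps, so nothing falls between 5.5 and 6.0
    or between 8.0 and 8.5. A score in such a gap would be placed in the
    higher band by this sketch.
    """
    if not 0 <= score <= 10:
        raise ValueError("expected a score between 0 and 10")
    if score < 4.0:
        return "Under 4.0 (no level)"  # placeholder label
    if score <= 5.5:
        return "Level 3 (B1)"
    if score <= 8.0:
        return "Level 4 (B2)"
    return "Level 5 (C1)"


if __name__ == "__main__":
    for s in (3.5, 4.0, 5.5, 6.0, 8.0, 8.5, 10):
        print(s, "->", vstep_level(s))
```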
Socio-cognitive model
What is validity?
• Does the test measure what we want it to measure?
• Are the scores from the test accurate, reliable, meaningful?
• Are the scores useful for test users to make decisions?
Validity components: context validity, cognitive validity, scoring validity, consequential validity, criterion-related validity
Main data collection and analysis
Date | Main data collection / analysis
Dec 2015 – Feb 2016 | Planning / preparing instruments
May 2016 | Content analysis of VSTEP / Aptis. Data collection: content review of both tests by trained panels of expert reviewers
May 2016 | Pilot test at Vietnam National University. Data collection: test scores from both tests, questionnaires & interviews for test takers
May 2016 – Oct 2016 | Analysis of content review and pilot testing data. Revision of instruments for main data collection
Jan 2017 | Main testing. Data collection: test scores from both tests and questionnaires for test takers
Jan 2017 – Mar 2017 | Main data analysis and preparation of technical report
Date | Main data collection / analysis
Dec 2015 – Feb 2016 | Planning / preparing instruments
May 2016 | Content analysis of VSTEP / Aptis. Data collection: content review of both tests by trained panels of expert reviewers
May 2016 | Pilot test at Vietnam National University. Data collection: test scores from both tests, questionnaires & interviews for test takers
May 2016 – Oct 2016; March 2017 | Analysis of content review and pilot testing data. Revision of instruments for main data collection
Jan 2017; March – April 2017 | Main testing. Data collection: test scores from both tests and questionnaires for test takers from 3 universities
July – Sep 2017 | Main data analysis and preparation of technical report
Results
• Defining the constructs: contextual and cognitive parameters
• Scoring: descriptive statistics, correlations, exploratory factor analysis
• Questionnaires: attitudes of test takers
Aptis – VSTEP comparison study
Defining the constructs
• The analysis template builds on the model used in the Aptis-GEPT
study.
• The analysis categories reflect the contextual and
cognitive parameters used in the Aptis test
specifications.
• The categories have also been used in an extensive,
large-scale validation study (Dunlea, 2016).
• The aim is to refine these categories to create a
standardized analysis format capable of capturing a
snapshot of the contextual and cognitive profile of the
tasks and items in a test.
Defining the constructs
• Two teams of researchers: a team from the Language
Testing Research Group at Innsbruck University (LTRGI), and a
team from ULIS.
• Each team receives a 1-day training workshop on the
analysis forms.
• Team members first evaluate the test content individually
using the analysis forms. Each team then discusses
its judgments and reaches a consensus view on the
judgments.
• Judgments from LTRGI have been analyzed.
(Judgments from the ULIS team were completed in March 2017.)
Test: Aptis General
Component: Reading
Task: Matching headings to text

Features of the Task
Skill focus: Expeditious global reading of longer text, integrating propositions across a longer text into a discourse-level representation.
Task level: A1 / A2 / B1 / B2 / C1 / C2
Task description: Matching headings to paragraphs within a longer text. Candidates read through a longer text consisting of 7 paragraphs, identifying the best heading for each paragraph from a bank of 8 options.
Cognitive processing (goal setting):
• Expeditious reading: local (scan/search for specifics)
• Careful reading: local (understanding sentence)
• Expeditious reading: global (skim for gist/search for key ideas/detail)
• Careful reading: global (comprehend main idea(s)/overall text(s))
Cognitive processing (levels of reading):
• Word recognition
• Lexical access
• Syntactic parsing
• Establishing propositional meaning (cl./sent. level)
• Inferencing
• Building a mental model
• Creating a text level representation (disc. structure)
• Creating an intertextual representation (multi-text)
Task specs: an example
Features of the Input Text
Words: 700-750 words
Domain: Public / Occupational / Educational / Personal
Discourse mode: Descriptive / Narrative / Expository / Argumentative / Instructive
Content knowledge: General / Specific
Cultural specificity: Neutral / Specific
Nature of information: Only concrete / Mostly concrete / Fairly abstract / Mainly abstract
Lexical level: K1 K2 K3 K4 K5 K6 K7 K8 K9 K10
Readability: Flesch-Kincaid Grade Level 9-12
Grammar: A1-B2 exponents
Average sentence length: 18-20 words
Text genre: Magazines, newspapers, instructional materials (such as extracts from undergraduate textbooks describing important events and ideas, etc.).
Features of the Response
Target: Length: up to 10 words; Lexical: K1-K5; Grammar: A1-B2
Distractors: Length: up to 10 words; Lexical: K1-K5; Grammar:
Key: Within sentence / Across sentences / Across paragraphs
Reading Task 1: consensus judgments

Features of the Task
Skill focus: sentence comprehension, lexis
Task level (CEFR): A1
Response format: Multiple choice gap fill
Items per task: 5
Cognitive processing 1: Careful reading: local
Cognitive processing 2: Establishing propositional meaning (cl./sent. level)
Content knowledge: 1 (General)
Cultural specificity: 1 (Neutral)

Features of the Input Text
Domain: Personal
Discourse mode: Descriptive
Nature of information: Only concrete
Topic: Daily life
Text genre: Personal letters / e-mail
Presentation: Verbal (written)

Features of the Response (Items 1-5, same judgment for every item)
Key information: Within sentences
Operation: Main idea / conclusions
Question presentation: Verbal (written)
Option presentation: Verbal (written)
Aptis – Reading Task 4
Aptis Reading Task 4: consensus judgments

Features of the Task
Skill focus: paragraph comprehension, reading for gist, understanding main ideas of longer complex text
Task level (CEFR): B2
Response format: Matching headings to text
Items per task: 7
Cognitive processing 1: Expeditious reading: global
Cognitive processing 2: Building a mental model
Content knowledge: 2
Cultural specificity: 1 (Neutral)
Features of the Input Text
Domain: Public
Discourse mode: Expository
Nature of information: Fairly abstract
Topic: Food and drink / Environmental issues
Text genre: Magazines
Presentation: Verbal (written)
Features of the Response (Task 4, Items 1-7, same judgment for every item)
Key information: across sentences
Operation: Main idea / conclusions
Question presentation: Verbal (written)
Option presentation: Verbal (written)
VSTEP Reading Task 4
VSTEP Reading Task 4: consensus judgments

Features of the Task
Skill focus: identifying main ideas, finer details and implied relationships, understanding longer complex texts
Task level (CEFR): C1
Response format: MCQ
Items per task: 10
Cognitive processing 1: Careful reading: global
Cognitive processing 2: Creating a text level representation (disc. structure)
Content knowledge: 2
Cultural specificity: 2
Features of the Input Text
Domain: Public
Discourse mode: Expository
Nature of information: Fairly abstract
Topic: Health & medicine (social topic) / Science and technology
Text genre: Newspapers
Presentation: Verbal (written)
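One way to read the two Task 4 profiles side by side is to store each consensus profile as a mapping and list the categories on which they differ. The sketch below is illustrative only: the dictionary layout and the helper name `profile_differences` are hypothetical, and the values are copied from the Aptis and VSTEP Reading Task 4 consensus judgments in this deck.

```python
# Consensus profiles for Reading Task 4, copied from the slides.
APTIS_TASK4 = {
    "Task level (CEFR)": "B2",
    "Response format": "Matching headings to text",
    "Items per task": 7,
    "Cognitive processing 1": "Expeditious reading: global",
    "Cognitive processing 2": "Building a mental model",
    "Content knowledge": 2,
    "Cultural specificity": 1,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Magazines",
    "Presentation": "Verbal (written)",
}

VSTEP_TASK4 = {
    "Task level (CEFR)": "C1",
    "Response format": "MCQ",
    "Items per task": 10,
    "Cognitive processing 1": "Careful reading: global",
    "Cognitive processing 2": "Creating a text level representation (disc. structure)",
    "Content knowledge": 2,
    "Cultural specificity": 2,
    "Domain": "Public",
    "Discourse mode": "Expository",
    "Nature of information": "Fairly abstract",
    "Text genre": "Newspapers",
    "Presentation": "Verbal (written)",
}


def profile_differences(a: dict, b: dict) -> dict:
    """Return {category: (value_in_a, value_in_b)} for shared categories that differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


if __name__ == "__main__":
    for category, (aptis, vstep) in profile_differences(APTIS_TASK4, VSTEP_TASK4).items():
        print(f"{category}: Aptis = {aptis!r}, VSTEP = {vstep!r}")
```

On these values the two tasks share domain, discourse mode, nature of information, content knowledge and presentation, and differ on level, response format, item count, both cognitive processing categories, cultural specificity and text genre.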
Features of the Response (Task 4, Items 1-10)
Item | Key information | Operation
1 | Within sentences | Specific information
2 | Across sentences | Main idea / conclusions
3 | Within sentences | Specific information
4 | Across sentences | Main idea / conclusions
5 | Across sentences | Main idea / conclusions
6 | Within sentences | Main idea / conclusions
7 | Across sentences | Opinion
8 | Across sentences | Main idea / conclusions
9 | Across paragraphs | Test structure / connections between the parts
10 | Across sentences | Opinion
Question presentation: Verbal (written) for all 10 items
Option presentation: Verbal (written) for all 10 items
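The item-level judgments for VSTEP Reading Task 4 can be summarised as frequency counts per category. This is a minimal sketch, not part of the study's reported analysis: the list names are hypothetical, and the order of the "Operation" values for items 7 and 8 is an assumption, because two values share a single line in the slide transcript.

```python
from collections import Counter

# Item-level consensus judgments for VSTEP Reading Task 4 (items 1-10).
KEY_INFORMATION = [
    "within sentences", "across sentences", "within sentences",
    "across sentences", "across sentences", "within sentences",
    "across sentences", "across sentences", "across paragraphs",
    "across sentences",
]

# Assumption: item 7 = "Opinion" and item 8 = "Main idea / conclusions".
OPERATION = [
    "Specific information", "Main idea / conclusions", "Specific information",
    "Main idea / conclusions", "Main idea / conclusions", "Main idea / conclusions",
    "Opinion", "Main idea / conclusions",
    "Test structure / connections between the parts", "Opinion",
]

key_info_counts = Counter(KEY_INFORMATION)
operation_counts = Counter(OPERATION)

if __name__ == "__main__":
    for label, counts in (("Key information", key_info_counts),
                          ("Operation", operation_counts)):
        print(label)
        for value, n in counts.most_common():
            print(f"  {value}: {n} of {sum(counts.values())} items")
```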
Scoring analysis: Aptis CEFR
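Exact agreement: 62%; adjacent agreement: 38%

Aptis overall CEFR (rows) by VSTEP CEFR (columns)
Aptis overall CEFR | VSTEP Under B1 | VSTEP B1 | VSTEP B2 | Total
Under B1 | 5 | 11 | 0 | 16
B1 | 3 | 41 | 7 | 51
B2 | 0 | 15 | 35 | 50
C | 0 | 0 | 13 | 13
Total | 8 | 67 | 55 | 130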
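The agreement figures can be recomputed from the cross-tabulation. The sketch below assumes that "exact" means both tests place a candidate at the same level and "adjacent" means the two classifications are one level apart on the ordered scale Under B1 < B1 < B2 < C. The function name `agreement_rates` is hypothetical, and the slides do not say how the figures were actually produced.

```python
# Rows: Aptis overall CEFR (Under B1, B1, B2, C).
# Columns: VSTEP CEFR (Under B1, B1, B2).
# Row and column indices share the same ordinal scale, so |row - col|
# is the distance in levels between the two classifications.
COUNTS = [
    [5, 11, 0],   # Aptis Under B1
    [3, 41, 7],   # Aptis B1
    [0, 15, 35],  # Aptis B2
    [0, 0, 13],   # Aptis C
]


def agreement_rates(counts):
    """Return (exact, adjacent) agreement as proportions of all candidates."""
    total = sum(sum(row) for row in counts)
    exact = adjacent = 0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if i == j:
                exact += n
            elif abs(i - j) == 1:
                adjacent += n
    return exact / total, adjacent / total


exact, adjacent = agreement_rates(COUNTS)

if __name__ == "__main__":
    print(f"Exact agreement: {exact:.0%}")        # 81 of 130 candidates
    print(f"Adjacent agreement: {adjacent:.0%}")  # 49 of 130 candidates
```

Under these assumptions the counts reproduce the reported 62% and 38%, and no candidate is classified more than one level apart by the two tests.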
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930
Bartlett's Test of Sphericity: Approx. Chi-Square = 956.778, df = 36, Sig. = .000

Component Matrix (Component 1)
AptisGVScore | .921
AptisLScore | .829
AptisRScore | .816
AptisSScore | .881
AptisWScore | .853
VSTEPRScore | .803
VSTEPLScore | .669
VSTEPWScore | .870
VSTEPSScore | .827
Extraction Method: Principal Component Analysis.
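As a rough check on how much of the score variance the single component captures, the loadings can be squared and summed. This sketch assumes the principal component analysis was run on the correlation matrix of the nine scores. In that case the sum of squared loadings equals the component's eigenvalue, and dividing by the number of variables gives the proportion of total variance explained. The resulting figures are derived here from the loadings and are not reported in the slides.

```python
# Loadings on the single extracted component, copied from the component matrix.
LOADINGS = {
    "AptisGVScore": 0.921,
    "AptisLScore": 0.829,
    "AptisRScore": 0.816,
    "AptisSScore": 0.881,
    "AptisWScore": 0.853,
    "VSTEPRScore": 0.803,
    "VSTEPLScore": 0.669,
    "VSTEPWScore": 0.870,
    "VSTEPSScore": 0.827,
}

# With one component, each variable's communality is its squared loading.
communalities = {name: loading ** 2 for name, loading in LOADINGS.items()}

# Assumption: PCA on the correlation matrix, so total variance = number of variables.
eigenvalue = sum(communalities.values())
proportion_explained = eigenvalue / len(LOADINGS)

if __name__ == "__main__":
    print(f"Eigenvalue of component 1: {eigenvalue:.3f}")
    print(f"Proportion of variance explained: {proportion_explained:.1%}")
    weakest = min(communalities, key=communalities.get)
    print(f"Lowest communality: {weakest} ({communalities[weakest]:.3f})")
```

Under that assumption the component has an eigenvalue of about 6.24 and accounts for roughly 69% of the total variance. VSTEPLScore has the lowest loading on the component.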
Questionnaire item: "I prefer computer-based writing tests to paper-and-pencil-based writing tests."
Strongly agree 33%; agree 35%; disagree 18%; strongly disagree 11%; no selection 3%

Questionnaire item: "I prefer face-to-face speaking tests to machine-recorded speaking tests."
Strongly agree 33%; agree 41%; disagree 14%; strongly disagree 10%; no selection 2%
Questionnaire item: "I have taken other computer-based English tests before taking today's Aptis test."
No 75%; yes 24%; no selection 1%

Questionnaire item: "I often use computers."
Strongly agree 56%; agree 40%; disagree 1%; strongly disagree 1%; no selection 1%
Some tentative conclusions
1. The use of mixed methods, both qualitative and
quantitative, provides multiple perspectives and aids
interpretation.
2. The socio-cognitive model has provided a coherent
structure for identifying the sources of evidence useful
for creating a detailed picture of the tests.
3. The pilot data, on a small but robust scale, have
demonstrated that the two tests do measure similar
constructs around general English proficiency.
4. Preliminary results, including statistical analysis, gave us
confidence that it will be useful to proceed to the main
data collection.