
An IPO Task Difficulty Matrix for Prototypical Tasks for Task-based Assessment

Sheila Shaoqian Luo
School of Foreign Languages
Beijing Normal University
September 22, 2007

The presentation structure…

Introduction
Literature
The rationale of the research
Research questions and research methods
Studies: the evolution of the IPO TD matrix
Findings
Issues and suggestions for future research
Implications

I Introduction:

The Chinese National English Curriculum (CNEC, 2001)

Characteristics:
Multidimensional curriculum + humanistic approach
Focus on ability to use the language
Nine levels + competence-based: can-do statements
Promoting Task-Based Language Teaching (TBLT)
Lists of themes, functions, grammar and vocabulary

The CNEC goals (figure): integrated ability for use, built from affect and attitudes, learning strategies, cultural awareness, language skills, and linguistic knowledge.

II Literature: Language competence models

Canale and Swain’s Model: linguistic competence; sociolinguistic competence; discourse competence (Canale, 1983); strategic competence

Bachman’s communicative competence model: (1) organization: grammar; text; (2) pragmatics: illocution; sociolinguistics

Skehan’s TBLA model:

(1) to inform task selection (to predict the relative difficulty of each task); (2) to ensure the full range of candidates’ ability will be tapped; (3) to assist test developers in structuring test tasks, and the conditions under which these tasks are performed, in appropriate ways; (4) to inform development of rating scale descriptors; (5) to facilitate interpretation of test scores (which may differ according to tasks)

Task-based Performance and Language Testing: The Skehan Model V (2006)

The model (figure): underlying competences and ability for use feed into the task; task characteristics, task conditions, interlocutors, and the context of testing shape performance; a rater applies a scale/criteria to the performance to produce a score, which carries consequences.

Language competence models: Assessment

In Canale and Swain’s model, the framework’s components play compensatory roles.

In Bachman’s model, strategic competence plays the central role, orchestrating knowledge, language, context, assessment, planning, and execution; the model emphasizes the search for an underlying “structure-of-abilities”.

In Skehan’s model, the task is central to generalizing learners’ ability for use; it goes beyond the role of strategic competence and draws into play “generalized processing capacities and the need to engage worthwhile language use” (Skehan, 1998a, p. 171).

Issues in language testing

What we give test takers to do: unless tasks have known properties, we will not know whether performance is the result of the candidate or of the particular task.

Without knowledge of how tasks vary, we cannot know how broadly we have sampled someone’s language ability (cf. narrowness and therefore ungeneralisability).

How we decide about candidate ability: obviously underlying competences are important, but we also need to probe how people activate these competences, through ability for use. Knowledge of this area will enable us to make more effective context-to-context generalisations and avoid the narrowness of context-bound performance testing.

(Skehan, Dec. 2006)

If tasks are a relevant unit for testing, the research problem is to try to systematically “develop more refined measures of task difficulty”.

(Skehan, 1998:80)

The problem of difficulty

Traditional approaches:
Give a series of test items
Calculate the pass proportion
Rank the items in difficulty (classical, IRT)

Blue-skies solutions:
Effects of different tasks on performance areas
Do construct validation research
Use a range of tasks when testing

(Skehan, Dec. 2006)

A more realistic solution: The present research

Use an analytic scheme to make estimates of task difficulty

Explore whether this analytic scheme can generate agreement between different raters

Explore whether this analytic scheme has a meaningful relationship to (a) performance ratings, and (b) discourse analysis measures

III Research Rationale: Defining the problem

Identifying valid, user-friendly sequencing criteria for tasks and test tasks is a pressing but long-standing problem.

Grading task difficulty and sequencing tasks both appear to be arbitrary processes not based on empirical evidence (Long & Crookes, 1992)

The Norris-Brown et al. matrix (1998; 2002; influenced by Skehan, 1996) offers one way of characterising test task difficulty, but lacks an obvious connection to a Chinese secondary context.

This research investigates the development and use of a prototype task difficulty scheme based on current frameworks for assessing task characteristics and difficulty, e.g. Skehan (1998), Norris et al. (1998), and Brown et al. (2002).

Hypothesis: there is a systematic relationship between task difficulty and hypothesized task complexity (see also Elder et al., 2002).

Weakness in previous findings on task difficulty: they offered only moderate support for the proposed relationships between combinations of cognitive factors and particular task types (Elder et al., 2002).

IV Research questions

How can language ability in TBLT in mainland Chinese middle schools best be assessed?

1. Is the Brown et al. task difficulty framework appropriate to the mainland Chinese school context? If it is not, then what is an alternative framework?

2. Is it possible to have a task difficulty framework that can be generalized from context to context?

3. What are the teachers’ perceptions of task difficulty in a Chinese context?

4. What are the factors that are considered to affect task difficulty in this context?

Underlying abilities: (1) competence-oriented underlying abilities; (2) a structure made up of different interactive and interrelated components (Canale & Swain, 1980; Bachman, 1990); (3) different performances drawing upon these underlying abilities (Bachman, 1990); (4) sampling such underlying abilities in a comprehensive and systematic manner so as to provide the basis for generalizing to non-testing situations.

Predicting performance: the way abilities are actually used through tasks (factors may affect performance).

Generalizing from context to context: characterizing features of context in order to identify what different contexts have in common, and how knowledge of performance in one area could be the basis for predicting a learner’s performance in another.

A processing approach: establishing a sampling frame for the range of performance conditions which operate, so that generalizations can be made, in a principled way, to a range of processing conditions (Table 1).

Research Design and Methodology

A hybrid of quantitative and qualitative analysis, working both deductively and inductively (matrix to studies, studies back to matrix):

(1) a correlational analysis to explore the relationship between tasks and task difficulty components; and
(2) a qualitative analysis of verbal self-reports and focus-group interviews on the factors that affect task difficulty.

Two research phases:
(1) Phase one: Study One~Study Four (March~May 2004): application of the Norris-Brown et al. task difficulty matrix
(2) Phase two: Study Five~Study Ten (Oct 2004~2005): establishment and evolution of the IPO task difficulty matrix
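The correlational analysis mentioned above can be sketched in code. A minimal, self-contained example: the rating vectors below are invented for illustration only, not the study’s data, and the thesis’s actual statistical procedure may differ.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length rating lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical difficulty ratings (1 = easiest ... 5 = hardest) that two
# raters assigned to the same eight tasks.
rater_a = [1, 2, 2, 3, 3, 4, 5, 5]
rater_b = [1, 1, 3, 3, 4, 4, 4, 5]

print(round(pearson_r(rater_a, rater_b), 2))
```

A high coefficient indicates that the two raters ordered the tasks by difficulty in a broadly similar way, which is what the matrix studies look for.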

Summary of research participants

Participants | Number | Experience | Fields
Raters | 69 | 5~25 years | Teachers; TEFL material writers; test writers; curriculum developers
Students | 60 | Grade 8 | 2–3 40-minute lessons/week (Grade 3~Grade 6); 5–6 45-minute lessons/week (Grade 7~Grade 9)

Summary of research instruments

Instruments | Data type | Research question (RQ) addressed
Task difficulty matrix | Quantitative data | Constructing the IPO task difficulty matrix
Holistic vertical line | Quantitative data | Validating the IPO task difficulty matrix
Introspective reports | Qualitative data | (RQ 1 and RQ 4)
Focus-group interview | Qualitative data | Teachers’ and students’ perceptions of tasks and task difficulty (RQ 3)
Documents: CNEC | Quantitative + qualitative data | Constructing, validating, and generalizing the IPO task difficulty matrix; teachers’ and students’ perceptions of tasks and task difficulty (RQ 1, RQ 2, RQ 3, and RQ 4)

Research studies

1. Phase one: Applying Norris et al.’s task difficulty matrix

Study One~Four (March~May 2004)

Applying Norris et al.’s (1998) modified task difficulty matrix (Table 1) with 28 professional, experienced English teachers to investigate its transferability to mainland China

Results:

(1) It proved impossible to rate task difficulty with pluses and minuses.

(2) Among the fourteen tasks, raters agreed on only three: Planning the weekend, Shopping in a supermarket, and Radio weather information (common, general topics in daily life).

(3) There was tremendous disagreement between the Chinese teachers’ ratings and Norris et al.’s predicted difficulty levels (Table 2).

Task difficulty matrix for prototypical tasks: ALP (Norris et al., 1998, p. 84)

Columns (flattened table): characteristic tasks (by theme); difficulty index range; code complexity; cognitive complexity (# input sources, in/out, input organization, availability); communicative demand (mode, response level). Example rows: Planning the weekend; Highlighting the main idea.

Modified Task difficulty matrix

Columns: Task | Code C | Cognit C | Comm S | Task C

Code C (code complexity): linguistic complexity; linguistic input
Cognit C (cognitive complexity): cognitive familiarity; cognitive processing; amount of input
Comm S (communicative stress): time; interaction; context
Task C (task conditions): language proficiency; language abilities; language skills; culture & other

Phase one: Conclusions

The Norris et al. (1998) and Brown et al. (2002) matrix could not be reliably employed.

There was a discrepancy between Norris et al.’s difficulty levels for the tasks and the Chinese teachers’ ratings.

There was agreement on general topics, yet much disagreement on the more cognitively demanding tasks.

The Norris et al. tasks might not be appropriate, and an alternative framework for predicting task difficulty might be needed.

Phase Two: Establishing IPO task difficulty matrix (Studies Five~Ten; 2004~2005)

1 The IPO-CFS task difficulty scheme
2 CNEC-theme-related tasks (Table 3)

The 24 CNEC (2001) themes:

Personal information; Family, friends and people around; Personal environments; Daily routines; School life; Interests and hobbies; Emotions; Interpersonal relationships; Plans and intentions; Festivals, holidays and celebrations; Shopping; Food and drink; Health and fitness; Weather; Entertainment and sports; Travel and transport; Language learning; Nature; The world and the environment; Popular science and modern technology; Topical issues; History and geography; Society; Literature and art

The IPO scheme (figure): three dimensions (Input, Processing, Output), each characterized by content, form, and modality, with support at every stage: making input clearer (Input), making processing more efficient (Processing), and making oral/written expression more accurate and fluent (Output).

Findings

Study 5: the correlation between the two teachers’ mean ratings was .65. However, the two sets of tasks generated variations in difficulty within a single theme, leading to further research into task characteristics and requirements, and into task analysis (Table 4).

Studies 6 and 7: 24 CNEC tasks (Table 5) that vary in difficulty; IPO x extended CFMS (Table 6); two self-reporters plus rater comments provided detailed verbal self-report data to examine mental processes during rating of the tasks and to help refine the matrix.

Findings:

(1) Encouraging correlations: all but one ranged from .52 to .83. The exceptional pair (.34) led to further data collection from both raters and students to check the matrix’s reliability and validity.

(2) The matrix is improving, but needs input from actual raters; Input, Processing, and Output proved inseparable.

Inter-rater correlations (raters S, D, L, P, X, Y):

      S     D     L     P     X     Y
S     1    .53   .61   .83   .82   .72
D          1     .52   .70   .54   .34
L                1     .79   .64   .76
P                      1     .81   .74
X                            1     .70
Y                                  1

Refining the IPO task difficulty matrix: Studies Eight~Ten

Raters: professionals (10 + 5 + 9)
CNEC-theme-related tasks (15 + 9)
IPO x Information, Language, Performance conditions, Support (ILPS) (Table 8)

Inter-rater correlations:

(1) Study Eight correlation range: .69 to .92

(2) Study Nine correlation range: .62 to .91

(3) Study Ten correlation range: .75 to .87.

Fifteen prototypical tasks

Theme | Easy | Medium | Difficult
1: Personal information | Where does Linda live? | Applying for a summer club | Li Pei’s bedroom
11: Shopping | What a nice bike | Shopping list | A plan of the shops
12: Food and drink | Put the vegetables in order | A quiz: What am I? | Customer Satisfaction Form
18: Nature | Classifying pets | What are they? | Natural disasters
20: Popular science | The Vux | Keep safe from sharks | Plastics

IPO task difficulty matrix for task-based assessment (Table 9)

Dimensions: Input, Processing, Output

Components:
I Information: amount; type (familiar-unfamiliar; personal-impersonal; concrete-abstract; unauthentic-authentic); organization (structured-unstructured); operations (retrieval vs. transformation; reasoning)
II Language: level of syntax; level of vocabulary
III Performance conditions: modality; time pressure
IV Support

Structured-unstructured (rating scale):

1. Input information or task has a clear and tight organizational structure, e.g. clear narrative with beginning, middle, end. All or most elements of task are clearly connected to one another.

2. Input information or task has organizational structure, but this is fairly loose, so that some connections need to be made by the test-taker.

3. Input information or task is partly organized, with some sections which are structured and organized, but with other areas which need more active integration by the test-taker.

4. Information or task requires the test-taker to bring organization to material which isn’t organized. The test-taker has to make the links which are necessary for the task to be done, or to organize the material involved.
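To make the matrix concrete, one hypothetical way to encode a rater’s component judgements is sketched below. The field names, the 1~4 scoring, and the simple additive index are illustrative assumptions for this sketch, not the rating procedure used in the thesis:

```python
from dataclasses import dataclass

@dataclass
class TaskRating:
    """One rater's judgement of a task on four IPO-ILPS components.

    Each component is scored 1 (easiest) to 4 (hardest), like the
    structured-unstructured scale above.  The fields and the additive
    index are illustrative assumptions, not the thesis's procedure.
    """
    information: int             # amount, type, organization, operations
    language: int                # level of syntax and vocabulary
    performance_conditions: int  # modality, time pressure
    support: int                 # 1 = much support ... 4 = no support

    def difficulty_index(self) -> float:
        # A naive unweighted average; a real scheme might weight the
        # components or treat them non-additively.
        parts = (self.information, self.language,
                 self.performance_conditions, self.support)
        return sum(parts) / len(parts)

rating = TaskRating(information=2, language=3,
                    performance_conditions=2, support=1)
print(rating.difficulty_index())  # 2.0
```

Averaging such indices across raters would then give one crude per-task difficulty estimate to compare against the holistic vertical-line ratings.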

A comparison between Brown et al.’s matrix and the IPO task difficulty matrix

Similarities (5):

Primary research question; similar purposes; similar matrix design; an example of an assessment alternative; sources

Differences (10):

Test Objects; Task Themes; Task Focus; +(-)related to curriculum; Task Selection; Definitions/Labels; Characteristics; Layout; Rating System; Raters

Focus group interview summary

Features | Sample tasks
Familiar | Pets (Task 7); Feelings (Task 12); Writing an e-mail (Task 6)
Unfamiliar | Education policy and compulsory education (Task 3)
Authentic | Describing feelings (Task 12); Replying to emails (Task 6); Pets (Task 7)
Difficult | Length (Task 3)
Easy | Replying to emails (Task 6); Pets (Task 7)
Amount | Task 7; Task 12

VI Implications: IPO-ILPS task difficulty matrix

Tasks and task-based assessment:

(1) Estimating task difficulty: using learner performances on sampled tasks to predict future performances on tasks that are constituted by related difficulty components (Norris et al., 1998, p. 58).

(2) Students with greater levels of underlying ability will be able to successfully complete tasks which come higher on such a scale of difficulty (Skehan, 1998, p. 184).

Language teaching and learning:

(1) The matrix may be useful for syllabus designers developing and sequencing pedagogic tasks in order of increasing task difficulty, so as “to promote language proficiency and facilitate L2 development, the acquisition of new L2 knowledge, and restructuring of existing L2 representations” (Robinson, 2001, p. 34).

(2) It may help language teachers and testers when they make decisions about classroom teaching and design, and about the task-based assessments appropriate for the testing inferences they must make in their own educational settings.

VII Limitations

language assessment does not necessarily need to “subscribe to a single model of language test development and use”: teachers and students may be interested more “in specific aspects of performance more appropriately conceived of as task- or text-related competence” (Brown et al., 2002, p. 116).

The matrix and procedures developed and investigated here take a cognitive perspective; many other factors, seen from other perspectives, remain unexplored.

the nature of the target language tasks that serve as the basis of the assessment instruments and procedures: task appropriateness in particular learning contexts + locally defined assessment needs.

VIII Issues and suggestions for future research

The IPO task difficulty matrix for TBA: to promote generalizability, more research is needed in different regions and EFL contexts.

Tasks: both carefully sampled spoken and written tasks and calibrated test items for reading and listening.

The social practice (McNamara & Roever, 2006) of the task difficulty matrix.

A more qualitative dimension in judging the difficulty level of a task would make the main outcome a qualitative profile, mainly of task features.

The role of strategies in determining difficulty levels.

To what extent does the IPO task difficulty matrix provide a basis for the assessment of various language activities and competences?

IX Conclusions

Tasks are an interesting basis for exploring language teaching (Skehan, 2006a) and language testing.

“We need to find more and find out how to make tasks work more effectively. We don’t know yet how this can be done, but we will never know if we don’t do research” (Skehan, 2006a).

Hopefully, Norris and Brown et al.’s (1998; 2002) studies and the studies reported in this thesis have provided useful information and instruments that will profitably contribute to this research area of task-based teaching, learning, and assessment.

Acknowledgments

This is a presentation based on my Ph.D. research under the supervision of Professor Peter Skehan. My great gratitude goes to my supervisor, Professor Skehan. I also thank my committee members, Professor Jane Jackson and Professor David Coniam at the Chinese University of Hong Kong, who have contributed thoughtful and helpful suggestions to this study. My thanks go to the participants in the research.

Selected references

Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453–476.

Brown, J. D., Hudson, T., Norris, J., & Bonk, W. J. (2002). An investigation of second language task-based performance assessments. Second Language Teaching & Curriculum Center, University of Hawai’i at Manoa.

Coniam, D., & Falvey, P. (1999). Assessor training in a high-stakes test of speaking: the Hong Kong English language benchmarking initiative. Melbourne Papers in Language Testing, 8 (2), 1–19.

del Pilar Garcia Mayo, M. (Ed.). (2007). Investigating tasks in formal language learning. Clevedon: Multilingual Matters.

Van den Branden, K. (Ed.). (2006). Task-based language education: From theory to practice. Cambridge: Cambridge University Press.

Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing, 19(4), 343–368.

Elder, C., Knoch, U., Barkhuizen, G., & von Randow, J. (2005). Individual feedback to enhance rater training: Does it work? Language Assessment Quarterly, 2(3), 175-196.

Ellis, R. (2003). Task-based language learning and teaching. Oxford: Oxford University Press.

Ellis, R., & Barkhuizen, G. (2005). Analyzing learner language. Oxford: Oxford University Press.

Iwashita N., Elder C., & McNamara T. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing approach to task design. Language Learning, 51(3), 401-436.

Knoch, U., Read, J., & Von Randow, J. (2006, June). Re-training writing raters online: How does it compare with face-to-face training? Paper presented at the 28th Annual Language Testing Research Colloquium of the International Language Testing Association, University of Melbourne, Australia (June 29, 2006).

Ministry of Education, China (2001). A pilot paper: The national English curriculum standards. Beijing: Beijing Normal University Press.

Norris, J. M., Brown, J. D., Hudson, T. D., & Bonk, W. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language Testing, 19(4), 395-418.

Nunan, D. (1993). Task-based syllabus design: Selecting, grading, and sequencing tasks. In G. Crookes & S. M. Gass (Eds.), Tasks in a pedagogical context: Integrating theory and practice (pp. 55–68). Clevedon, Avon: Multilingual Matters.

Nunan, D. (2004). Task-Based Language Teaching. Cambridge: Cambridge University Press.

Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22 (1), 27 – 57.

Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics, 17 (1), 38-62.

Skehan, P. (1998). A Cognitive approach to language learning. Oxford: Oxford University Press.

Skehan, P. (1999). The influence of task structure and processing conditions on narrative retellings. Language Learning, 49(1), 93–120.

Skehan, P. (2001). Tasks and language performance assessment. In M. Bygate, P. Skehan & M. Swain (Eds.), Researching pedagogic tasks: Second language learning teaching and testing (pp. 167-185). London: Longman.

Skehan, P. (2003). Task-based instruction. Language Teaching, 36(1), 1–14.

Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research, 1(3), 185–211.

Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second language development in writing: Measures of fluency, accuracy & complexity. Second Language Teaching & Curriculum Center, Honolulu: University of Hawai‘i Press.

The Great Wall starts from where we stand: A long way to go…