second-language strategy instruction: where do we go from...

Second-language strategy instruction: Where do we go from here?

Luke Plonsky, Northern Arizona University

Situating Strategy Use (SSU3)

October 15, 2019

Modeling L2 development

[Figure: model of L2 development with three components]
• Language: e.g., stress, relative clauses, complexity, FSs
• Learner-EXTERNAL mechanisms/processes: input; instruction; environment; medium; instructional context; context; +/- amount; type (explicit, implicit); SL, FL; SA, at-home; class, lab, immersion; tech, F2F
• Learner-INTERNAL mechanisms/processes: motivation; working memory; beliefs about language learning; aptitude; anxiety; intelligence; emotions; personality; L2 strategies; ATI

But which of these can we influence?

Probably, but very little research

Individual differences (learner-internal variables associated with L2 development)

How strongly are these constructs associated with L2 proficiency/achievement (as shown via meta-analyses)?

[Figure: bar chart of meta-analytic correlations (r), axis 0 to 0.6]
• Aptitude: 0.49
• WTC: 0.48
• Motivation: 0.37
• Anxiety: 0.36
• Strategies: 0.31
• WM: 0.26
• L2MSS: 0.22

Annotations on the chart: “Probably not”, “Probably not”, “Yes!”

Li (2016); Elahi Shirvan, Khajavy, MacIntyre, & Taherian (in press); Masgoret & Gardner (2003); Teimouri, Goetze, & Plonsky (2019); Linck, Osthus, Koeth, & Bunting (2014); Al-Hoorie (2018).

Strategy Instruction (L2SI)

Def. Explicit training on specific practices or techniques that can be employed autonomously to improve one’s L2 learning and/or use (Chen, 2007; Ellis & Sinclair, 1989; Tudor, 1996; Taylor et al., 2006).

(Discussed in over 400 empirical, theoretical, and review articles and books; see here)

Outline

Part I: State of the science/substantive findings of SI - The WHAT

Part II: Methodological issues - The HOW

Part III: Recommendations

Part I: What do we know about the effects of L2SI?

Strategy Instruction

• Intuitive appeal
• Theoretical support
  • Strategic competence (e.g., Canale & Swain, 1980)
  • Learner-centeredness (Nunan, 1988; Tudor, 1996)
  • Developmental sequences (i.e., rate vs. route)
  • Autonomy/self-regulation/self-management (Gu, 2003; Rubin, 2005; Tseng, Dörnyei, & Schmitt, 2006)

“Teachers should not focus exclusively on the content of learning. Instead, attention should also be given to the process. For, to be self-sufficient, learners must know how to learn.”

From Toward a Theory of Instruction (Bruner, 1966)

Critiques of Strategy Instruction & SI Research

• Poor design (e.g., small sample sizes, non-random group assignment, exclusion of comparison groups)
• Unjustified selection of strategies
• Uncertainty of long-term effects
• Lack of valid and reliable instruments
• Incomplete reporting of treatments and results
• Absence of a comprehensive theory (C’mon SLA folks!)
• Cost/benefit ratio concerns

“…what one must teach students of a language is not strategy, but language” (Bialystok, 1990, p. 147).

(Chamot, 2005; Dörnyei, 1995; Kellerman, 1991; Macaro & Erler, 2007; McDonough, 1995; Macaro & Cohen, 2007; Rees-Miller, 1993; Rose et al., 2018)

Reviewing Strategy Instruction (Chamot, 2005; Hassan et al., 2005; McDonough, 1995)

Positive effects for SI…
• Contexts
  • second language, foreign language
  • middle school, HS, university
  • children, adults
  • beginner, intermediate, advanced
  • class, lab
• Treatments
  • Strategy type: cognitive, metacognitive, socioaffective
  • Number of strategies: 1-99
  • Short-, long-term: 1 day-1 year
  • L1, L2; teacher- or researcher-delivered
• Outcomes
  • L2 skills: reading, writing, listening, speaking, vocabulary, grammar
  • others: autonomy, motivation, strategy use, general language ability

Reviewing Strategy Instruction (Chamot, 2005; Hassan et al., 2005; McDonough, 1995)

Negative/mixed effects for SI…
• Contexts
  • second language, foreign language
  • middle school, HS, university
  • children, adults
  • beginner, intermediate, advanced
  • class, lab
• Treatments
  • Strategy type: cognitive, metacognitive, socioaffective
  • Number of strategies: 1-99
  • Short-, long-term: 1 day-1 year
  • L1, L2; teacher- or researcher-delivered
• Outcomes
  • L2 skills: reading, writing, listening, speaking, vocabulary, grammar
  • others: autonomy, motivation, strategy use, general language ability

Plonsky (2019)

A meta-analysis of the effects of L2SI

RQs:
1. How effective is L2 strategy instruction?
2. What is the relationship between the effectiveness of SI and different learning contexts, treatments, and outcome variables (e.g., skill areas)?

First, what is meta-analysis?

• Empirical approach to reviewing literature
• More systematic and objective than traditional reviews
• Origin? (“Necessity is the mother of invention”)

Assumption: Developing scientific knowledge is a cumulative and corporate enterprise.

First, what is meta-analysis?

• THREE hallmarks (Mizumoto, Plonsky, & Egbert, in press; Plonsky & Oswald, 2015)

1. Exhaustive (vs. selective) searches (sample ≈ population)
  • → Validity, generalizability

2. Systematic coding for substantive features and effects (vs. subjectively or idiosyncratically interpreted)

3. Key component: effect sizes (e.g., d, r), which are more precise, stable, intuitive, and informative (vs. p)

• Consequently…


Meta-analyses provide stable, trustworthy answers!

• Q: Does textual enhancement work?
• A: YES, but the effects are fairly small (d = .22); and it helps for grammar learning but might impede text comprehension (Lee & Huang, 2008; K = 20)

• Q: Is computer-based feedback helpful?
• A: Yes! Just as helpful or more so than face-to-face feedback (Ziegler, 2013; K = 14).

• Q: Is it helpful to provide students with feedback when they make errors in class?
• A: YES, but it depends on what type of feedback you provide (Lyster & Saito, 2010)

• What about SI?

A meta-analysis of the effects of L2SI (Plonsky, 2019)

RQs:
1. How effective is L2 strategy instruction?
2. What is the relationship between the effectiveness of SI and different learning contexts, treatments, and outcome variables (e.g., the four skills)?

Method: Inclusion criteria
• Participants learning an L2
• Treatment that included L2 strategy instruction
• Data collected and compared in a control-experimental (between groups) design
• DV = quantitative measure of the effect of SI
• Sufficient data reported to calculate an effect size (Cohen’s d)
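
The last criterion asks for enough reported data to compute Cohen’s d. As a rough illustration only, the sketch below computes d from group means, standard deviations, and sample sizes using the pooled standard deviation. The function name and the numbers are hypothetical and do not come from any study in the sample; the slides do not say which variant of d was used.

```python
import math


def cohens_d(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference based on the pooled standard deviation."""
    pooled_var = (
        (n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_exp + n_ctrl - 2)
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)


# Hypothetical posttest summary statistics (illustration only).
d = cohens_d(mean_exp=75.0, sd_exp=10.0, n_exp=30,
             mean_ctrl=70.0, sd_ctrl=10.0, n_ctrl=30)
print(round(d, 2))
```

A study that reports only a p value, with no means, SDs, or test statistic, gives this function nothing to work with, which is why such studies fall outside the inclusion criteria.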

Method: Sample
• 77 primary studies of the effectiveness of SI
• 112 unique samples/treatment groups
• 7,890 individual participants

Method: Data collection and analysis

• Coded for… (a) substantive and (b) methodological features as well as (c) estimates of treatment effects (Cohen’s d).
• Analysis
  • RQ1: Weighted average overall
  • RQ2: Weighted average for subgroups created according to study features (i.e., potential moderators)
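
The two analysis steps can be sketched as follows. This is a minimal illustration that assumes inverse-variance (fixed-effect) weights and the usual large-sample variance of d; the slides do not state which weighting scheme was used, and the function names and input values are hypothetical.

```python
import math


def d_variance(d, n_exp, n_ctrl):
    """Approximate sampling variance of Cohen's d (large-sample formula)."""
    return (n_exp + n_ctrl) / (n_exp * n_ctrl) + d ** 2 / (2 * (n_exp + n_ctrl))


def weighted_mean_d(studies):
    """studies: list of (d, n_exp, n_ctrl). Returns (mean, ci_low, ci_high)."""
    weights = [1.0 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total = sum(weights)
    mean = sum(w * s[0] for w, s in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)  # standard error of the weighted mean
    return mean, mean - 1.96 * se, mean + 1.96 * se


def subgroup_means(studies, labels):
    """Weighted mean d per moderator level (one label per study)."""
    groups = {}
    for study, label in zip(studies, labels):
        groups.setdefault(label, []).append(study)
    return {label: weighted_mean_d(group)[0] for label, group in groups.items()}


# Hypothetical effect sizes and group sizes (illustration only).
studies = [(0.40, 25, 25), (0.90, 40, 38), (0.65, 60, 55)]
labels = ["beginner", "inter/adv", "inter/adv"]
print(weighted_mean_d(studies))
print(subgroup_means(studies, labels))
```

Larger samples receive more weight because their variance is smaller, so a single small study with an extreme d moves the average less than it would in a simple mean.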

Results: RQ1
• Overall effect size: d = 0.66 [.62, .69]
• What does this mean?
  • Relative to SLA: “medium”

Effect size benchmarks (d):
• Small-ish (25th percentile): .40
• Medium-ish (50th percentile): .70
• Large-ish (75th percentile): 1.00

K = 346 primary studies and 91 meta-analyses of L2 research (N > 604,000)

(Plonsky & Oswald, 2014)

Results: RQ1
• Overall effect size: d = 0.66 [.62, .69]
• What does this mean?
  • Relative to SLA: “medium”
  • Relative to L1 SI: d = .45 (Hattie et al., 1996)
  • Exp groups score on average 2/3 of an SD above control groups
  • Approximately 3/4 of EG participants outperform average CG participants (Lipsey et al., 2012)
• Additional and practical considerations for interpretation
  • Teacher training
  • Materials development
  • Class time (cost/benefit ratio?)
  • Potential for long-term benefit?
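
The “approximately 3/4” figure can be checked with a short calculation. The sketch below assumes normally distributed scores with equal variance in both groups, in which case the share of experimental-group participants scoring above the control-group mean is the standard normal CDF evaluated at d (often called Cohen’s U3). The function name is hypothetical.

```python
from statistics import NormalDist


def proportion_above_control_mean(d):
    """Share of the experimental group above the control mean (Cohen's U3),
    assuming normal distributions with equal variances."""
    return NormalDist().cdf(d)


u3 = proportion_above_control_mean(0.66)
print(round(u3, 3))  # about 0.745, i.e., roughly three quarters
```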

Results: RQ1 → change over time?
• Overall effect size: d = 0.66 [.62, .69]

[Figure: bar chart of effect size (d) by publication period, axis 0 to 1.2]
• 1980-2005 (k = 70): 0.43
• 2006-2015 (k = 42): 0.93

Two possible explanations
- “a notably greater standardization of intervention frameworks has gradually emerged in the past decade” (Ardasheva et al., 2017)
- Methodological (vs. theoretical?) maturity (Plonsky & Gass, 2011; Plonsky & Oswald, 2014)

RQ2: Effects of SI Across Learning Contexts

[Figure: bar chart of effect size (d) by moderator, axis 0 to 1]
• Context: L2 (k = 13): 0.84; FL (k = 99): 0.57
• Institution: Primary (k = 11): 0.77; Secondary (k = 28): 0.26; University (k = 70): 0.69
• Setting: Class (k = 78): 0.55; Lab (k = 32): 0.82
• Proficiency: Beginner (k = 44): 0.39; Inter/Adv (k = 57): 0.74

RQ2: Effects of SI Across Treatment Types

[Figure: bar chart of effect size (d) by moderator, axis 0 to 1.2]
• Type: Cognitive (k = 92): 0.56; Metacognitive (k = 35): 1.00
• Length: ≤2 weeks (k = 50): 0.49; >2 weeks (k = 57): 0.65
• # of strategies: 1 (k = 41): 0.86; >1 (k = 48): 0.58

RQ2: Effects of SI Across L2 Skill Areas

[Figure: bar chart of effect size (d) by outcome, axis -0.5 to 2.5]
• Strategy use (k = 10): 1.11
• Speaking (k = 13): 1.00
• Reading (k = 41): 0.82
• Vocab (k = 33): 0.63
• Writing (k = 8): 0.59
• Listening (k = 10): 0.06
• Pronunciation (k = 2): 2.07
• Grammar (k = 4): 0.75
• General (k = 5): 0.05

Ardasheva, Wang, Adesope, & Valentine (2017)

• Meta-analysis of the effects of L2SI on
  • RQ1: L2 performance
  • RQ2: Other self-regulated outcomes (e.g., anxiety, self-efficacy, attitudes)

• 2008-2014 only

• Sample
  • RQ1: 39 reports (47 samples)
  • RQ2: 16 reports (17 samples)


Overall Results

[Figure: bar chart of effect sizes, axis 0 to 1]
• Language (k = 43): 0.78
• SR learning (k = 17): 0.87


Results, RQ1 (linguistic outcomes)

[Figure: bar chart of effect sizes by outcome, axis 0 to 1.4]
• All (K = 43): 0.78
• Vocab (k = 4): 1.23
• Reading (k = 20): 0.76
• Listening (k = 9): 0.68
• General (k = 2): 0.62
• Speaking (k = 2): 0.61
• Writing (k = 7): 0.47
• Grammar (k = 2): 0.13

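[Inset for comparison, repeating the chart of effects of SI across L2 skill areas: Strategy use (k = 10) 1.11; Speaking (k = 13) 1.00; Reading (k = 41) 0.82; Vocab (k = 33) 0.63; Writing (k = 8) 0.59; Listening (k = 10) 0.06; Pronunciation (k = 2) 2.07; Grammar (k = 4) 0.75; General (k = 5) 0.05]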

Results, RQ2 (non-linguistic outcomes)

[Figure: bar chart of effect sizes by outcome, axis 0 to 1.4]
• All (K = 17): 0.87
• Strategy effectiveness (k=20 [truncated]): 1.26
• Strategy use (k = 11): 0.98
• Anxiety (k = 1): 0.90
• Attitudes (k = 2): 0.54
• Self-efficacy (k = 2): 0.27

Additional meta-analytic evidence for strategies

• English learning (overall) (Elahi Shirvan, 2014)

• Reading (Chaury, 2015; Maeng, 2014; Taylor et al., 2006)

• Vocabulary learning strategies for EFL learners (Nematollahi et al., 2017)

• Web-based instruction (Chang & Lin, 2013)

Preliminary Implications and Discussion

SI can be effective in all contexts and for all skills but appears to be stronger:
(a) with non-beginners (“threshold” in Chamot, 2016; “the rich get richer”?)
(b) with metacognitive strategies
(c) over longer periods of time, and
(d) with fewer target strategies (i.e., less is more)

BUT a great deal of further research is still needed across…
- Learner demographics and contexts
- Linguistic (i.e., skills) and non-linguistic domains (e.g., anxiety)
- Individual strategies

Part II: The HOW (SI Methods)

We have some issues

• Design & Instrumentation (see e.g., Pawlak, 2019; Rose et al., 2018)
  • Small samples
  • Lack of delayed posttests
  • Lack of theoretical or empirical justification of strategies taught
  • Evidence of reliability (internal consistency) and validity often unknown BOTH for measures of strategies AND L2 performance!

“You can’t fix with analysis what you bungled by design” (Light et al., 1990)
No analysis, however sophisticated or elegant, can make up for poor instrumentation.

At least we’re not alone?

• True. These problems are pervasive throughout pretty much all of applied linguistics (and throughout the social sciences)!

Reliability evidence: O (observation) = T (true score) + E (error)

Reporting of reliability across domains of L2 research

[Figure: bar chart with one bar per research synthesis, axis 0 to 100; the 16 values shown range from 6 to 66. Syntheses shown: Nekrasova & Becker (2009); Mackey & Goo (2007); Norris & Ortega (2000); Russell & Spada (2006); Derrick (2016); Brown (2016); Jeon & Kaya (2006); Plonsky (2011); Ziegler (2013); Plonsky (2013); Adesope et al. (2010); Lee, Jang, & Plonsky (2015); Adesope et al. (2011); Plonsky & Kim (2016); Plonsky & Gass (2011); Liu & Brown (2015). Domain annotations: effects of L2 practice; WCF]

What about the amount of (measurement) ERROR? What’s typical for the field?

• Reliability generalization meta-analysis (RGM) (Plonsky & Derrick, 2016)
• K = 537 from 16 L2 journals
• 2,244 reliability estimates

[Figure: reliability estimates by type, axis 0.7 to 1, with 25th and 75th percentiles in brackets]
• Instrument (K = 1,323): 0.82 [.74-.89]
• Interrater (K = 861): 0.92 [.83-.96]
• Intrarater (K = 40): 0.95 [.90-.96]

In other domains of L2 research?

• TBLT (K = 85; Plonsky & Kim, 2016)

rel.= 0.93 0.87 0.86 0.76 N/A N/A

In other domains of L2 research?

• L2 pronunciation (K = 77; Saito & Plonsky, 2019)

What about for L2 strategies?

• Or even for different categories of strategies?

• We really don’t know!

• Why does this matter?
  • Unreliability → Error → Threat to validity
  • Attenuation of effects (signal vs. noise)
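
To make the attenuation point concrete, here is a small sketch of Spearman’s classic correction for attenuation, which estimates the correlation between true scores from the observed correlation and the reliabilities of the two measures. The function name and the numbers are hypothetical and are not taken from any study cited here.

```python
import math


def disattenuate(r_observed, rel_x, rel_y):
    """Spearman's correction: r_true = r_observed / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical values: an observed correlation of .30 between a strategy
# questionnaire (reliability .80) and a proficiency test (reliability .90).
corrected = disattenuate(0.30, 0.80, 0.90)
print(round(corrected, 2))
```

The lower the reliabilities, the more the observed correlation understates the underlying relationship. When reliability is never reported, the size of that understatement cannot even be estimated.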

Validity evidence: O (observation) = T (true score) + E (error)

What we say about validity

• Chapelle (in press): “validation should be of central importance for the credibility of research results”

• TQ Author Guidelines: Authors should provide a “Description of the instruments, what they are designed to measure, and a report of their validity to the extent possible, and their reliability.”

• Ellis (in press): “While researchers have always recognized this issue [validity in SLA measurement], they have largely ignored it, often happy to talk about learning with no consideration of the type of data they had collected”

• Norris & Ortega (2012): “Problematic… is the tendency to assume, rather than build an empirical case for, the validity for whatever assessment method is adopted” (pp. 574-575).

• Schmitt (2019): “Most vocabulary tests are not validated to any great degree.”

What about strategy scales??? See seminal works by Cronbach & Meehl; Messick; Kane; Chapelle, Enright, & Jamieson

Are questionnaires to blame? (Great examples of alternatives in Gu’s plenary and Yashima & MacIntyre’s symposium)

Potential threats to validity

1. Indirect measures of the construct of interest
  • Suggestion: triangulation (e.g., + observations; + interview)
2. Responses often limited to what is being asked
  • Suggestions: piloting; open-ended items; interviews
3. Self-selection bias
  • Suggestions: random or purposive sampling; missing data analysis
4. Anonymity → +/- truthfulness?
  • Suggestion: triangulation
5. Response values? (3, 5, 9, 1,000?)
  • Suggestions: piloting; clear instructions; scale descriptors
6. Quantification without consideration of numerical values
  • Suggestion: rich qualitative data; better use of stats
7. Ambiguous (“double-barreled”) items
  • Suggestion: Pilot. Leave room for comments.

Validity = multifaceted

• Construct
• Predictive
• Convergent/concurrent
• Discriminant/Divergent
• Face

To what extent does L2 research demonstrate an explicit concern for (different facets of) validity?

“There is perhaps an unwritten agreement that readers will accept measures used in an SLA study at face value without asking about their reliability and validity for the task at hand.” (Cohen & Macaro, 2013, p. 133; see Bachman & Cohen, 1998).

• Is this true in general?
• And for strategies research?
• Do you ever see validity evidence?

• How could we address this Q?
• Collect a representative sample of studies…

• Synthetic approach
  • Very time-consuming
  • Subject to high-inference judgments

• Corpus-based approach
  • Fast and objective
  • Valid?


• Second Language Research Corpus (L2RC; Plonsky, n.d.)
  • 22 journals
  • 22,363 articles (1946-2018)
  • 147,293,764 words

• Searched for occurrences of:
  - [predictive, discriminant, divergent, construct, face, convergent, concurrent] + validity
  - validity argument

AL, ALL, AP, BLC, CMLR, ELTJ, FLA, IJAL, IRAL, JSLW, LAQ, LA, LL, LL&T, LTeaching, LTR, LTesting, MLJ, SLR, SSLA, System, TQ
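
A corpus search of this kind might look like the sketch below. It is an illustration under stated assumptions, not the procedure actually used with the L2RC: it takes a list of article texts, uses a simple regular expression for the search phrases, and reports the percentage of texts that mention each phrase at least once. The function and variable names are hypothetical.

```python
import re

FACETS = ["predictive", "discriminant", "divergent", "construct",
          "face", "convergent", "concurrent"]
LABELS = FACETS + ["validity argument"]

# "<facet> validity" or "validity argument(s)", case-insensitive.
PATTERN = re.compile(
    r"\b(" + "|".join(FACETS) + r")\s+validity\b|\bvalidity\s+argument",
    re.IGNORECASE,
)


def percent_of_texts_mentioning(texts):
    """Percentage of texts containing each search phrase at least once."""
    hits = {label: 0 for label in LABELS}
    for text in texts:
        found = set()
        for match in PATTERN.finditer(text):
            found.add(match.group(1).lower() if match.group(1)
                      else "validity argument")
        for label in found:
            hits[label] += 1
    n = len(texts)
    return {label: (100.0 * count / n if n else 0.0)
            for label, count in hits.items()}


# Tiny made-up sample (illustration only).
sample = [
    "Construct validity was examined; construct validity held.",
    "We present a validity argument for the scale.",
    "No measurement details were reported.",
    "Face validity was assumed.",
]
rates = percent_of_texts_mentioning(sample)
print(rates["construct"], rates["face"], rates["validity argument"])
```

String matching is fast and objective, but it cannot tell whether a hit reflects an actual validation effort or a passing mention, and it misses validity evidence described in other words.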


[Figure: bar chart of search results by validity term, axis 0 to 10; note scale]
• Construct: 1.93 (2 in every 100 articles)
• Face: 0.71
• Predictive: 0.54
• Concurrent: 0.52
• Discriminant: 0.24
• V Argument: 0.19
• Convergent: 0.18
• Divergent: 0.10 (1 in every 1,000 articles)

How might the strategies literature compare???

(What about false positives? False negatives?)

It’s not all bad!

• Nakatani (2006): Scale development/validation
  • Development of the Oral Communication Strategy Inventory (OCSI)
  • Stage 1: Open-ended questionnaire (N = 80)
  • Stage 2: Piloted with 400 → (exploratory) factor analysis (item structure) → 8 categories for speaking and 7 for listening strategies
  • Stage 3: Compared with data from SILL (N = 62)

• Mizumoto & Takeuchi (2012): Scale validation
  • Self-regulating Capacity in Vocabulary Learning Scale (SRCvoc)
  • Study 1: N = 443 → item analysis: ITC of > .4; alpha for subscales
  • EFA to examine factor structure
  • Study 2: N = 914 → alpha for subscales; CFA

• Ardasheva & Tretter (2013): Validation of modified version of SILL
  • Revision of items and piloting
  • Administered to 1057 child learners of ESL
  • CFA → 6 factor solution
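
Two of the statistics named in these validation studies, Cronbach’s alpha and item-total correlations (ITC), are easy to compute. The sketch below is a minimal pure-Python illustration with made-up responses. It uses corrected item-total correlations (each item against the sum of the remaining items), which is an assumption; the slides do not say which variant was used.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(items):
    """Correlation of each item with the sum of the other items."""
    totals = [sum(scores) for scores in zip(*items)]
    return [pearson(item, [t - s for t, s in zip(totals, item)])
            for item in items]


# Made-up responses: 3 items x 5 respondents (illustration only).
responses = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 5, 4],
]
alpha = cronbach_alpha(responses)
itc = corrected_item_total(responses)
print(round(alpha, 2), [round(r, 2) for r in itc])
```

Items whose ITC falls below a chosen cutoff, such as the .4 mentioned above, would be candidates for revision or removal before factor analysis.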

It’s not all bad!

• See also
  • Tragant et al. (2013)
  • Ardasheva (2016)
  • Teng & Zhang (2016)

Summary for Part II

• What we need is a non-casual, rigorous, and systematic agenda focused on measurement as it pertains to L2 strategies and strategy instruction.

Looking ahead → L2SI research wish list (see Sudina & Plonsky, in press)

What/substance
• SI across all skill areas. Esp.: writing, listening, pronunciation, test-taking
• Aptitude-treatment interactions with L2SI (e.g., with beliefs, working memory; see Yashima, Nishida, & Mizumoto, 2017)
• SI for specific learning contexts: SA, CALL/MALL, EMI/CLIL
• Teacher training
  • Studies of teacher beliefs regarding SI and
  • Effectiveness of teacher training interventions for SI
• The role of strategic transfer (L1 → L2; L2 → Ln)


How/Method: Designs

• Non-“WEIRD” samples: e.g., SL, pre-adolescent, advanced learners
• Validity evidence/arguments for the utility of individual strategies ← essential justification for L2SI studies but RARELY present
• A clearer understanding of the long-term effects of SI
• “Bigger” and more longitudinal designs, at the curricular level


How/Method: Measurement

• Validity arguments for measures of both
  (a) strategy usage (Takeuchi, 2019; Tseng et al., 2006) and
    - Situated, qualitative, and mixed methods (Pawlak & Oxford, 2018; Rose et al., 2018)
    - Macro + micro perspective (Pawlak, in press)
    - Scenario-based scales → + contextualization (see Teimouri, 2018)
    - Studies of the predictive validity of individual strategies and L2 performance (as a prerequisite for SI)
  (b) L2 performance
… all to then be made available on the IRIS database (iris-database.org) → + consistency across studies!!


How/Method: Data report and analyses

• More thorough reporting of
  • Sample characteristics (e.g., proficiency)
  • Treatments (e.g., length/intensity, materials)
  • Data (ESs, CIs, visuals, reliability coefficients)

• More informed use of quantitative analyses (Mizumoto & Plonsky, 2015; Nix, 2018; Takeuchi, 2019)
  • E.g., Rasch analysis; multivariate models; corrections for attenuation due to measurement error


How/Method: Beyond individual studies

• Replication studies!
• Additional meta-analyses of SI focused on individual strategies or skills (e.g., vocab, speaking)
• Systematic review and meta-analysis of reliability coefficients (what’s normal?)

Thank you!

Luke Plonsky
lukeplonsky@gmail.com
lukeplonsky.wordpress.com
