Post on 10-Aug-2020


Minimization and Probability Distribution of Dependency Distance in the Process of Second Language Acquisition

Jinghui Ouyang, Jingyang Jiang
Department of Linguistics
Zhejiang University

1.  The Background of the Current Paper

2.  Minimization and Probability Distribution of Dependency Distance in the Process of Second Language Acquisition

The Background of the Current Paper

•  1. Part of the Key Project of the National Social Science Foundation, “Research on the Syntactic Development of Chinese EFL Learners Based on Dependency Treebank” (2017-2022)

•  2. We have collected about 1,200 compositions (about 150 thousand words), ranging from grade 4 in primary schools (11 years old) to colleges (11 grades).

Important Concepts

Dependency Relation

In dependency syntax, sentence structure is analyzed through the dependency relations between the words of a sentence (Tesnière, 1959; Hudson, 2007, 2010; Nivre, 2006; Liu, 2009). A dependency relation has three core properties: it is binary, asymmetric, and labeled.

Dependency Distance

Dependency distance refers to the linear distance between two linguistic units having a syntactic relationship within a sentence (Heringer et al., 1980; Hudson, 1995).

Mean Dependency Distance (MDD)

The mean dependency distance (MDD) of a sentence:

The mean dependency distance (MDD) of a treebank:
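The two MDD formulas on this slide did not survive extraction. Assuming the slide follows the definitions in Liu (2008), which the deck cites for dependency distance, they would read as below. Here n is the number of words (in the sentence or in the treebank), s is the number of sentences in the treebank, and DD_i is the dependency distance of the i-th dependency link; these symbol names are an assumption, not taken from the slide.

```latex
% MDD of a sentence with n words (the root has no governor, hence n - 1 links)
\mathrm{MDD}(\text{sentence}) = \frac{1}{n-1} \sum_{i=1}^{n-1} \lvert DD_i \rvert

% MDD of a treebank with n words and s sentences (one root per sentence, hence n - s links)
\mathrm{MDD}(\text{treebank}) = \frac{1}{n-s} \sum_{i=1}^{n-s} \lvert DD_i \rvert
```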

Dependency Distance Minimization (DDM)

•  The linear distance between two words with a syntactic relationship is constrained by human working memory.

•  The results of corpus-based research and psychological experiments have indicated that human languages have a tendency towards dependency distance minimization (DDM).

•  Dependency distance minimization has been found to be a universal quantitative property of more than 30 human languages (Liu, 2008; Futrell et al., 2015).

Introduction

•  The materials of previous studies on DDM all come from native speakers’ first language; no such study exists for second language learners’ language system.

•  Second language learners’ language system, defined as “interlanguage”, is structurally intermediate between the native and target languages (Selinker, 1972).

•  As their second language proficiency improves, learners’ language system gradually develops towards native speakers’ language.

•  Does second language learners’ language system also develop under the universal pressure of DDM?

To figure out how second language learners obey the principle of dependency distance minimization during the process of second language acquisition, we investigated the development of MDDs in Chinese EFL (English as a Foreign Language) learners’ English writings at different learning phases to answer the first two specific research questions:

•  Research Question 1. How does the mean dependency distance (MDD) in the English compositions written by Chinese EFL learners develop across nine grades?

•  Research Question 2. Do Chinese EFL learners develop their English proficiency under the pressure of dependency distance minimization?

•  Dependency distance can reflect the comprehension difficulty of syntactic structure (Liu, 2008). Dependency distance minimization is considered to result from human cognitive mechanisms (Liu, 2008; Lu et al., 2016) and from the effect of ‘the principle of least effort’ on syntactic structure (Zipf, 1949). The distribution of dependency distances presents certain regularities.

•  Our previous study (Ouyang & Jiang, 2017) found that the probability distribution of the dependency distance of second language learners’ interlanguage fits the Zipf-Alekseev distribution well, and that the parameters a and b in the Zipf-Alekseev distribution reflect second language learners’ language proficiency at different learning stages.

•  To confirm that the previous findings (Ouyang & Jiang, 2017) are not statistical artefacts and to further demonstrate the DDM of interlanguage in the process of second language acquisition (SLA) from the probability of dependency distances, we constructed two random languages (RL1, RL2) using the compositions at different learning stages and fitted their probability of dependency distances to different exponential distribution and power law distribution models, including the Zipf-Alekseev distribution.

•  Research Question 3. Does the probability distribution of dependency distance of random languages of second language learners’ writings fit the Zipf-Alekseev distribution well? If the answer is yes, can the parameters in the Zipf-Alekseev distribution reflect second language learners’ language proficiency at different learning stages?

Methodology

Participants

Group                                             Number   Years of English Learning
J1 (first grade of junior high school)            75       3-4
J2 (second grade of junior high school)           61       4-5
J3 (third grade of junior high school)            69       5-6
S1 (first grade of senior high school)            78       6-7
S2 (second grade of senior high school)           74       7-8
S3 (third grade of senior high school)            79       8-9
U1 (first grade of university)                    40       9-10
U2 (second grade of university)                   28       10-11
P1 (first-grade postgraduate of English major)    26       13-14

· From first graders of junior high school to first-grade postgraduates of English major
· 367 Chinese students from two high schools and one university in Zhejiang Province

Materials

Group   Topic                                               Genre       Sampled compositions   Word count
J1      My Weekend                                          Narrative   60                     6375
J2      My Weekend                                          Narrative   60                     6375
J3      My Weekend                                          Narrative   44                     6417
S1      A(n) Embarrassing/Surprising/Unforgettable Thing    Narrative   44                     6307
S2      A(n) Embarrassing/Surprising/Unforgettable Thing    Narrative   39                     6312
S3      A(n) Embarrassing/Surprising/Unforgettable Thing    Narrative   41                     6358
U1      An Interesting/Annoying/Embarrassing Story          Narrative   25                     6539
U2      An Interesting/Annoying/Embarrassing Story          Narrative   28                     6431
P1      An Interesting/Annoying/Embarrassing Story          Narrative   26                     7469
Total                                                       Narrative   341                    58583

Self-built dependency treebank: 341 English compositions written by the participants within the prescribed time limit in class

Contrastive dependency treebank: sub-corpora of about 6,500 words each from the Wall Street Journal (WSJ) Corpus

Procedure

•  Data Collection

•  Automatic POS and Dependency Relation Annotation

•  Establishment of Syntactic Relation and Error Tagging System

•  Manual Tagging and Modification

Manual Syntactic Annotation and Modification

Word Order   Word       POS         Word Order of Governor   Governor   POS of Governor   Dependency Relation   Dependency Distance
1            For        IN-case-E   5                        think      VBP               prep                  4
2            students   NN          1                        For        IN                pobj                  -1
3            ,          ,           5                        think      VBP               punct                 2
4            I          PRP         5                        think      VBP               nsubj                 1
5            think      VBP         5                        think      VBP               root                  0
6            we         PRP         7                        are        VBP               nsubj                 1
7            are        VBP         5                        think      VBP               ccomp                 -2
8            stressed   JJ          7                        are        VBP               xcomp                 -1
9            out        RP          8                        stressed   JJ                compound:prt          -1
10           .          .           5                        think      VBP               punct                 -5

Modification:
1.  Automatic syntactic annotation inconsistent with our syntactic relation system
2.  Wrong automatic tagging
3.  Lexical and grammatical errors
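The Dependency Distance column of the annotation table above can be recomputed from the two word-order columns. The sketch below assumes the signed distance is the governor's word order minus the dependent's word order (this matches every row of the table) and that the sentence MDD averages the absolute distances of all non-root words, punctuation included. The function names are illustrative, not taken from the authors' tooling.

```python
# Rows copied from the annotation table:
# (word order, word, word order of governor, dependency relation)
ROWS = [
    (1, "For", 5, "prep"),
    (2, "students", 1, "pobj"),
    (3, ",", 5, "punct"),
    (4, "I", 5, "nsubj"),
    (5, "think", 5, "root"),
    (6, "we", 7, "nsubj"),
    (7, "are", 5, "ccomp"),
    (8, "stressed", 7, "xcomp"),
    (9, "out", 8, "compound:prt"),
    (10, ".", 5, "punct"),
]


def dependency_distance(position, governor_position):
    """Signed distance: governor's word order minus the dependent's word order."""
    return governor_position - position


def sentence_mdd(rows):
    """Mean absolute dependency distance over all non-root words (assumed definition)."""
    distances = [
        abs(dependency_distance(pos, gov))
        for pos, _word, gov, relation in rows
        if relation != "root"
    ]
    return sum(distances) / len(distances)


distances = [dependency_distance(pos, gov) for pos, _word, gov, _rel in ROWS]
print(distances)           # [4, -1, 2, 1, 0, 1, -2, -1, -1, -5]
print(sentence_mdd(ROWS))  # 2.0
```

The nine non-root distances sum to 18 in absolute value, so this example sentence has an MDD of 18 / 9 = 2.0 under the assumed definition.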

Construct Two Random Treebanks

In the first random treebank (RL1), within each sentence we select one word as the root, and then for every other word we randomly select another word in the same sentence as its governor, disregarding syntax and meaning.
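A minimal sketch of the RL1 procedure for a single sentence, under the assumption that the root and every governor are drawn uniformly at random; the slides do not state the sampling distribution, and the function names are hypothetical. Words are numbered from 1, and a governor of 0 marks the root.

```python
import random


def random_governors_rl1(n_words, rng):
    """Return heads[0..n]; heads[w] is the governor of word w, 0 marks the root.

    Every non-root word gets a uniformly random governor among the other
    words of the sentence, so the result need not be a tree (cycles can occur).
    """
    root = rng.randint(1, n_words)
    heads = [0] * (n_words + 1)  # index 0 is unused padding
    for word in range(1, n_words + 1):
        if word == root:
            continue
        candidates = [g for g in range(1, n_words + 1) if g != word]
        heads[word] = rng.choice(candidates)
    return heads


def mdd(heads):
    """Mean absolute dependency distance over all non-root words."""
    distances = [abs(g - w) for w, g in enumerate(heads) if w > 0 and g > 0]
    return sum(distances) / len(distances)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
heads = random_governors_rl1(10, rng)
print(heads[1:], mdd(heads))
```

For two distinct positions drawn uniformly from a sentence of n words, the expected distance is (n + 1) / 3, so MDDs produced this way grow with sentence length.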

In the second random treebank (RL2), while governors are assigned randomly, we make sure that the resultant dependency tree (graph) is a projective and connected tree, i.e., no crossing arcs are allowed in the graph.
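One simple way to obtain a random tree that is connected and projective is to build it recursively over contiguous spans: pick a random head inside the span, cut the material on each side of the head into contiguous chunks, and attach the head of each chunk to it. The sketch below is an illustration of that idea only; the slides do not say how the authors enforced projectivity, this sampler is not uniform over all projective trees, and all names are hypothetical.

```python
import random


def random_projective_heads(n_words, rng):
    """Return heads[0..n] of a random projective tree; heads[root] == 0."""
    heads = [0] * (n_words + 1)  # index 0 is unused padding

    def random_chunks(lo, hi):
        """Cut the span lo..hi into random contiguous chunks."""
        chunks, start = [], lo
        for pos in range(lo, hi + 1):
            if pos == hi or rng.random() < 0.5:
                chunks.append((start, pos))
                start = pos + 1
        return chunks

    def build(lo, hi, parent):
        """Pick a head for lo..hi, attach it to parent, recurse on both sides."""
        head = rng.randint(lo, hi)
        heads[head] = parent
        for side_lo, side_hi in ((lo, head - 1), (head + 1, hi)):
            if side_lo <= side_hi:
                for a, b in random_chunks(side_lo, side_hi):
                    build(a, b, head)

    build(1, n_words, 0)
    return heads


def is_projective(heads):
    """True if no two dependency arcs cross."""
    arcs = [(min(w, g), max(w, g)) for w, g in enumerate(heads) if w > 0 and g > 0]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)


heads = random_projective_heads(10, random.Random(0))
print(heads[1:], is_projective(heads))
```

Because every subtree stays inside its own contiguous span, no arc can cross another, and every word is attached exactly once, so the result is a single connected tree.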

The Zipf-Alekseev Distribution
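The formula on this slide was not preserved. As a sketch, the right truncated modified Zipf-Alekseev distribution (the variant with parameters a, b, n and α named later in the results) is commonly written in the quantitative-linguistics literature as P_1 = α and P_x = (1 - α) x^-(a + b ln x) / T for x = 2, ..., n, with T the sum of j^-(a + b ln j) over j = 2, ..., n. The code below assumes that form; the parameter values in the example are purely illustrative and are not fitted values from the study.

```python
import math


def right_truncated_modified_zipf_alekseev(a, b, alpha, n):
    """Return [P_1, ..., P_n] under the assumed definition.

    P_1 = alpha
    P_x = (1 - alpha) * x**-(a + b*ln x) / T   for x = 2..n,
    where T = sum over j = 2..n of j**-(a + b*ln j).
    """
    weights = [x ** -(a + b * math.log(x)) for x in range(2, n + 1)]
    total = sum(weights)
    return [alpha] + [(1 - alpha) * w / total for w in weights]


# Illustrative parameters only (not taken from the slides).
probabilities = right_truncated_modified_zipf_alekseev(a=0.5, b=0.3, alpha=0.5, n=20)
for distance, p in enumerate(probabilities[:5], start=1):
    print(distance, round(p, 4))
```

Here x stands for the absolute dependency distance, α is the probability of distance 1 (adjacent words), and n is the largest observed distance (x-max).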

Results & Discussions

MDD at Different Grades

Junior high school: The MDD of Chinese EFL learners’ English writings increases significantly (p=0.000) from J1 (1.841) to J2 (2.061), but stays stable (p=0.936>0.05) from J2 (2.061) to J3 (2.064).

Senior high school: The MDD of Chinese EFL learners’ English writings first increases significantly (p=0.003) at S1 (2.188), then continues increasing insignificantly (p=0.445>0.05) at S2, but experiences a significant (p=0.022) decrease at S3 (2.125).

University: The MDD of their writings increases significantly (p=0.000) at first, but then keeps steady (p=0.782>0.05).

MDD of Chinese postgraduates of English major & contrastive sub-corpora

The results of independent t-tests show that there exist significant differences (t(12490)=-1.426, p=0.002<0.05; t(12471)=-1.089, p=0.017<0.05; t(12223)=-3.047, p=0.000<0.01; t(12302)=-1.628, p=0.000<0.01) between the dependency distances in English writings by Chinese postgraduates of English major (2.461±2.614) and those in four contrastive sub-corpora: WSJ1 (2.532±3.056), WSJ2 (2.516±2.952), WSJ3 (2.625±3.296) and WSJ4 (2.545±3.097).

The MDD of high-level Chinese EFL learners (postgraduates of English major) has not reached the level of English native speakers.

MDD of interlanguage and random languages

The two random languages have much greater MDDs than the natural language (NL) of Chinese learners. Of the two random languages, RL2 has a smaller MDD than RL1.

The distribution of dependency distances of NL (Ouyang & Jiang, 2017)

The probability distribution of dependency distance of second language learners’ interlanguage fits the Zipf-Alekseev distribution well, and the parameters a and b in the Zipf-Alekseev distribution reflect second language learners’ language proficiency at different learning stages.

The distribution of dependency distances of RL1

The thirteen distribution curves are all concave down. But the fitting results of RL1 show that the dependency distances of the thirteen groups of RL1 cannot all be fitted well by one and the same probability distribution.

The distribution of dependency distances of RL2

The thirteen distribution curves of RL2 are all concave down. The fitting results show that the dependency distances of the thirteen groups of RL2 can fit the following probability distributions: Right truncated modified Zipf-Alekseev (a, b; n=x-max, α fixed), Negative binomial (k, p), Right truncated negative binomial (k, p; R=x-max), Mixed negative binomial (k, p1, p2, α), Inverse Polya (a, k, p), Extended positive negative binomial (k, p; α fixed), Mixed geometric (q1, q2, α), and Mixed geometric-logarithmic (q, β, α).

Although the distribution of dependency distances of RL2 fits the Right truncated modified Zipf-Alekseev distribution well, the parameters have no correlation with the grades.

No correlation between the parameter a and the grades (R²=0.350, p>0.05), no correlation between the parameter b and the grades (R²=0.119, p>0.05), and no correlation between the parameter α and the grades (R²=0.074, p>0.05).

The variations of the parameters (a, b, α) of the Right truncated modified Zipf-Alekseev distribution fitting the dependency distances of RL2

Conclusions & implications

•  The MDD of Chinese EFL learners’ English writings increases significantly across nine grades.

•  The MDD of high-level Chinese EFL learners (postgraduates of English major) does not reach the level of English native speakers.

•  The MDD of Chinese EFL learners’ English writings remains stable at the university level (fossilization & limit of working memory load).

•  The MDDs of learners’ interlanguage at different learning phases are significantly lower than those of their corresponding random languages (RL1 and RL2). This indicates that Chinese EFL learners develop their English proficiency under the pressure of DDM.

•  RL2 has a lower MDD than RL1, and natural language has a lower MDD than RL2, which suggests that syntax also plays a key role in minimizing the MDD of second language learners’ interlanguage system.

•  The distribution of dependency distances of RL1 of second language learners’ writings cannot fit any exponential distribution or power law distribution models. However, the distributions of dependency distances of RL2 and of natural language fit the Zipf-Alekseev distribution well. Projectivity is the background mechanism that causes this phenomenon.

•  The parameters in the Zipf-Alekseev distribution of RL2 have no correlation with second language learners’ language proficiency. This can be explained by syntax: although RL2 is projective like natural languages, it does not obey syntactic rules.

•  The current study corroborates that DDM is a language universal present not only in the use of a first language but also in the use of a second language. This helps clarify the relationship between human cognition and second language. There is also a threshold that the MDDs of a second language do not exceed, and it lies within working memory capacity. Studies on dependency distances in relation to the cognitive demands on the human cognitive system will remain a research focus for linguists and for scholars in the fields of cognition and psycholinguistics.

Thank You!