explicit causal connections between the acquisition of ... · evidence from dynamical systems...

Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, pages 73–81,Berlin, Germany, August 11, 2016. c©2016 Association for Computational Linguistics

Explicit Causal Connections between the Acquisition of Linguistic Tiers:Evidence from Dynamical Systems Modeling

Daniel Spokoyny and Jeremy IrvinCollege of Creative Studies

University of California, Santa BarbaraSanta Barbara, CA 93106, USA

[email protected]@umail.ucsb.edu

Fermín Moscoso del Prado MartínDepartment of Linguistics

University of California, Santa BarbaraSanta Barbara, CA 93106, USA

[email protected]

Abstract

In this study, we model the causal links be-tween the complexities of different macro-scopic aspects of child language. We con-sider pairs of sequences of measurementsof the quantity and diversity of the lexi-cal and grammatical properties. Each pairof sequences is taken as the trajectory of ahigh-dimensional dynamical system, someof whose dimensions are unknown. Weuse Multispatial Convergent Cross Map-ping to ascertain the directions of causal-ity between the pairs of sequences. Ourresults provide support for the hypothesisthat children learn grammar through dis-tributional learning, where the generaliza-tion of smaller structures enables general-izations at higher levels, consistent withthe proposals of construction-based ap-proaches to language.

1 Introduction

A crucial question in language acquisition con-cerns how (or, according to some, whether) chil-dren learn the grammars of their native languages.Some researchers, mainly coming from the gener-ative tradition, argue that, although the grammati-cal rules are possibly ‘innate’ (e.g., Pinker, 1994),children still need to learn how to map the dif-ferent semantic/grammatical roles onto the differ-ent options offered by Universal Grammar (e.g.,‘parameter-setting’). The evidence, however, doesnot seem to support this hypothesis. For instance,Bowerman (1990) notes that the type of seman-tic aspects learned by the child do not match wellinto the prototypical roles that would be requiredto map into hard linguistic rules (e.g., learning anAGENT category to map onto the SUBJECT syntac-tic role). Other researchers (e.g., Goldberg, 2003;

Tomasello, 1992; Tomasello, 2005) propose thatthere is a gradual increase in the generality of thestructures learned by the child, which are slowlyacquired through distributional learning. Such apicture is strongly supported by the remarkably lit-tle creativity exhibited by children, most of whoseutterances are often literal repetitions of those thatthey have previously heard (Lieven et al., 1997;Pine and Lieven, 1993), with little or no general-ization in the early stages. It appears as thoughchildren progressively and conservatively increasethe level at which they generalize linguistic con-structions, building from the words upwards, inwhat some have termed ‘lexically-based positionalanalysis’ (Lieven et al., 1997).

The Theory of Dynamical Systems offers pow-erful tools for modeling human development (e.g,Smith and Thelen, 2003; van Geert, 1991). It pro-vides a mathematical framework for implementingthe principle that development involves the mutualand continuous interaction of multiple levels ofthe developing system, which simultaneously un-fold over many time-scales. Typically, a dynam-ical system is described by a system of coupleddifferential equations governing the temporal evo-lution of multiple parts of the system and their in-terrelations. One difficulty that arises when tryingto model a dynamical system as complex as thedevelopment of language is that many factors thatare important for the evolution of the system mightnot be available or might not be easily measurableor –even worse– there are additional variables rel-evant for the system of which the modeler is noteven aware. In this respect, a crucial developmentwas the discovery that, in a deterministic coupleddynamical system –even in the presence of noise–the dynamics of the whole system can be satisfac-torily recovered using measurements of a single ofthe system’s variables (Takens’ Embedding Theo-rem; Takens, 1981).

73

The finding above opens an interesting avenuefor understanding the processes involved in lan-guage acquisition. In the same way that systemsof differential equations can be used to model theevolution of ecosystems (e.g., predator-prey sys-tems), one could take measurements of the de-tailed properties of child language, and build adetailed system of equations capturing the macro-scopic dynamics of the process. However, in orderto achieve this, it is necessary to ascertain the waysin which different measured variables in the sys-tem affect each other. This problem goes beyondestimating correlations (as could be obtained, forinstance, using regression models), as one needs todetect asymmetrical causal relations between thevariables of interest, so that these causal influencescan be incorporated into the models.

In this study, we investigate the causal relationsdifferent macroscopic-level measures characteriz-ing the level of development of different tiers childlanguage (i.e., number of words produced, lexicaldiversity, inflectional diversity and mean length ofutterances), using the longitudinal data providedin the Manchester Corpus (Theakston et al., 2001).In order to detect causal relations between the dif-ferent measures, we make use of state space re-construction relying on Takens (1981)’s Embed-ding Theorem, and recently developed techniquesfor assessing the strength of causal relations in dy-namical systems (Multispatial Convergent CrossMapping; Clark et al., 2015). Here, we providea detailed picture of the causal connections be-tween the development of different aspects of achild’s language (while acquiring English). Ourresult provide support for theories that advocatedistributional learning of linguistic constructionsby gradual generalizations from the level of wordsto larger scale constructions.

2 Causality Detection in DynamicalSystems

Whenever two variables are correlated, there mustexist some causal link between them. Namely, ifvariables A and B are found to be correlated, thenone of four possibilities must be true: (a) A causesB, (b) B causes A, (c) A and B form a feedbackloop, each causing the other, or (d) there is a thirdvariable C causing both A and B. For studying theinteractions of species within ecosystems, Sugi-hara et al. (2012) introduced Convergent CrossMapping (CCM), a causality-detection technique

that is valid for non-separable systems, is capa-ble of identifying weakly coupled variables evenin the presence of noise, and –crucially– can dis-tinguish direct causal relations between variablesfrom effects of shared driving variables (i.e., inpossibility (d) above, CCM would not find causal-ity).

Figure 1: Reconstructed manifold for Lorenz’ssystem (M ; top), as well as the shadow manifoldsreconstructed considering only X (MX ; bottom-left) and Y (MY ; bottom-right) (reprinted withpermission from Sugihara et al., 2012).

For instance, consider E. Lorenz’s often stud-ied dynamical system, which includes three cou-pled variables X(t), Y (t), and Z(t) whose co-evolution is described by the system of differentialequations

dXdt

= σ(Y −X)

dYdt

= X(ρ− Z)− YdZdt

= XY − βZ

. (1)

The first equation in this system indicates thatthere is a relation by which Y causes X , as thechange in X (i.e., its future value) depends onthe value of Y (i.e., the future of X depends onthe past of Y even after the past of X itself hasbeen considered), a causal relation whose strengthis indexed by parameter σ. The manifold definedby these three variables (Lorenz’s famous strangeattractor), which we can denote by M , is plot-ted in the top of Fig. 1. In many circumstances,however, not all variables of the system are avail-able (some might be difficult to measure, or wemight not even be aware of their relevance). Itis at this point that Takens (1981)’s EmbeddingTheorem comes into play. Informally speaking,

74

the theorem states that the properties of a coupleddynamical system’s attractor can be recovered us-ing only measurements from a single one of itsvariables. This is achieved by considering mul-tiple versions of the same variable lagged in time,that is, instead of plotting (X[t], Y [t], Z[t]), whenonly measurements ofX are available, we can plot(X[t], X[t+ τ ], . . . , X[t+ (E − 1)τ ]). These re-constructed manifolds are termed “shadow” man-ifolds. MX denotes the shadow manifold of Mreconstructed on the basis of X alone. There arewell-studied techniques for finding the appropri-ate values for the parameters for the lag τ andthe number of dimensions E (c.f., Abarbanel etal., 1993) so that the properties of the originalmanifold M are recovered by the shadow mani-fold MX . Fig. 1 illustrates this point by plottingthe shadow manifolds MX (bottom-left) and MY

(bottom-right) for the Lorenz system. Notice howboth shadow manifolds recover much of the origi-nal’s structure, using only knowledge of one of itsthree variables.

Each point in the original manifold M mapsonto points in its shadow manifolds, as is illus-trated by the points labelled m(t), x(t), and y(t)in Fig. 1. The preservation of the topologicalproperties of the original manifold in its shadowmanifolds entails that points that are close-by inthe original manifold will also be close-by in itsshadow versions. This implies that, for causallylinked variables within the same dynamical sys-tem, the state of one variable can identify the statesof the others. Sugihara et al. (2012) noticed that,when one variable X stochastically drives anothervariable Y , information about the states of X canbe recovered from Y , but not vice-versa. This isthe basic insight of the CCM method. To test forcausality from X to Y , CCM looks for the signa-ture of X in Y ’s time series by seeing whether thetime indices of nearby points on MY can be usedto identify nearby points on MX . Crucially, in or-der to distinguish causation from mere correlation,CCM requires convergence, that is, that cross-mapped estimates improve in estimation accuracywith the sample size (i.e., “library size”) used forreconstructing the manifolds. As the library sizeincreases, the trajectories defining the manifoldsfill in, resulting in closer nearest neighbors anddeclining estimation error, which is reflected in ahigher correlation coefficient between the points inthe neighborhoods of the shadow manifolds. Con-

vergence then becomes the necessary conditionfor inferring causation. Using both artificial sys-tems and ecological time-series with known dy-namics, Sugihara and his colleagues demonstratedthat this technique successfully recovers true di-rectional causal relations when these are present,and –crucially– is able to discard spurious causa-tion in the case when both variables are causallydriven by a third, unknown, variable, but there isno true direct causation between them.

An inconvenience of CCM, and in general oftechniques that rely on manifold reconstruction,is that they generally require that relatively longtime-series of the behavior of the system are avail-able. Such long series are, however, very diffi-cult, if not impossible, to obtain in many fields,including of course language acquisition. One canhowever obtain multiple short time series from dif-ferent instances of a similar dynamical system.In ecology, for instance, one can obtain short se-quences of measurements of the population den-sities of a group of species measured at differ-ent places and times. In language acquisition, wemight have multiple, relatively short longitudinalsequences of measurements from different chil-dren. With this in mind Clark et al. (2015) devel-oped Multispatial CCM (mCCM), an extension ofCCM able to infer causal relations from multipleshort time-series measured at different sites, mak-ing use of dewdrop regression (Hsieh et al., 2008)to take the additional heterogeneity into account.

3 Materials and Methods

3.1 Materials

We obtained from the CHILDES database(MacWhinney, 2000) the transcriptions containedin the Manchester Corpus (Theakston et al., 2001).This corpus contains annotated transcripts of au-dio recordings from a longitudinal study of 12British English-speaking children (6 girls and 6boys) between the ages of approximately two andthree years. The children were recorded at theirhomes for an hour while they engaged in normalplay activities with their mothers. Each child wasrecorded on two separate occasions in every three-week period for one year. Each recording ses-sion is divided into two half-hour periods. Theannotations include the lemmatized form of thewords produced by the children (incomplete wordsand small word-internal errors were manually cor-rected in the lemmatization).

75

In order to increase the sample size in each pe-riod, we followed a sliding window technique of(Irvin et al., in press): We computed measures forthe samples contained in overlapping windows ofthree consecutive corpus files. In this way, at eachpoint we obtained samples originating from twofiles from the same recording session, and a filefrom either the previous or the next recording ses-sion.

3.2 Measures of Linguistic Development

As in previous studies (Irvin et al., in press;Moscoso del Prado Martín, in press), in order tomeasure the overall amount of speech produced byeach child, we counted the total number of wordtokens produced by each child in each temporalwindow. We refer to this measure as the child’sloquacity.

In order to measure the diversity of the wordsused by the children, we use the lexical diversitymeasure (Irvin et al., in press; Moscoso del PradoMartín, in press). This is just the information en-tropy (Shannon, 1948) of the probability distribu-tion of word lemmas found in the sample,

H[L] =∑`∈L

p(`) · log1

p(`), (2)

where L refers to the set of word lemmas foundin a sample, and p(`) is the probability with whichthe particular lemma ` is found in that sample. En-tropy estimates obtained using Eq. 2 using maxi-mum likelihood estimates of the probabilities areknown to be strongly biased (Miller, 1955), withthe bias magnitude correlating with the size ofthe sample used. Importantly, the sample size isnothing else than the loquacity measure describedabove. Therefore, using this plain maximum like-lihood method would result in spurious correla-tions. For this reason, Moscoso del Prado Martín(in press) recommends using the bias-adjusted en-tropy estimator (Chao et al., 2013, see AppendixA) instead of Eq. 2.

In order to measure the acquisition of inflec-tional morphological paradigms, we make useof the inflectional diversity measure (Moscosodel Prado Martín, in press). This is amacroscopic generalization of inflectional entropy(Moscoso del Prado Martín et al., 2004), a mea-sure that is known to index morphological in-fluences on adult lexical processing (Baayen andMoscoso del Prado Martín, 2005; Moscoso del

Prado Martín et al., 2004) as well as in child lan-guage acquisition (Stoll et al., 2012). The inflec-tional entropy of a lemma ` (H[W |`]) is the in-formation entropy of the inflected variants of thatlemma. Our inflectional diversity is just the av-erage value of inflectional entropy across all lem-mas,

H[W |L] = H[W,L]−H[L], (3)

where H[L] is the lexical diversity measure de-scribed above, and H[W,L] is the joint entropybetween the inflected word forms and their corre-sponding lemmas,

H[W,L] =∑`∈L

∑w∈W

p(w, `) · log1

p(w, `), (4)

where L denotes the set of all distinct lemmas en-countered in the sample,W is the set of all distinctinflected word forms encountered, and p(w, `) isthe joint probability with which lemma ` occursas the specific inflected form w. Inflectional di-versity takes non-negative values, measuring howlarge are the average inflectional paradigms usedin the language sample. Estimating H[W,L] us-ing Eq. 4 is subject to the same estimation biasesthat were described for lexical diversity. There-fore, we also follow Moscoso del Prado Martín (inpress) in using the bias-adjusted estimate (Chao etal., 2013, see Appendix A) for this magnitude, andthen combining it with the lexical diversity usingEq. 3 to obtain our inflectional diversity estimates.

Finally, in order to measure the degree of syn-tactic development of the children we used theirmean length of utterances (MLU). Instead of mea-suring MLU in morphemes (Brown, 1973), weused the simpler, but equally accurate measure innumber of words (c.f., Parker and Bronson, 2005).In these ages, MLUs are well known to providean accurate measure of the syntactic richness ofthe utterances produced (Brown, 1973), and infact correlate almost perfectly with explicit mea-surements of grammatical diversity (Moscoso delPrado Martín, in press).

3.3 Reconstruction of Shadow Manifolds

Using the windowing technique, for each child weobtained four time series, one corresponding toeach of the four measures described above: lo-quacity, lexical diversity, inflectional diversity, andMLU. These time series are plotted in Fig. 2

76

(a) (b)

(c) (d)

Figure 2: Evolution of the measures studied as a function of the children’s ages (in days). (a) Evolutionof the loquacity (measured in number of word tokens produced) for each of the twelve children. (b) Evo-lution of the lexical diversity (measured in nats per word) for each of the twelve children. (c) Evolutionof the inflectional diversity (measured in nats per word) for each of the twelve children. (d) Evolution ofthe MLU (in number of words per utterance) for each of the twelve children.

Parameter Loquacity Lexical Inflectional MLUDiversity Diversityτ 3 2 3 3E 3 3 2 4

Table 1: Parameter values used in the reconstruc-tion of the shadow attractors based on each of thefour measures.

In order to ensure that applying the non-lineardynamics techniques on these time series wassensible, the series were checked to ensure thatthey contained non-linear signal not dominated bynoise. This was achieved using a prediction test(Clark et al., 2015): We ensured that, for all fourvariables, the ability to predict future values sig-nificantly decreased as one increases the distancein the future at which the predictions are beingmade. This increasing unpredictability is the hall-mark of non-linear dynamical systems. Therefore,we could safely proceed to reconstruct the shadowattractors.

Following Clark et al. (2015), we reconstructed

the shadow attractors from each of these collec-tions of time series. The optimal time-lags (τ )for constructing the shadow manifolds were es-timated as the first local minimum of the laggedself-information in each of the time series (c.f.,Abarbanel et al., 1993). The optimal embeddingdimensionalities (E) were estimated by optimiz-ing next-step prediction accuracy. The estimateswere not found to differ significantly across chil-dren, and therefore for each measure, we used asingle estimate of (τ, E) for all children. The es-timated optimal parameter values used for the re-construction of each shadow attractor are given inTable 1.

3.4 Detection of Causal Relationships

The presence of directional causal relations wastested for each of the six possible pairs of vari-ables using mCCM. We performed 1,000 boot-strapping iterations for assessing the p-values of

77

the relations.1 Finally, to account for our lackof a priori predictions on the causal directions tobe tested, the p-values were adjusted for multiplecomparisons using the false discovery rate pro-cedure for correlated data (FDR; Benjamini andYekutieli, 2001).

4 Results

As plotted in Fig. 2, the four groups of time-seriesconsidered here exhibit different patterns of devel-opment. On the one hand, the loquacity, inflec-tional diversity, and MLU series show evidence ofa more or less linear increase along the child’s de-velopment, with their values towards the end of thestudied interval being close to what was found fortheir mothers in those same conversations. On theother hand, the lexical diversity measure exhibitsquite constant patterns across all children, withtheir values being pretty much indistinguishablefrom those observed for their mothers. This lat-ter pattern is slighly different in two chilren (Ruthand Nick), who seem to be experiencing their ‘vo-cabulary burst’ later than the rest of the childrendid. In fact, if one examines panel (c) in detail, onesees that the inflectional diversity curves for thesetwo children only begin their linear increases af-ter the children have experienced their vocabularybursts. A similar pattern can be seen in the MLUcurves (panel (d)) for these two particular chil-dren, with syntactic development apparently be-ing delayed by their late vocabulary bursts. Thesetwo patterns suggest that the development of bothgrammatical components of their language (inflec-tional morphology and syntax) depends on hav-ing attained a certain degree of vocabulary rich-ness. However, just examining these curves doesnot provide explicit evidence on whether these hy-pothesized causal relationships are actually reli-able ones or they are just statistical mirages. ThemCCM method addresses such question directly.

Fig. 3 plots the results of mCCM for each pair ofreconstructed shadow manifolds. The curves plothow the correlations between nearest neighborsacross shadow attractors evolve as one considersincreasingly larger library sizes. The p-values re-port whether these correlation values are signifi-cantly increasing (the p-values are obtained by aMonte Carlo method with 1,000 resamplings, andfurther corrected for the twelve comparisons using

1All computations were done using R packagemultispatialCCM (Clark et al., 2015).

the FDR procedure).Using the p-values in Fig. 3 enables the recon-

struction of the network of causal relations de-picted in Fig. 4. In this graph, the causal relationbetween the loquacity and the lexical diversity isconsidered weaker than the rest. The reason forthis is that the comparisons reported here are infact part of a larger study considering many morecomparisons (including many factors of the moth-ers as well), on which we did not have any cleara priori predictions on the relations that would befound. When applying the FDR method on thewhole set of 56 comparisons that we actually con-sidered, the relation plotted by the dashed arrowis in fact not significant. In short, one should nottrust the reliability of that particular relation.

Considering only the fully reliable relations,one finds that, as was suspected from the curves inFig. 2, there is an explicit causal relation betweenthe development of vocabulary richness (i.e., lex-ical diversity) and the acquisition of inflectionalparadigms (i.e., inflectional diversity). The in-crease in lexical diversity indeed causes the de-velopment of inflectional paradigms. In turn, thatthe inflectional paradigms begin to be in place en-ables the child to begin generalizing more syntac-tic relationships (as is reflected by the feedbackloop found between the inflectional diversity andMLU manifolds). Importantly, that the inflectionalparadigms are developed is also strongly coupled(i.e., forms a feedback loop) with the increase thechildren’s overall loquacity; once children begin toget a hold of grammar (inflection and syntax) theyare enabled to speak more, which in turn furtherstheir ability to generalize morphological relationsand –by association– syntactic relations.

5 Discussion

In this study, we have –for the first time– docu-mented the explicit causal relations between dif-ferent tiers of children’s linguistic development.At a macroscopic level, we find strictly causal re-lations between the acquisition of vocabulary, in-flectional paradigms, and syntactic relationships.As schematized in Fig. 4, the development of asufficiently large vocabulary is a crucial triggerfor the successful acquisition of the grammaticalaspects of language, which are in turn necessaryfor children to be able to speak more. These re-sults are consistent with theories advocating theimportance distributional learning for the acqui-

78

Figure 3: Results of mCCM between each pair of variables. The p-values are FDR-corrected consideringthe twelve comparisons reported here.

sition of constructions (Lieven et al., 1997; Pineand Lieven, 1993; Tomasello, 1992; Tomasello,2005). It shows how the level of generality ofthe constructions is progressively increased (Gold-berg, 2003) by the use of ‘lexically-based posi-tional analysis’ (Lieven et al., 1997) to achieveearly grammatical generalizations.

The picture of causal relations observed herecould be put into an informal narrative as follows:The acquisition of sufficient lexical forms enableschildren to generalize their relations into inflec-tional paradigms. When a sufficient command ofthe language’s inflectional morphology has arisen,children are able to begin generalizing syntacticrelations. The presence of these early syntacticdevelopments in turn serves to increase the child’sawareness of the functional roles served by differ-ent paradigm members. From this point, one ob-serves the strong bidirectional coupling betweenthe development of syntax and inflectional mor-phology. An increasing awareness of the func-tional roles of the individual forms within theseparadigms, and noticing the formal relations be-tween them, in turn trigger further generalizationsof the paradigms into inflectional classes (Milinet al., 2009), further increasing the productivity ofthe inflectional morphology system.

This study also stresses the importance ofmacroscopic level linguistic analyses. Whereasmuch research in language acquisition has focusedon the acquisition of specific individual construc-tions (microscopic level) or groups thereof (meso-scopic level), the investigation of the properties ofthe whole lexicon, inflectional and syntactic sys-tems uncovers relations which are difficult to pin-point at the other levels. This fits in well withthe multiscale investigation of language develop-ment proposed from the point of view of the The-ory of Dynamical Systems (van Geert, 1991). In-deed, one can see, at the mesoscopic level, that–also consistent with the distributional learninghypothesis– there is a causal chain by which thedevelopment of single word utterances triggersthe development of two-word utterances, whichin turn trigger three-word utterances, and so forth(Bassano and van Geert, 2007). The macro-scopic analyses provided here complement thatpicture by indicating how that evolution of utter-ance lengths is strongly coupled with the develop-ment (or ‘growth’ in van Geert’s terms) of gram-matical knowledge.

An innovative aspect of the methods we havedeveloped in this paper is that they provide an ex-plicit procedure for testing whether there are ex-

79

Lexical Diversity

p=.01

��Loquacity

p=.012

!!

p=.03133

Inflectional Diversity

p<.001

::

p<.001

aaSyntactic Diversity (MLU)

p=.012

zz

Figure 4: Reconstructed network of causal relations between the different measures of the children’slinguistic performances. The p-values indicated on the causal arrows are FDR-corrected. The dashed-line denotes a relationship that does not survive FDR correction considering a larger set of variables.

plicit causal relations between the developmentof different aspects of language. Here we haveused the methods at a macroscopic level, but itwould be equally possible to apply them to bothmicroscopic- or mesoscopic-level time series. Pre-vious research on dynamical systems on languageacquisition (e.g., Bassano and van Geert, 2007;Steenbeek and van Geert, 2007; van Geert, 1991)relies on proposing different candidate models interms of systems of differential equations, each in-cluding different sets of causal relations and cou-plings between time series. Our methods, us-ing techniques for explicitly testing causal rela-tions borrowed from ecology (a field whose studybears uncanny similarities with the study of hu-man development), complement the curve-fittingby explicitly testing which couplings and causali-ties should be included in the models, thus signif-icantly reducing the model space that needs to beexplored.

ReferencesHenry D. I. Abarbanel, Reggie Brown, John J.

Sidorowich, and Lev Sh. Tsimring. 1993. The anal-ysis of observed chaotic data in physical systems.Reviews of Modern Physics, 65:1331–1392.

R. Harald Baayen and Fermín Moscoso delPrado Martín. 2005. Semantic density andpast-tense formation in three Germanic languages.Language, 81:666–698.

Dominique Bassano and Paul L. C. van Geert. 2007.Modeling continuity and discontinuity in utter-ance length: a quantitative approach to changes,transitions and intraindividual variability in earlygrammatical development. Developmental Science,10:588–512.

Yoav Benjamini and Daniel Yekutieli. 2001. The con-trol of the false discovery rate in multiple testing un-der dependency. Annals of Statistics, 29:1165–1188.

Melissa Bowerman. 1990. Mapping thematic rolesonto syntactic functions: are children helped by in-nate linking rules? Linguistics, 28:1253–1289.

Roger Brown. 1973. A first language: the early stages.Harvard University Press, Cambridge, MA.

Anne Chao, Y. T. Wang, and Lou Jost. 2013. Entropyand the species accumulation curve: a novel entropyestimator via discovery rates of new species. Meth-ods in Ecology and Evolution, 4:1091–1100.

Adam Thomas Clark, Hao Ye, Forest Isbell, Ethan R.Deyle, Jane Cowles, G. David Tilman, and GeorgeSugihara. 2015. Spatial convergent cross mappingto detect causal relationships from short time series.Ecology, 96:1174–1181.

Adele E. Goldberg. 2003. Constructions: a new theo-retical approach to language. TRENDS in CognitiveSciences, 7:219–224.

Chih-hao Hsieh, Christian Anderson, and George Sug-ihara. 2008. Extending nonlinear analysis to shortecological time series. The American Naturalist,171:71–80.

Jeremy Irvin, Daniel Spokoyny, and Fermín Moscosodel Prado Martín. in press. Dynamical systemsmodeling of the child–mother dyad: Causality be-tween child-directed language complexity and lan-guage development. In Anna Papafragou, JohnTrueswell, Dan Grodner, and Dan Mirman, editors,Proceedings of the 38th Annual Conference of theCognitive Science Society. Cognitive Science Soci-ety, Austin, TX.

Elena V. M. Lieven, Julian M. Pine, and Gilliam Bald-win. 1997. Lexically-based learning and earlygrammatical development. Journal of Child Lan-guage, 24:187–219.

80

Brian MacWhinney. 2000. The CHILDES Project:Tools for analyzing talk, volume 2: The database.Lawrence Erlbaum Associates, Mahwah, NJ, 3rdedition.

Petar Milin, Dušica Filipovic Ðurdevic, and FermínMoscoso del Prado Martín. 2009. The simultane-ous effects of inflectional paradigms and classes onlexical recognition: Evidence from Serbian. Journalof Memory and Language, 60:50–64.

George Miller. 1955. Note on the bias of informa-tion estimates. In Henry Quastler, editor, Informa-tion Theory in Psychology: Problems and Methods,pages 95–100. Free Press, Glencoe, IL.

Fermín Moscoso del Prado Martín, Aleksandar Kostic,and R. Harald Baayen. 2004. Putting the bits to-gether: an information theoretical perspective onmorphological processing. Cognition, 94:1–18.

Fermín Moscoso del Prado Martín. in press. Vocabu-lary, grammar, sex, and aging. Cognitive Science.

Matthew D. Parker and Kent Brorson. 2005. A com-parative study between mean length of utterance inmorphemes (MLUm) and mean length of utterancein words (MLUw). First Language, 25:365–376.

Julian M. Pine and Elena V. M. Lieven. 1993. Re-analysing rote-learned phrases: individual differ-ences in the transition to multiword speech. Journalof Child Language, 20:43–60.

Steven Pinker. 1994. The language instinct: How themind creates language. Harper-Collins, New York,NY.

Claude E. Shannon. 1948. A mathematical theory ofcommunication. The Bell System Technical Journal,27:379–423, 623–656.

Linda Smith and Esther Thelen. 2003. Development asa dynamic system. TRENDS in Cognitive Sciences,7:343–348.

Hederien W. Steenbeek and Paul L. C. van Geert. 2007.A theory and dynamic model of dyadic interaction:Concerns, appraisals, and contagiousness in a devel-opmental context. Developmental Review, 27:1–40.

Sabine Stoll, Bathasar Bickel, Elena Lieven, Netra P.Paudyal, Goma Banjade, Toya N. Bhatta, Mar-tin Gaenszle, Judith Pettigrew, Ichchha Purna Rai,Manoj Rai, and Novel Kishore Rai. 2012. Nounsand verbs in Chintang: Children’s usage and sur-rounding adult speech. Journal of Child Language,39:284–321.

George Sugihara, Robert May, Hao Ye, Chih-haoHsieh, Ethan R. Deyle, Michael Fogarty, andStephan Munch. 2012. Detecting causality in com-plex ecosystems. Science, 338:496–500.

Floris Takens. 1981. Detecting strange attractors inturbulence. In D. A. Rand and L.-S. Young, editors,Dynamical Systems and Turbulence, pages 366–381.Springer Verlag, Berlin, Germany.

Anna L. Theakston, Elena V. M. Lieven, Julian M.Pine, and Caroline F. Rowland. 2001. The role ofperformance limitations in the acquisition of verb-argument structure: an alternative account. Journalof Child Language, 28:127–152.

Michael Tomasello. 1992. First Verbs: a Case Study ofEarly Grammatical Development. Cambridge Uni-versity Press, Cambridge, England.

Michael Tomasello. 2005. Beyond formalities: Thecase of language acquisition. The Linguistic Review,22:183–197.

Paul L. C. van Geert. 1991. A dynamic systems modelof cognitive and language growth. PsychologicalReview, 98:3–53.

A Bias-Adjusted Entropy Estimator

The bias-adjusted entropy estimator (Chao et al.,2013) relies on properties of the accumulationcurve of the number of distinct words observed(i.e., the species accumulation curve in the biolog-ical terms of the original paper). The estimator isgiven by

H =∑

1≤Fi≤n−1

[Fi

n

(n−1∑k=Fi

1k

)]

− f1

n(1−A)1−n

[log(A) +

n−1∑r=1

1r(1−A)r

],

where Fi are the word frequencies observed in thesample, n is the number of tokens in the corpus,and

A =

2f2

(n− 1)f1 + 2f2if f2 > 0,

2(n− 1)(f1 − 1) + 2

if f2 = 0, f1 > 0,

1 if f1 = f2 = 0,

.

with f1 and f2 being the number of word types thatwere encountered exactly once or twice respec-tively (i.e., the numbers of hapax legomena anddis legomena). This estimator is demonstrated tobe accurate and unbiased for word frequency dis-tributions (Moscoso del Prado Martín, in press).

81

explicit causal connections between the acquisition of ... · evidence from dynamical systems...

Documents