Schema Theorem in Language Acquisition

A Rags to Riches Story

BOOT-LA, Indiana University, April 23, 2003


Poverty of the Stimulus

“The poverty-of-the-stimulus argument, otherwise known as Plato’s Problem, claims that the nature of language knowledge is such that it could not have been acquired from the actual samples of language available to the human child.” Cook & Newson (1996:86)



What counts as evidence?

• positive evidence requirement: no correction, explanation etc.

• occurrence requirement: must occur in normal language situations

• uniformity requirement: must be available to all children regardless of culture, class, language

• take-up requirement: must be used by children



Rational Steps for Inclusion in UG/LAD

A. A native speaker of a particular language knows a particular aspect of syntax. Ex. structure-dependency, Binding Principles, etc.

B. This aspect of syntax could not have been acquired from the language input available to children.

C. This aspect of syntax is not learnt from outside.

D. This aspect of syntax is built-in to the mind.

Cook & Newson (1996:86)



A Problem:

A. A native speaker of a particular language knows a particular aspect of syntax. Ex. structure-dependency, Binding Principles, etc.

B. This aspect of syntax could not have been acquired from the language input available to children.

C. This aspect of syntax is not learnt from outside.

D. This aspect of syntax is built-in to the mind.



• “Step B” is in practice assumed, and rarely rigorously demonstrated

• increasingly we find existence proofs, obtained via statistical, data-driven methods, for acquisition tasks previously believed impossible (e.g. Chalmers, 1990; Elman, 1995)



Faulty “Step B” Reasoning:

a) Helen said that Jane_i voted for herself_i.

b) *Helen_i said that Jane voted for herself_i.

Cook & Newson (1996:84)

• “no context could let them unerringly distinguish the binding of anaphors and of pronominals.”

• implicitly assumes that at this point, the only utterances / experience the child has access to are these two possible interpretations

• in fact, by the time children produce / understand sentences of this level of complexity, they’ve had extensive experience producing and interpreting anaphors and pronominals (O’Grady, 1997)

• moreover, from the outset children show a bias towards binding to the nearest antecedent – they have the most trouble with sentences like:

*Helen said that Jane_i voted for her_i.



Faulty “Step B” Reasoning:

a) It is likely that John will be delayed.

b) It is probable that John will be delayed.

c) John is likely to be delayed.

d)*John is probable to be delayed.

O’Grady (1997:246)

• common argument against analogy as a learning method

• denies analogy based on anything but these specific cases – by the time a child produces / understands sentences such as these, they already have extensive linguistic knowledge that would preclude such naive analogies

• Other studies have shown analogy can be a useful technique for the acquisition of categories and grammatical structure (McLennan, ms.; Tomasello, 2000 for example)


What to do?

• Simply denying UG doesn’t solve our problem since traditional linguists’ intuitions about the input remain unchanged and lead us back to the same conclusions

• Genetic Algorithms seem to have a similar problem – they look more efficient than they possibly could be – similar sense of “getting something for nothing”


Genetic Algorithms

• problem solving technique which is capable of assessing an extremely large and complicated problem space on the basis of a restricted “impoverished” input set

• Three primary elements:

1. a population of “chromosomes” (bit strings)

2. a fitness function (judges “goodness”)

3. mating and procreation

(Holland, 1975; Mitchell, 1996)
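The three elements can be sketched in a few lines of Python. This is only an illustrative toy, not the setup used in any of the cited work: the fitness function (fraction of 1-bits), the population size, the mutation rate, and every function name here are assumptions chosen for brevity.

```python
import random

def fitness(chromosome):
    # Assumed toy fitness: the fraction of bits set to 1.
    return sum(chromosome) / len(chromosome)

def select(population, rng):
    # Selection biased by fitness; the epsilon avoids all-zero weights.
    weights = [fitness(c) + 1e-9 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]

def mate(mother, father, rng, mutation_rate=0.01):
    # Single-point crossover followed by per-bit mutation.
    point = rng.randrange(1, len(mother))
    child = mother[:point] + father[point:]
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]

def evolve(pop_size=30, length=20, generations=40, seed=0):
    rng = random.Random(seed)
    # 1. a population of random bit-string "chromosomes"
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    best = initial_best
    for _ in range(generations):
        # 2. fitness drives selection; 3. selected parents mate and procreate
        population = [mate(select(population, rng), select(population, rng), rng)
                      for _ in range(pop_size)]
        # Remember the fittest chromosome seen so far.
        best = max(population + [best], key=fitness)
    return initial_best, best

initial_best, best = evolve()
print(fitness(initial_best), fitness(best))
```

Nothing in the loop refers to categories of bit strings; it only ever scores and recombines individual chromosomes.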


Genetic Algorithms

• from purely random beginnings a solution emerges very quickly – even for optimizations that can’t be performed by traditional serial computational methods


Genetic Algorithms

• Schema Theorem: explanation of how GAs work

101

is an instantiation of the categories (schemata):

{***, 1**, *0*, **1, 10*, 1*1, *01, 101} (of a possible 27)

1**

is a category representation of

{100, 101, 110, 111, (1*1, 1*0, 11*, 10*)}
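The counts above can be checked mechanically. The sketch below (function names are illustrative, not from the source) enumerates the schemata that a string instantiates, and the strings that a schema covers, using '*' as the wildcard.

```python
from itertools import product

def schemata_of(string):
    # Each position is either kept as-is or replaced by the wildcard '*',
    # so a string of length n instantiates 2**n schemata.
    options = [(ch, '*') for ch in string]
    return {''.join(choice) for choice in product(*options)}

def instances_of(schema):
    # Expand every wildcard to both bit values.
    options = [('0', '1') if s == '*' else (s,) for s in schema]
    return sorted(''.join(choice) for choice in product(*options))

# All schemata over three positions: each position is 0, 1, or '*'.
all_schemata = {''.join(s) for s in product('01*', repeat=3)}

print(sorted(schemata_of("101")))  # the 8 schemata that "101" instantiates
print(len(all_schemata))           # 27
print(instances_of("1**"))         # ['100', '101', '110', '111']
```

A single token of length n thus bears on 2^n of the 3^n possible schemata, which is the source of the "something for nothing" impression.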


Genetic Algorithms

• If “101” is judged as being 75% fit, it simultaneously guesstimates {***, 1**, *0*, **1, 10*, 1*1, *01, 101} as being 75% fit

• Given a population with multiple instantiations, implicit calculation of category fitness becomes more accurate

• Fuzzy judgments are still useful

• Selection, biased by fitness, selects not for highly fit individuals but (implicitly) highly fit categories by targeting highly fit individuals
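A small sketch may make the implicit bookkeeping concrete. A GA never computes the table below; the code only spells out what a scored population implies about its schemata. The fitness scores and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def schemata_of(string):
    # Every schema the string instantiates ('*' is the wildcard).
    return {''.join(c) for c in product(*[(ch, '*') for ch in string])}

def schema_estimates(scored_population):
    # Average the fitness of all observed instances of each schema.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for string, score in scored_population:
        for schema in schemata_of(string):
            totals[schema] += score
            counts[schema] += 1
    return {schema: totals[schema] / counts[schema] for schema in totals}

# Hypothetical fitness scores for three individuals.
population = [("101", 0.75), ("100", 0.50), ("001", 0.25)]
estimates = schema_estimates(population)

print(estimates["101"])  # 0.75, from a single instance
print(estimates["1**"])  # 0.625, averaged over "101" and "100"
print(estimates["***"])  # 0.5, averaged over all three
```

With one individual, every schema it instantiates inherits that individual's score; with more individuals, the estimates for shared schemata are averaged and so become more informative.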


Genetic Algorithms

the profound insight:

GAs make use of category information without explicit category definitions, explicit biases, or explicit reference to category information. They implicitly act on categories through category instantiations.
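For reference, the Schema Theorem is usually stated along the following lines (Holland, 1975, in roughly the notation of Mitchell, 1996). The symbols are not used elsewhere in these slides and are introduced here only for illustration: m(H,t) is the number of instances of schema H in the population at generation t, \hat{u}(H,t) the observed average fitness of those instances, \bar{f}(t) the average fitness of the population, l the string length, d(H) the defining length of H, o(H) its order (number of fixed positions), and p_c, p_m the crossover and mutation probabilities.

```latex
E\big[m(H, t+1)\big] \;\ge\;
  \frac{\hat{u}(H, t)}{\bar{f}(t)}\, m(H, t)
  \left(1 - p_c \frac{d(H)}{l - 1}\right)
  (1 - p_m)^{o(H)}
```

Read informally: short, low-order schemata whose instances score above the population average are expected to gain instances from one generation to the next, even though no schema is ever evaluated directly.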


Genetic Algorithms

• taken in this light it is easier to see how GAs skip a great deal of the computational load through implicit parallelism

• Critical characteristics:

  • use a population of tokens (parallelism)

  • a selection process that targets / discovers salient / relevant dimensions of substructure within those tokens


Wealth of the Stimulus


Schema Theorem

               tokens         evaluation          outcome

GAs            chromosomes    fitness function    optimal solution

Acquisition    experience     learning            grammar



Experiences

• entire sensory experiences that include linguistic stimuli

• importantly, all sensory information impacts memory and is available to be correlated

• infants are exquisitely sensitive to detailed and correlated sensory information – at least until they learn what to ignore (Rovee-Collier, 1991)

• “population” because stored distributed within the same neural structures – continuous, not digital



Learning

• in most basic neural sense – continuous, correlative, passive

• reduces “sensory noise” – reinforces correlated multimodal sensory experience

• a type of “selection” process because salient dimensions emerge through the process



Grammar

• Schematic / analogical (following Tomasello, 2000; Hofstadter; and usage based models)

• More subtle correlations, or higher level correlations will take more time to be distinguished from “noise” – results in a course of development

• Acquisitional prerequisites may exist, but it’s a mistake to believe that relevant information isn’t being collected long before certain phenomena appear – all input has a physiological impact



Traditional Progression

1. infants attend to phonetic features

2. phonetic features allow access to phonological system

3. access to phonology allows access to words and short phrases

4. access to words gives access to syntax

• matches the observed developmental increase in grammatical complexity

• input is only informative to the linguistic module acquired at each stage

• linguistic evidence sets innate parameters

• serial, computationally expensive (thus UG)



Schema Theorem Based Progression

1. Every utterance an infant hears provides a tiny bit of information about the phonetics, phonotactics, phonology, morphology, word categories, syntax, tense and aspect system, pragmatics, semantic categories, deixis, reference – every aspect of their language

• will also match the observed developmental increase in grammatical complexity

• input is informative to every aspect of language even though its contribution may not clearly surface or be attended to immediately

• parallel, computationally efficient, flexible, adaptable

• in line with what’s going on in other fields


Conclusion

A population of tokens implicitly carries exponentially more information about the set than the tokens themselves represent. Parallel systems (of which GAs and the brain are examples) that act on that population can make use of category information that is not explicitly stated. Formal systems cannot.

Without changing our observations of the input, development, or the outcome, by taking a more biologically plausible perspective on the information processing going on, we can see that the linguistic environment is rich rather than impoverished.
