Schema Theorem in Language Acquisition
A Rags to Riches Story
BOOT-LA, Indiana University, April 23, 2003
Poverty of the Stimulus
“The poverty-of-the-stimulus argument, otherwise known as Plato’s Problem, claims that the nature of language knowledge is such that it could not have been acquired from the actual samples of language available to the human child.” Cook & Newson (1996:86)
Poverty of the Stimulus
What counts as evidence?
• positive evidence requirement: no correction, explanation etc.
• occurrence requirement: must occur in normal language situations
• uniformity requirement: must be available to all children regardless of culture, class, language
• take-up requirement: must be used by children
Poverty of the Stimulus
Rational Steps for Inclusion in UG/LAD
A. A native speaker of a particular language knows a particular aspect of syntax. Ex. structure-dependency, Binding Principles, etc.
B. This aspect of syntax could not have been acquired from the language input available to children.
C. This aspect of syntax is not learnt from outside.
D. This aspect of syntax is built-in to the mind.
Cook & Newson (1996:86)
Poverty of the Stimulus
A Problem:
A. A native speaker of a particular language knows a particular aspect of syntax. Ex. structure-dependency, Binding Principles, etc.
B. This aspect of syntax could not have been acquired from the language input available to children.
C. This aspect of syntax is not learnt from outside.
D. This aspect of syntax is built-in to the mind.
Poverty of the Stimulus
• “Step B” is in practice assumed, and rarely rigorously demonstrated
• increasingly we find existence proofs, via statistical, data-driven methods, of acquisition tasks previously believed impossible (e.g., Chalmers, 1990; Elman, 1995)
Poverty of the Stimulus
Faulty “Step B” Reasoning:
a) Helen said that Jane_i voted for herself_i.
b) *Helen_i said that Jane voted for herself_i.
Cook & Newson (1996:84)
• “no context could let them unerringly distinguish the binding of anaphors and of pronominals.”
• implicitly assumes that at this point, the only utterances / experience the child has access to are these two possible interpretations
• in fact, by the time children produce / understand sentences of this level of complexity, they’ve had extensive experience producing and interpreting anaphors and pronominals (O’Grady, 1997)
• moreover, from the outset children show a bias towards binding to the nearest antecedent – they have the most trouble with sentences like:
*Helen said that Jane_i voted for her_i.
Poverty of the Stimulus
Faulty “Step B” Reasoning:
a) It is likely that John will be delayed.
b) It is probable that John will be delayed.
c) John is likely to be delayed.
d) *John is probable to be delayed.
O’Grady (1997:246)
• common argument against analogy as a learning method
• denies analogy based on anything but these specific cases – by the time a child produces / understands sentences such as these, they already have extensive linguistic knowledge that would preclude such naive analogies
• Other studies have shown analogy can be a useful technique for the acquisition of categories and grammatical structure (McLennan, ms.; Tomasello, 2000 for example)
What to do?
• Simply denying UG doesn’t solve our problem since traditional linguists’ intuitions about the input remain unchanged and lead us back to the same conclusions
• Genetic Algorithms seem to have a similar problem – they look more efficient than they possibly could be – similar sense of “getting something for nothing”
Genetic Algorithms
• problem solving technique which is capable of assessing an extremely large and complicated problem space on the basis of a restricted “impoverished” input set
• Three primary elements:
1. a population of “chromosomes” (bit strings)
2. a fitness function (judges “goodness”)
3. mating and procreation
(Holland, 1975; Mitchell, 1996)
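The three elements above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the setup used by Holland (1975) or Mitchell (1996): the function name `run_ga`, all parameter values, and the toy "count the ones" fitness function are hypothetical choices made for the example.

```python
import random

def run_ga(fitness, length=20, pop_size=50, generations=60,
           crossover_rate=0.7, mutation_rate=0.01, seed=0):
    """Toy genetic algorithm over bit strings (all parameters are illustrative)."""
    rng = random.Random(seed)
    # 1. A population of "chromosomes": random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2. The fitness function judges the "goodness" of each chromosome.
        # A tiny epsilon keeps the selection weights from all being zero.
        weights = [fitness(c) + 1e-9 for c in pop]
        next_pop = []
        while len(next_pop) < pop_size:
            # 3. Mating and procreation: fitness-biased choice of two parents,
            # single-point crossover, then occasional bit-flip mutation.
            mom, dad = rng.choices(pop, weights=weights, k=2)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)
                child = mom[:cut] + dad[cut:]
            else:
                child = mom[:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

def count_ones(chromosome):
    """Hypothetical fitness function: the number of 1 bits."""
    return sum(chromosome)

best = run_ga(count_ones)
print("".join(map(str, best)), count_ones(best))
```

Nothing in the loop mentions categories or substructure; selection only ever sees whole chromosomes and their fitness scores.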
Genetic Algorithms
• from purely random beginnings a solution emerges very quickly – even for optimizations that can’t be performed by traditional serial computational methods
Genetic Algorithms
• Schema Theorem: explanation of how GAs work
101 is an instantiation of the categories (schemata):
{***, 1**, *0*, **1, 10*, 1*1, *01, 101} (of a possible 27)
1** is a category representation of
{100, 101, 110, 111, (1*1, 1*0, 11*, 10*)}
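The counts on this slide can be checked mechanically. The sketch below uses hypothetical helper names (`schemata_of`, `instances_of`) and enumerates the schemata that a bit string instantiates, along with the bit strings that a schema covers.

```python
from itertools import product

def schemata_of(bits):
    """All schemata over {0, 1, *} that the given bit string instantiates."""
    # At each position a schema either keeps the bit or generalises it to '*'.
    options = [(b, "*") for b in bits]
    return {"".join(choice) for choice in product(*options)}

def instances_of(schema):
    """All fully specified bit strings covered by a schema."""
    options = [("0", "1") if s == "*" else (s,) for s in schema]
    return ["".join(choice) for choice in product(*options)]

s101 = schemata_of("101")
print(sorted(s101))          # the 2**3 = 8 schemata that 101 instantiates
print(3 ** 3)                # 27 possible schemata of length 3
print(instances_of("1**"))   # the four bit strings covered by 1**
```

A string of length l instantiates 2^l of the 3^l possible schemata, so the number of categories touched by a single token grows exponentially with its length.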
Genetic Algorithms
• If “101” is judged as being 75% fit, that judgment simultaneously serves as a guesstimate that each of {***, 1**, *0*, **1, 10*, 1*1, *01, 101} is 75% fit
• Given a population with multiple instantiations, implicit calculation of category fitness becomes more accurate
• Fuzzy judgments are still useful
• Selection, biased by fitness, selects not for highly fit individuals but (implicitly) highly fit categories by targeting highly fit individuals
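A small sketch of how individual fitness judgments turn into category estimates: each evaluated string contributes its score to every schema it instantiates, and a schema's estimate is the average over its observed instances. The function name and the fitness values are hypothetical and chosen only for illustration. Note that a GA never builds this table explicitly; the point of the slide is that fitness-biased selection behaves as if it had.

```python
from collections import defaultdict
from itertools import product

def schema_fitness_estimates(population, fitness):
    """Average observed fitness of every schema instantiated in the population."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bits in population:
        score = fitness(bits)
        # Credit the score to each schema this string is an instance of.
        for choice in product(*[(b, "*") for b in bits]):
            schema = "".join(choice)
            totals[schema] += score
            counts[schema] += 1
    return {schema: totals[schema] / counts[schema] for schema in totals}

# Hypothetical fitness judgments for four tokens.
toy_fitness = {"101": 0.75, "111": 0.25, "001": 0.50, "100": 1.00}

# One token alone: all eight of its schemata inherit the same guesstimate.
single = schema_fitness_estimates(["101"], toy_fitness.get)

# Several tokens: estimates for shared schemata are refined by averaging.
estimates = schema_fitness_estimates(list(toy_fitness), toy_fitness.get)
for schema in ("***", "1**", "10*", "**1"):
    print(schema, round(estimates[schema], 3))
```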
Genetic Algorithms
the profound insight:
GAs make use of category information without explicit category definitions, explicit biases, or explicit reference to category information. They implicitly act on categories through category instantiations
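For reference, the schema theorem behind this insight is commonly written in textbook treatments as the bound below. The notation is not taken from the slides and is assumed here: m(H, t) is the number of instances of schema H in generation t, f(H, t) is their average fitness, f̄(t) is the population's average fitness, δ(H) is the defining length of H, o(H) is its number of fixed positions, l is the string length, and p_c and p_m are the crossover and mutation probabilities.

```latex
E\big[m(H, t+1)\big] \;\ge\; m(H, t)\,\frac{f(H, t)}{\bar{f}(t)}
\left[\,1 - p_c\,\frac{\delta(H)}{l - 1} - o(H)\,p_m\right]
```

Read informally, schemata whose instances are fitter than average, and which are short enough to survive crossover and mutation, are expected to gain instances in the next generation, even though the algorithm never represents any schema explicitly.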
Genetic Algorithms
• taken in this light it is easier to see how GAs skip a great deal of the computational load through implicit parallelism
• Critical characteristics
• use a population of tokens (parallelism)
• a selection process that targets / discovers salient / relevant dimensions of substructure within those tokens
Wealth of the Stimulus
Schema Theorem   tokens        evaluation         outcome
GAs              chromosomes   fitness function   optimal solution
Acquisition      experience    learning           grammar
Wealth of the Stimulus
Experiences
• entire sensory experiences that include linguistic stimuli
• importantly, all sensory information impacts memory and is available to be correlated
• infants are exquisitely sensitive to detailed and correlated sensory information – at least until they learn what to ignore (Rovee-Collier, 1991)
• “population” because stored distributed within the same neural structures – continuous, not digital
Wealth of the Stimulus
Learning
• in most basic neural sense – continuous, correlative, passive
• reduces “sensory noise” – reinforces correlated multimodal sensory experience
• a type of “selection” process because salient dimensions emerge through the process
Wealth of the Stimulus
Grammar
• Schematic / analogical (following Tomasello, 2000; Hofstadter; and usage based models)
• More subtle correlations, or higher level correlations will take more time to be distinguished from “noise” – results in a course of development
• Acquisitional prerequisites may exist, but it’s a mistake to believe that relevant information isn’t being collected long before certain phenomena appear – all input has a physiological impact
Wealth of the Stimulus
Traditional Progression
1. infants attend to phonetic features
2. phonetic features allow access to phonological system
3. access to phonology allows access to words and short phrases
4. access to words gives access to syntax
• matches the observed developmental increase in grammatical complexity
• input is only informative to the linguistic module acquired at each stage
• linguistic evidence sets innate parameters
• serial, computationally expensive (thus UG)
Wealth of the Stimulus
Schema Theorem Based Progression
1. Every utterance an infant hears provides a tiny bit of information about the phonetics, phonotactics, phonology, morphology, word categories, syntax, tense and aspect system, pragmatics, semantic categories, deixis, reference – every aspect of their language
• will also match the observed developmental increase in grammatical complexity
• input is informative to every aspect of language even though its contribution may not clearly surface or be attended to immediately
• parallel, computationally efficient, flexible, adaptable
• in line with what’s going on in other fields
Conclusion
A population of tokens implicitly carries exponentially more information about the set than the tokens themselves represent. Parallel systems (of which GAs and the brain are examples) that act on that population can make use of category information that is not explicitly stated. Formal systems cannot.
Without changing our observations of the input, development, or the outcome, but by taking a more biologically plausible perspective on the information processing going on, we can see that the linguistic environment is rich rather than impoverished.