Sequences, patterns and rules: Brain mechanisms of syntax in language and math
Stanislas Dehaene and Marie Amalric
Five mechanisms for sequence representation
Ordinal knowledge
[Figure: items labeled 1st, 2nd and 3rd, shown in different orders]
Chunking tokibugikobagopilagikobatokibugopila …
Algebraic patterns totobu … mimitu … gagari … pesipe … A A B A A B A A B A B A (violation)
Transitions and timing
[Figure: predicted vs. observed event times along a time axis, with regular intervals Δt]
Dehaene, S., Meyniel, F., Wacongne, C., Wang, L., & Pallier, C. (2015). The Neural Representation of Sequences: From Transition Probabilities to Algebraic Patterns and Linguistic Trees. Neuron, 88(1), 2–19.
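Nested symbolic structures
[Figure: syntactic tree over the words "those", "gifted", "car", "factory", "workers", with node labels D, A, N, NP and DP; and the sequence "a a b b b b a a" built from nested repeat, reverse and concat operations]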
Many arguments support the existence of nested tree structures in the language domain
• Cases of syntactic ambiguity:
Black taxi driver = Black (taxi driver) or (Black taxi) driver
unlockable = un-(lock-able) or (un-lock)-able
I shot an elephant in my pajamas… How it got in my pajamas, I don’t know (Groucho Marx)
Can be explained by recursive substitution rules: NP → A NP and NP → NP N, which generate distinct structures: [A [N N]] or [[A N] N].
• Ellipsis or substitution of any phrase:
“he [drove [to [this [big house]]]]” = “he drove to this one,” “he drove to it,” “he drove there,” “he did.”
• « Syntactic movement » of phrases (for question formation, topicalization, etc.):
« John wants to buy this toy » → « It’s this toy that John wants to buy » or « to buy this toy, that’s what John wants » or « what toy does John want to buy », etc.
• Long-distance dependencies (agreement and binding):
« The cars that pass this truck are red »
Haegeman, L. (2005). Thinking Syntactically: A Guide to Argumentation and Analysis. Wiley.
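The ambiguity of "black taxi driver" under the two recursive rules can be sketched in a few lines of Python. This is only an illustration: the lexicon, the function name `parses`, and the extra base rule NP → N (needed to terminate the recursion) are assumptions, not part of the slides.

```python
from functools import lru_cache

# Hypothetical toy lexicon: A = adjective, N = noun.
LEXICON = {"black": "A", "taxi": "N", "driver": "N"}

def parses(words):
    """Enumerate all bracketings licensed by NP -> A NP, NP -> NP N, NP -> N."""
    tags = tuple(LEXICON[w] for w in words)

    @lru_cache(maxsize=None)
    def np(i, j):
        """All ways of analysing words[i:j] as an NP."""
        out = []
        if j - i == 1 and tags[i] == "N":      # assumed base rule: NP -> N
            out.append(words[i])
        if j - i >= 2:
            if tags[i] == "A":                 # NP -> A NP
                out += [f"[{words[i]} {t}]" for t in np(i + 1, j)]
            if tags[j - 1] == "N":             # NP -> NP N
                out += [f"[{t} {words[j - 1]}]" for t in np(i, j - 1)]
        return tuple(out)

    return list(np(0, len(words)))

structures = parses(("black", "taxi", "driver"))
for s in structures:
    print(s)
```

The two bracketings it prints correspond to the [A [N N]] and [[A N] N] structures: the same word string receives two distinct trees.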
A capacity to combine basic symbols into nested structures is characteristic of the human language faculty.
Transition probabilities and ordinal number are insufficient – nested representations are needed.
Nested structures also underlie many other domains of human competence:
Language, mathematics, music, and perhaps also social cognition, tool use…
What is special about the human brain? Perhaps a quick grasp of nested symbolic structures
Did evolution endow the human brain with a specific ability to represent nested symbolic structures?
Brain mechanisms for the constituent structure of language
Pallier, C., Devauchelle, A. D., & Dehaene, S. (2011). Cortical representation of the constituent structure of sentences. PNAS, 108(6), 2522-2527.
Hypothesis: in a region that constructs trees, the activation could increase with the size of the tree, each time a “MERGE” operation is needed to bind two constituents.
(The girl) < (The (nice girl)) < (The (girl (who talks)))
In this example, constituent size is confounded with number of words.
We created stimuli with a fixed number of words or pseudo-words (always 12), but where constituent size was systematically manipulated.
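As a rough illustration of the MERGE-counting hypothesis, the three bracketed phrases above can be written as nested tuples and the number of binary merges counted recursively. The tuple encoding and the name `count_merges` are illustrative assumptions.

```python
def count_merges(tree):
    """Number of binary MERGE operations needed to build a nested tuple."""
    if isinstance(tree, str):          # a single word: nothing to merge
        return 0
    left, right = tree                 # every constituent binds two parts
    return 1 + count_merges(left) + count_merges(right)

phrases = [
    ("The", "girl"),
    ("The", ("nice", "girl")),
    ("The", ("girl", ("who", "talks"))),
]
merge_counts = [count_merges(p) for p in phrases]
print(merge_counts)
```

In these phrases the merge count grows together with the word count, which is exactly the confound that the fixed-length (12-word) stimuli are designed to remove.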
Parametric manipulation of constituent size
Constituent size: Examples (normal prose)
12 words (c12): I believe that you should accept the proposal of your new associate
6 words (c06): the mouse that eats our cheese / two clients examine this nice couch
4 words (c04): mayor of the city / he hates this color / they read their names
3 words (c03): solving a problem / repair the ceiling / he keeps reading / will buy some
2 words (c02): looking ahead / who dies / important task / his dog / few holes / they write
1 word (c01): thing / very / tree / where / of / watching / copy / tensed / they / states / heart / plus
Syntax without lexical semantics or transition probabilities
Constituent size: Examples (Jabberwocky)
12 words (c12): I tosieve that you should begept the tropufal of your tew viroflate
6 words (c06): the couse that rits our treeve / fow plients afomine this kice bloch
4 words (c04): tuyor of the roty / he futes this dator / they gead their wames
3 words (c03): relging a grathem / regair the fraping / he meeps bouding / will doy some
2 words (c02): troking ahead / who mies / omirpant fran / his gog / few biles / they grite
1 word (c01): thang / very / gree / where / of / wurthing / napy / gunsed / they / flotes / blart / trus
To control for transition probability and semantic content, we created Jabberwocky stimuli in which content words are replaced by pseudowords.
Predictions of a simple accumulator model
[Figure: predicted level of activity and haemodynamic response over time for stimuli made of 12-, 6-, 4-, 3-, 2- and 1-word constituents]
Hypotheses:
- Total neural activity increases each time a word is incorporated into the current constituent.
- Activity collapses when merging is no longer possible.
Prediction: linearly increasing amplitude and phase of the fMRI response.
Pallier, Devauchelle & Dehaene, PNAS 2011.
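The accumulator hypotheses can be made concrete with a toy simulation. This is a minimal sketch, assuming one unit of activity per incorporated word and an immediate reset at each constituent boundary; the haemodynamic response is not modelled, and the function name `accumulator_trace` is hypothetical.

```python
def accumulator_trace(constituent_size, total_words=12):
    """Activity level after each word of a 12-word stimulus."""
    trace, level = [], 0
    for position in range(total_words):
        if position % constituent_size == 0:
            level = 0                  # collapse: a new constituent starts
        level += 1                     # one more word held in the constituent
        trace.append(level)
    return trace

# Summed activity for each constituent size used in the experiment.
totals = {c: sum(accumulator_trace(c)) for c in (1, 2, 3, 4, 6, 12)}
print(totals)
```

Under these assumptions the summed activity equals 6 × (c + 1) for constituent size c, that is, it grows linearly with constituent size, in line with the stated prediction for response amplitude.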
A left perisylvian network for constituent structure
• Several areas of the left superior temporal sulcus and inferior frontal gyrus, plus the left putamen, show an increase in brain activity with constituent size.
• A core set of areas (pSTS, IFGtri, IFGorb) responds identically to normal and to Jabberwocky sentences.
[Figure legend: increase only for normal prose; increase for both Jabberwocky and normal prose. Peak coordinates shown: (-45, 33, -6), (-51, 30, 6), (-48, -45, 3), (-48, 15, -27), (-54, -12, -12), (-45, -66, 24)]
Larger structures require more activation and more time
Activation increased with constituent size.
We also observe a similar increase in the delay of the BOLD response in most areas.
Surprisingly, this increase builds up as log(n), where n is the size of the maximal constituent
This could reflect either the prediction of forthcoming words, or the chunking of words into nested constituents, thus reflecting the actual tree depth.
Anterior STS
Replication with spoken language
[Figure: dorsal, ventral, lateral and medial views for written language and for spoken language, showing regions with an increase or a decrease with constituent size; p < 0.05, FDR corrected]
Language: Converging evidence for a core network for the manipulation of syntactic trees
Syntactic movement (Shetreet & Friedmann, 2014)
Reduced activation and lesions in agrammatic patients (Tyler et al., 2011)
Syntactic ambiguities (Tyler et al., 2011)
Embedded phrases > Adjunct phrases
Tree manipulation > No manipulation
Extraction of semantic information from syntactic trees (Pattamadilok, Pallier, & Dehaene, Cortex, 2015)
« The kids who exhausted their parents fell asleep » The parents fell asleep; or The dog barked
fMRI signal increases with the size of syntactic structures, even in Jabberwocky
Monotonic increase with constituent size (Pallier, Devauchelle & Dehaene, 2011)
How are constituents encoded? An intracranial study
Nelson et al., submitted
Example sentence: Bill Gates met two very tired dancers in Dallas
Naive model: linear increase with the number of words.
More sophisticated model: constituents allow for a compression of the information into a tree structure.
Number of pending words + Number of closed constituents = Total number of open nodes
Example of an electrode in pSTS
High-gamma activity builds up for successive words, only in the sentence condition
[Figure: high-gamma power (dB) as a function of time (s) for normal sentences (ordered by length and constituent structure) and for word lists; aSTS electrodes: high-gamma power (dB) as a function of word number in sentence (1 to 7), for sentences vs. word lists, by stimulus length (3 to 7 words)]
End of sentence effects
High-gamma activity increases after the last word of a sentence, in proportion to its length.
A potential neural correlate of the “sentence wrap-up effect”, the selective slowing of reading time on the last word of the sentence (Warren et al. 2009, Cognition)
[Figure: high-gamma power (dB) as a function of time (s), aligned on the last word, for normal sentences (ordered by length and constituent structure) and for word lists; z-score maps (-3.3 to 3.3) for the regression against sentence length and for sentence end > middle]
Brain activity closely tracks sentence-internal phrase structures
[Figure: high-gamma power (dB) as a function of time relative to sentence onset (s), with sentences re-ordered by the position of the first node close; activity as a function of constituent size, with example fragments "Ten students", "Ten students of Bill Gates", "Ten sad students", "Ten sad students of Bill Gates"]
Brain correlates of the number of open nodes
[Figure: example electrode showing open node tracking: high-gamma power (dB) as a function of time relative to word onset (s), for total numbers of open nodes from 2 to 6]
Do single words and multi-word phrases have the same weight?
Across electrodes, the betas of the number of pending words and the number of closed constituents are correlated, with a slope close to 1.
This suggests that single words and multi-word phrases have nearly the same weight: the brain « compresses » phrases, almost down to the size of a single word.
[Figure: scatter plot of the betas for the number of closed constituents against the betas for the number of pending words; slope = 1.23 ± 0.12]
Sentence:                          Bill  Gates  met  two  very  tired  dancers  in  Dallas
Number of pending words:           1     2      1    2    3     4      3        1   2
+ Number of closed constituents:   0     0      1    1    1     1      2        2   2
= Total number of open nodes:      1     2      2    3    4     5      5        3   4
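The open-node count for the example sentence can be reproduced directly from the two rows given on the slide. The sketch below only adds them up and compares the result with the naive word count; the variable names are illustrative, and the per-word values are copied from the slide rather than derived from a parser.

```python
words = "Bill Gates met two very tired dancers in Dallas".split()

# Per-word values as given on the slide.
pending_words       = [1, 2, 1, 2, 3, 4, 3, 1, 2]
closed_constituents = [0, 0, 1, 1, 1, 1, 2, 2, 2]

# Total number of open nodes = pending words + closed constituents.
open_nodes = [p + c for p, c in zip(pending_words, closed_constituents)]

# Naive model: one unit per word heard so far.
word_count = list(range(1, len(words) + 1))
savings = [n - o for n, o in zip(word_count, open_nodes)]

for w, n, o in zip(words, word_count, open_nodes):
    print(f"{w:8s} words so far = {n}  open nodes = {o}")
```

The gap between the two models widens toward the end of the sentence, where several words have been merged into a few closed constituents.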
• The concept of “total number of open nodes” implicitly assumes that a single word and a multi-word phrase, once merged, contribute the same amount of additional brain activity.
• This is implicit in the notion of a “merge” operation that applies to all linguistic objects, regardless of their complexity.
• Can we test this idea?
Do single words and multi-word phrases have the same weight?
A few areas are driven significantly more by closed constituents than by single words, particularly the temporal pole, precuneus, inferior parietal lobule, and SMA.
These regions may play a role in storing the outputs of the merge operation.
[Figure: b-value maps (-0.3 to 0.3) for the number of pending words and for the number of closed constituents; z-score map (-3.3 to 3.3) for closed constituents > pending words]
Transient activity at the time of constituent formation
Some regions (particularly IFG) show an additional burst of activity at the time of constituent closure (“merge”), proportional to constituent length.
These regions may play a role in the merge operation itself.
[Figure: IFGtri electrode: high-gamma power (dB) as a function of time relative to word onset (s), binned by number of nodes closing (1 to 2, 3 to 4, 5+); z-score maps (-3.3 to 3.3) of the effect of the number of nodes closing, at sentence middle and at sentence end]
Summary: the brain compresses the information in sentences
Brain activity does not merely increase with every new word, but also transiently decreases whenever a phrase-building operation compresses several words into a single node.
This finding may explain:
- why fMRI increases sub-linearly (logarithmically) with the number of words
- why memory is better for sentences than for word lists.
[Figure: number of open nodes as a function of the ordinal number of the word in the sentence, compared with the line y = x; panels: open node tracking, transient merge activity]
What is the language of mathematics?
Galileo: « This book [the universe] is written in the mathematical language, and the symbols are triangles, circles and other geometrical figures, without whose help it is impossible to comprehend a single word of it. »
How does mathematical language relate to natural language?
According to Noam Chomsky, “the origin of the mathematical capacity lies in an abstraction from linguistic operations”.
According to Albert Einstein (and many other physicists and mathematicians), « words and language, whether written or spoken, do not seem to play any part in my thought processes. The psychological entities that serve as building blocks for my thought are certain signs or images, more or less clear, that I can reproduce and recombine at will.»
Neuronal recycling model: High-level math may recycle areas originally involved in space, time and number (StAN).
Do these core systems need language as a sort of “glue” that interconnects them (Elizabeth Spelke)?
In some cases, Broca’s area is used for number processing.
Hung, Y.-H., Pallier, C., Dehaene, S., Lin, Y.-C., Chang, A., Tzeng, O. J.-L., & Wu, D. H. (2015). Neural correlates of merging number words. NeuroImage.
U C D U M U C D U: six cent soixante deux mille neuf cent quarante sept
U M C D U U C D U: deux mille cent trente cinq quatre cent vingt huit
M U U C D D U U C: mille deux neuf cent quarante vingt huit cinq cent
D C C U U U U D C: soixante cent cent neuf six huit cinq trente cent
The constituent structure of math expressions does not rely on language areas
Maruyama, M., Pallier, C., Jobert, A., Sigman, M., & Dehaene, S. (2012). The cortical representation of simple mathematical expressions. NeuroImage, 61(4), 1444–1460.
[Figure: MEG + fMRI; right inferior temporal activity from 0 to 2000 ms for expression 1 and expression 2, by nesting level (0 to 3)]
The structure of mathematical expressions is encoded in lateral ventral temporal and intraparietal cortices, not in language areas.
Origins of the brain networks for high-level mathematics in professional mathematicians
Amalric & Dehaene, PNAS 2016
Subjects = professional mathematicians (n = 15), compared with professors of humanities of matched academic standing, but without mathematical training (n = 15).
Main task = perform a fast intuitive judgment on spoken statements (classify them as true, false, or meaningless)
+ Calculation localizer: « please compute seven minus three » vs. hearing control sentences.
+ Visual localizer: one-back task with various categories of stimuli.
[Figure: trial timeline with sentence presentation, reflection period, motor response and resting period, plus two alerting sounds; durations shown: 1 s, mean = 4.6 ± 0.9 s, 4 s, 2 s, 7 s]
Brain areas for mathematical expertise in mathematicians: ventrolateral temporal, intraparietal, and dorsal prefrontal cortices
Contrast: meaningful math > non-math in mathematicians, restricted to meaningful stimuli, during the reflection period
[Figure: event-related averages over time (s) after statement onset at L IPS [-52 -43 56], L fusiform [-52 -56 -15], R fusiform [55 -52 -18], R IPS [55 -35 56] and L BA 44d [-46 6 31]]
[Figure: % BOLD in mathematicians and controls at left pSTS/AG [-53 -67 27], right pSTS/AG [58 -65 28], left MTG [-62 -12 -20] and right MTG [64 -7 -21]; maps of meaningful math > non-math in mathematicians and of meaningful non-math > math in both groups; activation to meaningful sentences in analysis, algebra, topology, geometry and non-math]
General semantic knowledge activates areas completely different from those involved in mathematical thinking
An independent contrast: meaningful > meaningless sentences
Meaningful math > meaningless math in mathematicians
Meaningful non-math > meaningless non-math in both groups
Interaction: (meaningful math > meaningless math) in mathematicians > controls
[Figure: time courses (0 to 20 s) in mathematicians and controls at L IPS [-53, -43, 57], L IT [-52, -56, -15], L aMTG [-62 -12 -20] and L pSTS/AG [-53 -67 27], for meaningful math, meaningless math, meaningful non-math and meaningless non-math statements]
Language and math areas are distinct
[Figure: numbered brain regions (1 to 7) with time courses (0 to 20 s after statement onset) for analysis, algebra, topology, geometry and non-math statements, including the temporal pole and anterior temporal cortex]
Language areas are only transiently activated during sentence presentation
Math “recycles” the cortical networks for number recognition and calculation.
[Figure: axial slices at z = 52 and z = -14 showing math > non-math (reflection period), numbers > other pictures, calculation > sentence processing, and their intersection; labeled regions: parietal areas for number sense (Dehaene et al 2003) and the visual number form area (Shum, Hermes, Parvizi)]
The overlap between high-level mathematics and elementary number processing is not due to the presence of numbers in our math statements
• Our mathematical statements carefully avoided any direct mention of numbers or arithmetic facts.
• However, some contained an occasional indirect reference to numbers or to fractions (e.g. ℝ2, unit sphere, semi-major axis, etc).
• We therefore reanalyzed the results after systematic exclusion of such statements.
• The math network (activation for math > non-math in mathematicians) remains virtually unchanged.
Converging evidence for a dissociation between the syntax of language and math
- Deeply aphasic patients may still process complex algebraic expressions (e.g. Varley et al. 2005)
- Processing hierarchical algebraic expressions, relative to lists, causes no activation in Broca’s area, or only in its dorsal opercular part during explicit calculation.
- Manipulations of tree structures activate very different regions for language and arithmetic (Monti et al., 2012), e.g. “Y gave X to Z” and “It was X that Y gave to Z” vs. “Y is greater than Z divided by X” and “X times Y is greater than Z”
[Figure: arithmetic, language and overlap maps (Monti et al., 2012); peaks shown: (-51, 2, 46), (-45, 5, 37), (-51, 3, 39); studies cited: Friedrich and Friederici, 2009; Makuuchi et al., 2012; Nakai & Sakai, 2014; calculation: Dehaene et al 2003]
A simplified “language of geometry”
Marie Amalric, Liping Wang
Subjects see a moving « animal » (dot) and have to anticipate where it is going to go next.
A variety of sequences are compared.
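[Figure: primitive operations on the eight locations, labeled 0, +1, +2, -1, -2, H, V, A, B and P]
Example 1: “four segments”. Formula = [H^2]^4{+1}
Example 2: “two rectangles”. Formula = [[-1,-3]^2]^2<+2>
[Figure: the two example sequences drawn over locations numbered 1 to 8]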
Mariano Sigman and Santiago Figueira helped us design a formal language that can describe the sequence regularities in a very compact manner.
Adults, children and even uneducated Mundurucu can predict the next location
Marie Amalric, Liping Wang
Minimal description length, in our « language of geometry », predicts error rates
[Figure: French adults, % errors as a function of minimal description length (MDL, 5 to 16); ρ = 0.75]
Minimal description length (a.k.a. Kolmogorov complexity) is the length of the shortest program that captures a given sequence. It is a good predictor of the difficulty of learning and/or memorizing a sequence.
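To make the idea of a shortest program tangible, here is a toy description-length measure. It is not the « language of geometry » used in the study: it assumes a made-up mini-language with a single instruction, "repeat this block n times", and charges one symbol per primitive move plus one symbol for the repeat count. The function name and the cost scheme are assumptions for illustration only.

```python
def toy_description_length(moves):
    """Shortest description of `moves` in a toy 'repeat block n times' language."""
    n = len(moves)
    for size in range(1, n + 1):       # try the shortest candidate block first
        if n % size == 0 and moves == moves[:size] * (n // size):
            # cost = primitives in the block (+1 for the repeat count,
            # unless the "block" is simply the whole sequence)
            return size + (1 if size < n else 0)
    return n                           # only reached for an empty sequence

regular   = ["+1"] * 8                              # one move repeated 8 times
two_step  = ["+2", "-1"] * 4                        # a 2-move block repeated 4 times
irregular = ["+1", "-2", "+3", "0", "+2", "-1", "-3", "+4"]

for name, seq in [("regular", regular), ("two_step", two_step), ("irregular", irregular)]:
    print(name, toy_description_length(seq))
```

Even in this crude language, regular sequences compress to a few symbols while an irregular one cannot be described more briefly than by listing it, which is the intuition behind using description length as a predictor of difficulty.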
[Figure: % errors as a function of minimal description length (MDL, 5 to 16) for French preschoolers and Mundurucu adults; correlations shown: ρ = 0.52 and ρ = 0.59; labeled points: 2crosses, 4diagonals]
Our « language of geometry » predicts errors at each step
A good framework: minimal description length (Kolmogorov complexity)
[Figure: % correct at each step of the sequence (up to step 8) for each sequence type (Repeat, 2arcs, 2squares, 4seg, 4diag, 2rect, 2crosses, Irregular), in three groups: Adults; Children (same model with reduced instruction set (Recursion, +4)); Mundurucu]
The internal representation of geometrical sequences takes the form of a language of thought with:
A set of primitive instructions that combine into “mental programs”.
Instructions for repeating regular series of operations: simple concatenation or repetition with variation.
Application of Occam’s razor by selecting the most parsimonious program that accounts for the observed sequence.
Eye-tracking anticipations betray an implicit understanding of sequences
The dorsal part of the IFG is active in proportion to Minimal Description Length (K complexity)
Math > Non-Math Non-Math > Math
Xiang, Norris & Hagoort, Cerebral Cortex, 2009
Mathematics ?
A hypothesis: multiple parallel circuits for symbolic nested structures in the human species
« Broca’s area » may operate as a binding system that transiently merges or unifies the representations present in other brain regions in order to form symbolic nested structures.
Multiple parallel circuits may create nested structures in:
- phonology
- syntax
- semantics
- mathematics
- etc.
Using fMRI to shed light on human uniqueness during sequence processing
Liping Wang and Bechir Jarraya, Current Biology, 2015
We have begun to perform studies of minimal sequence learning (“mini-languages”) in humans and in macaque monkeys.
Our strategy:
- Naïve monkeys, simply trained to fixate and remain quiet in the scanner.
- Compared to humans in the same task.
- Exposed to simple auditory rules.
- fMRI responses to novel deviant sequences reveal what monkeys understand.
Wim Vanduffel, Bechir Jarraya, Liping Wang
Can monkeys grasp abstract patterns such as aaaB: « 3 tones, then another » ?
Habituation stimuli (example of rule AAAB): Sample 1, Sample 2, Sample 3, Sample 4
Rare test stimuli (2 samples):
N- S- (New exemplars of same rule)
N- S+ (Sequence deviants)
N+ S- (Number deviants)
N+ S+ (Double deviants)
[Figure: tone frequency over time; frequency axis marked at 500, 800, 1280 and 2048 Hz; tones labeled 700, 1120 and 1792 Hz]
Do monkeys understand abstract auditory patterns?
Monkey
Human
fMRI
Wang, Uhrig, Jarraya & Dehaene, Current Biology, 2015
Brain responses to Number Change in Monkeys
N+S- > N-S-
*: Main effect of Number
p<0.005, corrected
Neurons tuned to number in untrained monkeys (Viswanathan and Nieder, PNAS, 2014)
Brain responses to Sequence change in Monkeys
N-S+ > N-S- p<0.005, corrected
*: Main effect of Sequence
Summary in monkeys
Bizley & Cohen, Nat Rev Neurosci, 2013
Dorsal and ventral auditory pathways in monkeys
Monkeys possess sophisticated capacities for representing the abstract properties of auditory sequences (independently of changes in pitch or tempo):
• The number of tones
• The sequential structure: whether the last tone differs from the previous ones
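The two abstract properties listed above are enough to label a test sequence relative to the habituated AAAB rule. The sketch below is a hypothetical classifier, not the analysis used in the study: the function name `classify`, the tone values and the example test sequences are assumptions chosen only to illustrate the N/S coding.

```python
def classify(tones, habituated_length=4):
    """Label a tone sequence relative to the habituated rule AAAB.

    N+ : the number of tones differs from the habituated number.
    S+ : the sequential structure differs, i.e. the last tone no longer
         differs from all the previous ones.
    """
    number_changed = len(tones) != habituated_length
    last_differs = all(t != tones[-1] for t in tones[:-1])
    sequence_changed = not last_differs
    return ("N+" if number_changed else "N-") + " " + ("S+" if sequence_changed else "S-")

# Hypothetical test sequences (frequencies in Hz, chosen for illustration).
examples = {
    "AAAB":   [700, 700, 700, 1120],
    "AAAA":   [700, 700, 700, 700],
    "AAAAAB": [700, 700, 700, 700, 700, 1120],
    "AAAAAA": [700] * 6,
}
for name, tones in examples.items():
    print(name, classify(tones))
```

Because the labels depend only on tone count and on whether the final tone differs, they are unaffected by which particular frequencies are used, mirroring the claim that the representation is independent of changes in pitch.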
What is special to humans? Number and sequence effects intersect in human IFG and pSTS
No such intersection in the monkey
[Figure: maps of the number and sequence effects, with IFG and pSTS labeled]
PFC IFG / F5
Number and sequence: Overlap in humans, dissociation in monkeys
In prefrontal cortex:
- Humans show similar patterns of activity for number and sequence change.
- Monkeys show a negative correlation, indicating a segregation of number and sequence patterns.
Our hypothesis:
• Monkeys can detect abstract features such as « four tones » or « one item is different ».
• But only humans can conceive of an abstract representation of the entire pattern, such as « 3 tones, then another one ».
[Figure: the pattern decomposed as Part A (three tones), "then", Part B (one other)]
Conclusions and new hypotheses
• Distinct parallel circuits seem to encode the complex structures underlying language and mathematics.
• The language circuit also responds to elementary auditory sequences.
• Both circuits may be uniquely developed in humans.
• Monkeys possess sophisticated capacities for representing the abstract numerical and sequence patterns of auditory series.
• Only humans possess cortical circuitry in the IFG and pSTS capable of integrating this information.
• Hypothesis: The human IFG may operate as a binding system that integrates representations from other brain areas and transiently « merges » them into complex, nested tree structures.
• In the future, this method should allow us to compare the neural codes for sequential and spatial structures in monkeys and humans.
Core syntax Semantics
Pallier, Devauchelle & Dehaene, PNAS 2011
Math
General knowledge
Thank you for your attention!
Christophe Pallier Liping Wang Bechir Jarraya
Marie Amalric