sequences, patterns and rules: brain mechanisms of syntax in ... challenges in...jabberwocky...

Sequences, patterns and rules: Brain mechanisms of syntax in language and math

Stanislas Dehaene and Marie Amalric

Five mechanisms for sequence representation

• Ordinal knowledge: [figure: items labeled by ordinal position (1st, 2nd, 3rd), shown in different orders]

• Chunking: tokibugikobagopilagikobatokibugopila …

• Algebraic patterns: totobu … mimitu … gagari … pesipe …; A A B A A B A A B A B A (violation)

• Transitions and timing: [figure: predicted vs. observed items along time, at regular intervals Δt]
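The chunking example can be illustrated with transition probabilities, one well-known cue for cutting a continuous syllable stream into words: within a pseudo-word the next syllable is fully predictable, across word boundaries it is not. The sketch below is a minimal illustration, not the mechanism proposed in the talk. It assumes two-letter syllables and a hypothetical cut-off of 0.75, and it doubles the stream shown above, because in a single pass the boundary after "la" occurs only once and therefore looks perfectly predictable.

```python
from collections import Counter


def syllabify(stream: str, size: int = 2) -> list[str]:
    """Split a stream into fixed-size syllables (assumes every syllable has `size` letters)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def transition_probs(sylls: list[str]) -> dict[tuple[str, str], float]:
    """Forward transition probability P(next | current), estimated from bigram counts."""
    pair_counts = Counter(zip(sylls, sylls[1:]))
    first_counts = Counter(sylls[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(sylls: list[str], threshold: float = 0.75) -> list[str]:
    """Insert a word boundary wherever the transition probability drops below `threshold`."""
    tp = transition_probs(sylls)
    words, current = [], [sylls[0]]
    for prev, nxt in zip(sylls, sylls[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


stream = "tokibugikobagopilagikobatokibugopila" * 2
print(segment(syllabify(stream)))
```

Within-word transitions such as to → ki have probability 1, while boundary transitions such as bu → gi fall to 0.5 or below, so the three pseudo-words tokibu, gikoba, and gopila emerge without any knowledge of the lexicon.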

Dehaene, S., Meyniel, F., Wacongne, C., Wang, L., & Pallier, C. (2015). The Neural Representation of Sequences: From Transition Probabilities to Algebraic Patterns and Linguistic Trees. Neuron, 88(1), 2–19.
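The algebraic-pattern example can be sketched by replacing each syllable with an abstract variable, in order of first appearance: totobu, mimitu, and gagari all reduce to AAB, while pesipe reduces to ABA and therefore violates the rule. This is a minimal illustration assuming two-letter syllables; the function names are hypothetical and the code says nothing about how the brain computes such rules.

```python
def syllabify(word: str, size: int = 2) -> list[str]:
    """Split a pseudo-word into fixed-size syllables (assumes CV syllables of `size` letters)."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def abstract_pattern(items: list[str]) -> str:
    """Replace each distinct item by a letter in order of first appearance: to-to-bu -> 'AAB'."""
    labels: dict[str, str] = {}
    for item in items:
        labels.setdefault(item, chr(ord("A") + len(labels)))
    return "".join(labels[item] for item in items)


def is_violation(habituation: list[str], test: str) -> bool:
    """A test item violates the rule if its abstract pattern differs from the habituation pattern."""
    rules = {abstract_pattern(syllabify(w)) for w in habituation}
    if len(rules) != 1:
        raise ValueError("habituation items should share a single pattern")
    return abstract_pattern(syllabify(test)) not in rules


habituation = ["totobu", "mimitu", "gagari"]
print(abstract_pattern(syllabify("totobu")), abstract_pattern(syllabify("pesipe")))
print(is_violation(habituation, "pesipe"))
```

Because the pattern is computed over abstract variables rather than over specific syllables, it generalizes to syllables never heard during habituation, which transition probabilities alone cannot do.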

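• Nested symbolic structures: [figure: tree for the noun phrase built from “those”, “gifted”, “car”, “factory”, “workers” (node labels D, A, N, NP, DP); tree generating the sequence a a b b b b a a through repeat, reverse, and concat operations]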
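A nested description of a sequence such as a a b b b b a a can be written as a small program whose operations are nested inside one another. The sketch below uses only the three operations named in the figure (repeat, reverse, concat); the exact tree drawn on the slide is not recoverable from the text, so this particular bracketing and the tuple encoding are assumptions made for illustration.

```python
# Hypothetical mini-language: a program is either a string (a terminal sequence) or a
# tuple (operation, *arguments). Only the operations named in the figure are supported.

def evaluate(node) -> str:
    """Recursively evaluate a nested program into a flat string of symbols."""
    if isinstance(node, str):
        return node
    op, *args = node
    if op == "repeat":          # ("repeat", body, n): body written n times
        body, n = args
        return evaluate(body) * n
    if op == "reverse":         # ("reverse", body): body in mirror order
        (body,) = args
        return evaluate(body)[::-1]
    if op == "concat":          # ("concat", left, right): left followed by right
        left, right = args
        return evaluate(left) + evaluate(right)
    raise ValueError(f"unknown operation: {op}")


first_half = ("concat", ("repeat", "a", 2), ("repeat", "b", 2))   # a a b b
program = ("concat", first_half, ("reverse", first_half))          # a a b b, then b b a a
print(evaluate(program))
```

The program is shorter than the sequence it generates only because its parts are nested; a flat list of transitions would have to spell out all eight symbols.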

Many arguments support the existence of nested tree structures in the language domain:

• Cases of syntactic ambiguity: “Black taxi driver” = black (taxi driver) or (black taxi) driver; “unlockable” = un-(lock-able) or (un-lock)-able; “I shot an elephant in my pajamas… How it got in my pajamas, I don’t know” (Groucho Marx). Such ambiguities can be explained by recursive substitution rules, NP → A NP and NP → NP N, that generate distinct structures: [A [N N]] or [[A N] N].

• Ellipsis or substitution of any phrase: ‘‘he [drove [to [this [big house]]]]’’ = ‘‘he drove to this one,’’ ‘‘he drove to it,’’ ‘‘he drove there,’’ ‘‘he did.’’

• « Syntactic movement » of phrases (for question formation, topicalization, etc.): « John wants to buy this toy » → « It’s this toy that John wants to buy », « To buy this toy, that’s what John wants », « What toy does John want to buy? », etc.

• Long-distance dependencies (agreement and binding): « The cars that pass this truck are red » (the verb “are” agrees with “cars”, not with the nearer noun “truck”).

Haegeman, L. (2005). Thinking Syntactically: A Guide to Argumentation and Analysis. Wiley.
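The two substitution rules can be run mechanically to check that they yield exactly two structures for “black taxi driver”. The sketch below is a small chart-style enumerator written for this example; the base rule NP → N and the hand-assigned tags (A, N, N) are assumptions needed to make it work, and the function names are hypothetical.

```python
from functools import lru_cache


def np_parses(words: tuple[str, ...], tags: tuple[str, ...]) -> list:
    """Enumerate every NP tree over `words` licensed by NP -> N, NP -> A NP, NP -> NP N.

    Trees are nested tuples; a bare string is a single-noun NP.
    """
    @lru_cache(maxsize=None)
    def np(i: int, j: int) -> tuple:
        trees = []
        if j - i == 1 and tags[i] == "N":                        # NP -> N
            trees.append(words[i])
        if j - i >= 2 and tags[i] == "A":                        # NP -> A NP
            trees += [(words[i], sub) for sub in np(i + 1, j)]
        if j - i >= 2 and tags[j - 1] == "N":                    # NP -> NP N
            trees += [(sub, words[j - 1]) for sub in np(i, j - 1)]
        return tuple(trees)

    return list(np(0, len(words)))


def brackets(tree) -> str:
    """Render a nested-tuple tree with square brackets."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(brackets(child) for child in tree) + "]"


for tree in np_parses(("black", "taxi", "driver"), ("A", "N", "N")):
    print(brackets(tree))
```

The same three words receive two different trees, which is why the ambiguity cannot be captured by a flat sequence of word-to-word transitions.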

A capacity to combine basic symbols into nested structures is characteristic of the human language faculty.

Transition probabilities and ordinal number are insufficient – nested representations are needed.

Nested structures also underlie many other domains of human competence:

Language, mathematics, music, and perhaps also social cognition, tool use…

What is special about the human brain? Perhaps a quick grasp of nested symbolic structures.

Did evolution endow the human brain with a specific ability to represent nested symbolic structures?

Brain mechanisms for the constituent structure of language

Pallier, C., Devauchelle, A. D., & Dehaene, S. (2011). Cortical representation of the constituent structure of sentences. PNAS, 108(6), 2522-2527.

Hypothesis: in a region that constructs trees, the activation could increase with the size of the tree, each time a “MERGE” operation is needed to bind two constituents.

(The girl) < (The (nice girl)) < (The (girl (who talks)))

In this example, constituent size is confounded with number of words.

We created stimuli with a fixed number of words or pseudo-words (always 12), but where constituent size was systematically manipulated.

Parametric manipulation of constituent size

Examples (normal prose), by constituent size:
12 words (c12): I believe that you should accept the proposal of your new associate
6 words (c06): the mouse that eats our cheese / two clients examine this nice couch
4 words (c04): mayor of the city / he hates this color / they read their names
3 words (c03): solving a problem / repair the ceiling / he keeps reading / will buy some
2 words (c02): looking ahead / who dies / important task / his dog / few holes / they write
1 word (c01): thing / very / tree / where / of / watching / copy / tensed / they / states / heart / plus

Syntax without lexical semantics or transition probabilities

Examples (Jabberwocky), by constituent size:
12 words (c12): I tosieve that you should begept the tropufal of your tew viroflate
6 words (c06): the couse that rits our treeve / fow plients afomine this kice bloch
4 words (c04): tuyor of the roty / he futes this dator / they gead their wames
3 words (c03): relging a grathem / regair the fraping / he meeps bouding / will doy some
2 words (c02): troking ahead / who mies / omirpant fran / his gog / few biles / they grite
1 word (c01): thang / very / gree / where / of / wurthing / napy / gunsed / they / flotes / blart / trus

To control for transition probability and semantic content, we created Jabberwocky stimuli in which content words are replaced by pseudowords.

Predictions of a simple accumulator model

[figure: predicted level of activity and haemodynamic response over time for the 12-, 6-, 4-, 3-, 2-, and 1-word conditions]

Hypotheses:
- Total neural activity increases each time a word is incorporated into the current constituent.
- Activity collapses when merging is no longer possible.

Prediction: linearly increasing amplitude and phase of the fMRI response.

Pallier, Devauchelle & Dehaene, PNAS 2011.
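The accumulator hypotheses can be turned into numbers for 12-word stimuli. The sketch below is a simplified reading of the model, not the authors' simulation: activity equals the number of words in the current constituent, the mean over the sentence stands in for response amplitude, and the center of mass of activity over word positions stands in for response phase. No haemodynamic convolution is applied, and all function names are hypothetical.

```python
def accumulator(n_words: int, constituent_size: int) -> list[int]:
    """Hypothetical accumulator: activity = number of words in the current constituent.

    Activity grows by one with each incoming word and collapses back to 1 when a new
    constituent starts (every `constituent_size` words).
    """
    return [(i % constituent_size) + 1 for i in range(n_words)]


def mean_activity(trace: list[int]) -> float:
    """Average activity over the sentence: a proxy for response amplitude."""
    return sum(trace) / len(trace)


def activity_centroid(trace: list[int]) -> float:
    """Center of mass of activity over word positions: a crude proxy for response phase."""
    return sum(i * a for i, a in enumerate(trace)) / sum(trace)


for size in (1, 2, 3, 4, 6, 12):
    trace = accumulator(12, size)
    print(f"c{size:02d}: mean = {mean_activity(trace):.2f}, "
          f"centroid = {activity_centroid(trace):.2f}")
```

With constituents of size k that divide 12, the mean activity is (k + 1) / 2, which is linear in k, and the centroid shifts later as k grows: this is the linear amplitude-and-phase prediction stated above.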

A left perisylvian network for constituent structure

[figure panels: increase only for normal prose; increase for both Jabberwocky and normal prose]

• Several areas of the left superior temporal sulcus and inferior frontal gyrus, plus the left putamen, show an increase in brain activity with constituent size.
• A core set of areas (pSTS, IFGtri, IFGorb) responds identically to normal and to Jabberwocky sentences.

[figure: peak coordinates -45 33 -6; -51 30 6; -48 -45 3; -48 15 -27; -54 -12 -12; -45 -66 24]

Larger structures require more activation and more time

Activation increased with constituent size.

We also observe a similar increase in the delay of the BOLD response in most areas.

Surprisingly, this increase builds up as log(n), where n is the size of the maximal constituent.

This could reflect either the prediction of forthcoming words or the chunking of words into nested constituents, in which case it would reflect the actual tree depth.
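One way to see why tree depth could grow logarithmically while word count grows linearly: if n words were merged pairwise into a perfectly balanced binary tree (an idealization; real phrase structure is rarely balanced), the depth of the tree would be ⌈log₂ n⌉. The sketch below builds such idealized trees by recursive halving and compares their depth with that formula; the function names are hypothetical.

```python
import math


def balanced_tree(words: list[str]):
    """Recursively split in half: an idealized, maximally balanced binary bracketing."""
    if len(words) == 1:
        return words[0]
    mid = len(words) // 2
    return (balanced_tree(words[:mid]), balanced_tree(words[mid:]))


def depth(tree) -> int:
    """Number of merge levels above the deepest word."""
    return 0 if isinstance(tree, str) else 1 + max(depth(child) for child in tree)


for n in (1, 2, 3, 4, 6, 12):
    d = depth(balanced_tree([f"w{i}" for i in range(n)]))
    print(n, d, math.ceil(math.log2(n)))
```

Going from 6 to 12 words doubles the word count but adds only one level of depth, which is the kind of sub-linear growth the log(n) fit describes.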

[figure: response in the anterior STS]

Replication with spoken language

[figure: dorsal, ventral, lateral, and medial views of areas showing an increase or a decrease with constituent size, for written and for spoken language (p < 0.05, FDR corrected)]

Language: converging evidence for a core network for the manipulation of syntactic trees

• Syntactic movement (Shetreet & Friedmann, 2014)
• Reduced activation and lesions in agrammatic patients (Tyler et al., 2011)
• Syntactic ambiguities (Tyler et al., 2011)
• Embedded phrases > adjunct phrases
• Tree manipulation > no manipulation
• Extraction of semantic information from syntactic trees (Pattamadilok, Pallier, & Dehaene, Cortex, 2015): « The kids who exhausted their parents fell asleep »; « The parents fell asleep » or « The dog barked »
• fMRI signal increases with the size of syntactic structures, even in Jabberwocky: monotonic increase with constituent size (Pallier, Devauchelle & Dehaene, 2011)

Bill Gates met two very tired dancers in Dallas

More sophisticated model: constituents allow for a compression of the information into a tree structure.

Number of pending words + number of closed constituents = total number of open nodes

How are constituents encoded? An intracranial study (Nelson et al., submitted)

Naive model: linear increase with the number of words

Example of an electrode in pSTS: high-gamma activity builds up for successive words, only in the sentence condition.

[figure: high-gamma power (dB) against time (s) for normal sentences (ordered by length and constituent structure) and for word lists; mean high-gamma power in aSTS electrodes for sentences and word lists, as a function of word number in the sentence (1 to 7) and of stimulus length (3 to 7 words)]

End-of-sentence effects

High-gamma activity increases after the last word of a sentence, in proportion to its length.

A potential neural correlate of the “sentence wrap-up effect”, the selective slowing of reading time on the last word of the sentence (Warren et al. 2009, Cognition).

[figure: high-gamma power (dB) aligned on the last word, for normal sentences (ordered by length and constituent structure) and for word lists; brain maps of the regression against sentence length and of sentence end > middle (z-scores, -3.3 to 3.3)]

Brain activity closely tracks sentence-internal phrase structures

[figure: high-gamma power (dB) against time relative to sentence onset (s), with sentences re-ordered by the position at which the first node closes; activity as a function of constituent size for phrases such as “Ten students”, “Ten students of Bill Gates”, “Ten sad students”, “Ten sad students of Bill Gates”]

Brain correlates of the number of open nodes

[figure: example electrode, open node tracking: high-gamma power (dB) against time relative to word onset (s), split by the total number of open nodes (2 to 6)]

Do single words and multi-word phrases have the same weight?

Across electrodes, the betas of the number of pending words and the number of closed constituents are correlated, with a slope close to 1.

This suggests that single words and multi-word phrases have nearly the same weight: the brain « compresses » phrases, almost down to the size of a single word.

[figure: across electrodes, beta for the number of closed constituents against beta for the number of pending words (both axes from -0.2 to 0.6); slope = 1.23 ± 0.12]

Word                               Bill  Gates  met  two  very  tired  dancers  in  Dallas
Number of pending words             1     2      1    2    3     4      3        1   2
+ Number of closed constituents     0     0      1    1    1     1      2        2   2
= Total number of open nodes        1     2      2    3    4     5      5        3   4
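The three rows of the table can be reproduced with a simple parsing stack: single words sit on the stack as pending words until they are merged, and each merge replaces several stack items with one closed constituent. The merge schedule below is hand-coded to match the table (an assumption about the underlying tree, not the authors' parser), and the function name is hypothetical.

```python
def open_node_counts(words: list[str], reductions_before: list[list[int]]):
    """Track pending words and closed constituents on a parsing stack.

    reductions_before[i] lists the merges applied just before word i is pushed;
    each entry k merges the top k stack items into one closed constituent.
    """
    stack: list = []
    rows = []
    for word, reductions in zip(words, reductions_before):
        for k in reductions:
            merged = tuple(stack[-k:])  # the k most recent items become one node
            del stack[-k:]
            stack.append(merged)
        stack.append(word)
        pending = sum(isinstance(item, str) for item in stack)
        closed = len(stack) - pending
        rows.append((word, pending, closed, pending + closed))
    return rows


words = "Bill Gates met two very tired dancers in Dallas".split()
# Hand-coded merge schedule (assumption): [Bill Gates] closes at "met",
# [very tired] closes at "dancers", then [two very tired dancers] and
# [met two very tired dancers] close at "in".
schedule = [[], [], [2], [], [], [], [2], [3, 2], []]
for word, pending, closed, total in open_node_counts(words, schedule):
    print(f"{word:8s} pending={pending} closed={closed} open nodes={total}")
```

Each merge lowers the count of open nodes even though a new word has arrived, which is why the total rises and falls instead of growing with the ordinal position of the word.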

• The concept of “total number of open nodes” implicitly assumes that a single word and a multi-word phrase, once merged, contribute the same amount of additional brain activity.

• This is implicit in the notion of a “merge” operation that applies to all linguistic objects, regardless of their complexity.

• Can we test this idea?
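One way to set up the test is to give pending words and closed constituents separate weights and estimate both by least squares; equal weights (a ratio near 1) would support the “merge treats all objects alike” assumption. The sketch below only checks that such a fit recovers known weights: the activity values are simulated with invented weights (0.2 and 0.5), not recorded data, and a real analysis would use the measured signal, an intercept, and noise. The function name is hypothetical.

```python
def fit_two_betas(x1: list[float], x2: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares fit of y ≈ b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * v for a, v in zip(x1, y))
    t2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear")
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Regressors taken from the "Bill Gates met two very tired dancers in Dallas" table.
pending = [1, 2, 1, 2, 3, 4, 3, 1, 2]
closed = [0, 0, 1, 1, 1, 1, 2, 2, 2]

# SIMULATED activity with known weights (0.2 per pending word, 0.5 per closed
# constituent); these numbers are invented for the check, not recorded data.
simulated = [0.2 * p + 0.5 * c for p, c in zip(pending, closed)]

b_pending, b_closed = fit_two_betas(pending, closed, simulated)
print(f"beta(pending) = {b_pending:.3f}, beta(closed) = {b_closed:.3f}, "
      f"ratio = {b_closed / b_pending:.2f}")
```

The two regressors are not collinear within this sentence, so their weights can be separated; across electrodes, the slide then compares the two fitted weights.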

Do single words and multi-word phrases have the same weight?

A few areas are driven significantly more by closed constituents than by single words, particularly the temporal pole, precuneus, inferior parietal lobule, and SMA.

These regions may play a role in storing the outputs of the merge operation.

[figure: maps of b-values (-0.3 to 0.3) for the number of pending words and the number of closed constituents, and of the contrast closed constituents > pending words (z-scores, -3.3 to 3.3)]

Transient activity at the time of constituent formation

Some regions (particularly IFG) show an additional burst of activity at the time of constituent closure (“merge”), proportional to constituent length.

These regions may play a role in the merge operation itself.

[figure: IFGtri electrode, high-gamma power (dB) against time relative to word onset (s), split by the number of nodes closing (from 0 to 5+), in the sentence middle and at the sentence end; z-score map (-3.3 to 3.3)]

Summary: the brain compresses the information in sentences

This finding may explain:
- why the fMRI signal increases sub-linearly (logarithmically) with the number of words;
- why memory is better for sentences than for word lists.

[figure: number of open nodes against ordinal number of the word in the sentence, with the line y = x for reference; panels: open node tracking, transient merge activity]

Brain activity does not merely increase with every new word, but also transiently decreases whenever a phrase-building operation compresses several words into a single node

What is the language of mathematics?

Galileo: « This book [the universe] is written in the mathematical language, and the symbols are triangles, circles and other geometrical figures, without whose help it is impossible to comprehend a single word of it. »

How does mathematical language relate to natural language?

According to Noam Chomsky, “the origin of the mathematical capacity lies in an abstraction from linguistic operations”.

According to Albert Einstein (and many other physicists and mathematicians), « words and language, whether written or spoken, do not seem to play any part in my thought processes. The psychological entities that serve as building blocks for my thought are certain signs or images, more or less clear, that I can reproduce and recombine at will.»

Neuronal recycling model: High-level math may recycle areas originally involved in space, time and number (StAN).

Do these core systems need language as a sort of “glue” that interconnects them (Elizabeth Spelke)?

In some cases, Broca’s area is used for number processing.

Hung, Y.-H., Pallier, C., Dehaene, S., Lin, Y.-C., Chang, A., Tzeng, O. J.-L., & Wu, D. H. (2015). Neural correlates of merging number words. NeuroImage.

Example stimuli (French number words; U = unit, D = ten, C = hundred, M = thousand):

U C D U M U C D U: six cent soixante deux mille neuf cent quarante sept

U M C D U U C D U: deux mille cent trente cinq quatre cent vingt huit

M U U C D D U U C: mille deux neuf cent quarante vingt huit cinq cent

D C C U U U U D C: soixante cent cent neuf six huit cinq trente cent
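The well-formed stimuli require merging several number words into one quantity: “six cent soixante deux mille neuf cent quarante sept” is 662,947, whereas the second line reads as two numbers, 2,135 and 428. The sketch below shows one simple left-to-right merging procedure. It is an illustrative assumption, not the model tested by Hung et al.; it does not check well-formedness and does not handle compound forms such as “quatre-vingts” (80).

```python
# Values of the French number words used in the stimuli (plus a few others).
UNITS_AND_TENS = {
    "un": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5, "six": 6,
    "sept": 7, "huit": 8, "neuf": 9, "dix": 10, "vingt": 20, "trente": 30,
    "quarante": 40, "cinquante": 50, "soixante": 60,
}


def merge_number_words(text: str) -> int:
    """Merge a French number-word sequence into its value (simplified sketch).

    Units and tens add to the current group, "cent" multiplies the group by 100,
    and "mille" closes a thousands group.
    """
    total, current = 0, 0
    for word in text.split():
        if word in UNITS_AND_TENS:
            current += UNITS_AND_TENS[word]
        elif word == "cent":
            current = (current or 1) * 100      # "cent" alone means 100
        elif word == "mille":
            total += (current or 1) * 1000      # "mille" alone means 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word}")
    return total + current


print(merge_number_words("six cent soixante deux mille neuf cent quarante sept"))
print(merge_number_words("deux mille cent trente cinq"), merge_number_words("quatre cent vingt huit"))
```

Multiplicative words (cent, mille) act as phrase heads that bind the preceding words into a group, a small-scale analogue of merging constituents.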

The constituent structure of math expressions does not rely on language areas

Maruyama, M., Pallier, C., Jobert, A., Sigman, M., & Dehaene, S. (2012). The cortical representation of simple mathematical expressions. NeuroImage, 61(4), 1444–1460.

[figure (MEG + fMRI): right inferior temporal response over time (0 to 2000 ms) to expression 1 and expression 2, by level (0 to 3)]

The structure of mathematical expressions is encoded in lateral ventral temporal and intraparietal cortices, not in language areas.

Origins of the brain networks for high-level mathematics in professional mathematicians

Amalric & Dehaene, PNAS 2016

Subjects: professional mathematicians (n = 15), compared with professors of humanities of matched academic standing but without mathematical training (n = 15).

Main task = perform a fast intuitive judgment on spoken statements (classify them as true, false, or meaningless)

+ Calculation localizer : « please compute seven minus three » vs hearing control sentences.

+ Visual localizer : one-back task with various categories of stimuli:

[figure: trial structure: sentence presentation, reflection period, motor response, and resting period, with two alerting sounds; durations shown: 1 s, mean = 4.6 ± 0.9 s, 4 s, 2 s, 7 s]

Brain areas for mathematical expertise in mathematicians: ventrolateral temporal, intraparietal, and dorsal prefrontal cortices

Contrast: meaningful math > non-math in mathematicians (restricted to meaningful stimuli, during the reflection period).

[figure: event-related averages (% BOLD) against time (s) for mathematicians and controls. Meaningful math > non-math in mathematicians: L IPS [-52 -43 56], R IPS [55 -35 56], L fusiform [-52 -56 -15], R fusiform [55 -52 -18], L BA 44d [-46 6 31]. Meaningful non-math > math in both groups: left pSTS/AG [-53 -67 27], right pSTS/AG [58 -65 28], left MTG [-62 -12 -20], right MTG [64 -7 -21]. Activation to meaningful sentences in analysis, algebra, topology, geometry, and non-math.]

General semantic knowledge activates areas completely different from those involved in mathematical thinking.

An independent contrast: meaningful > meaningless sentences

• Meaningful math > meaningless math in mathematicians
• Meaningful non-math > meaningless non-math in both groups
• Interaction: meaningful math > meaningless math in mathematicians > controls

[figure: time courses (0 to 20 s) for mathematicians and controls in L IPS [-53 -43 57], L IT [-52 -56 -15], L aMTG [-62 -12 -20], and L pSTS/AG [-53 -67 27], for meaningful math, meaningless math, meaningful non-math, and meaningless non-math statements]

Language and math areas are distinct

[figure: activation to analysis, algebra, topology, geometry, and non-math statements over time (0 to 20 s) in the temporal pole and anterior temporal cortex]

Language areas are only transiently activated during sentence presentation.

Math “recycles” the cortical networks for number recognition and calculation.

[figure: axial slices (z = 52, z = -14) showing math > non-math reflection, numbers > other pictures, calculation > sentence processing, and their intersection]

Parietal areas for number sense (Dehaene et al., 2003); the visual number form area (Shum, Hermes, Parvizi).

The overlap between high-level mathematics and elementary number processing is not due to the presence of numbers in our math statements

• Our mathematical statements carefully avoided any direct mention of numbers or arithmetic facts.

• However, some contained an occasional indirect reference to numbers or to fractions (e.g., ℝ², unit sphere, semi-major axis, etc.).

• We therefore reanalyzed the results after systematic exclusion of such statements.

• The math network (activation for math > non-math in mathematicians) remains virtually unchanged.

[figure: areas for arithmetic, language, and their overlap (Monti et al., 2012)]

- Deeply aphasic patients may still process complex algebraic expressions (e.g., Varley et al., 2005).
- Processing hierarchical algebraic expressions, relative to lists, causes no activation in Broca’s area, or only in its dorsal opercular part during explicit calculation.
- Manipulations of tree structures activate very different regions for language and arithmetic (Monti et al., 2012), e.g., “Y gave X to Z” and “It was X that Y gave to Z” vs. “Y is greater than Z divided by X” and “X times Y is greater than Z”.

Converging evidence for a dissociation between the syntax of language and math

[figure: peak coordinates -51, 2, 46; -45, 5, 37; -51, 3, 39, from Friedrich and Friederici, 2009; Makuuchi et al., 2012; Nakai & Sakai, 2014; compared with calculation areas (Dehaene et al., 2003)]

A simplified “language of geometry”

Marie Amalric, Liping Wang

Subjects see a moving « animal » (dot) and have to anticipate where it is going to go next.

A variety of sequences are compared.

A simplified “language of geometry”


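[figure: primitive instructions on an octagon of locations: 0, +1, +2, -1, -2, H, V, A, B, P]

Example 1: “four segments”, formula = [H^2]^4{+1}

Example 2: “two rectangles”, formula = [[-1,-3]^2]^2<+2>

[figure: numbered order of the eight locations visited in each example]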

Mariano Sigman and Santiago Figueira helped us design a formal language that can describe the sequence regularities in a very compact manner.
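To make the idea of a “mental program” concrete, the sketch below interprets a small fragment of such a language. Everything about its semantics is an assumption made for illustration, not the authors' definition: locations are the octagon vertices 0 to 7, an integer instruction rotates by that many vertices and emits the new location, and a repetition [X]^n<k> runs X n times while silently shifting the current location by k between repetitions. Symmetry instructions (H, V, A, B, P) are left out, and `run` is a hypothetical name.

```python
# Hypothetical mini-interpreter for a fragment of the "language of geometry".
# Assumed semantics (not the authors' definitions): integer = rotation that emits
# a location; list = concatenation; ("repeat", body, n, shift) = [body]^n<shift>.

N_VERTICES = 8


def run(program, start: int = 0) -> list[int]:
    """Execute `program` from vertex `start` and return the emitted vertex indices."""
    points: list[int] = []

    def execute(node, pos: int) -> int:
        if isinstance(node, int):                # rotation by `node` steps
            pos = (pos + node) % N_VERTICES
            points.append(pos)
            return pos
        if isinstance(node, list):               # concatenation
            for item in node:
                pos = execute(item, pos)
            return pos
        _, body, n, shift = node                 # repetition with variation
        for rep in range(n):
            if rep > 0:
                pos = (pos + shift) % N_VERTICES  # silent shift of the starting point
            pos = execute(body, pos)
        return pos

    execute(program, start)
    return points


octagon = ("repeat", [1], 8, 0)                                  # [+1]^8
two_rectangles = ("repeat", ("repeat", [-1, -3], 2, 0), 2, 2)    # [[-1,-3]^2]^2<+2>
print(run(octagon))
print(run(two_rectangles))
```

Under these assumed semantics, the two-rectangles formula visits each of the eight locations exactly once, using a program of only a handful of symbols.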

Adults, children, and even uneducated Mundurucu can predict the next location.


Minimal description length, in our « language of geometry », predicts error rates

[figure: French adults, % errors against minimal description length (MDL, 5 to 16); ρ = 0.75]

Minimal description length (a.k.a. Kolmogorov complexity) is the length of the shortest program that captures a given sequence. It is a good predictor of the difficulty of learning and/or memorizing a sequence.

[figure: the same analysis for French preschoolers and Mundurucu adults (ρ values shown: 0.52 and 0.59); highlighted sequences: 2crosses, 4diagonals]
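True Kolmogorov complexity is uncomputable, so any practical measure must be defined relative to a restricted language. The sketch below defines a toy language (primitive moves cost one symbol, concatenation costs the sum of its parts, and a repetition [X]^n costs one extra symbol) and finds the shortest program by dynamic programming. The cost scheme and the name `description_length` are assumptions for illustration, not the authors' exact MDL measure.

```python
from functools import lru_cache


def description_length(moves: tuple[int, ...]) -> int:
    """Toy minimal description length of a sequence of moves.

    Assumed toy language (not the authors' exact measure):
      - a primitive move costs 1 symbol;
      - a concatenation costs the sum of its parts;
      - a repetition [X]^n costs cost(X) + 1 (one symbol for the repeat count).
    """
    if not moves:
        return 0

    @lru_cache(maxsize=None)
    def cost(i: int, j: int) -> int:
        n = j - i
        if n == 1:
            return 1
        # Best split into two consecutive parts (concatenation).
        best = min(cost(i, k) + cost(k, j) for k in range(i + 1, j))
        # Best exact repetition of a shorter block.
        for block in range(1, n // 2 + 1):
            if n % block == 0 and all(moves[i + t] == moves[i + t % block] for t in range(n)):
                best = min(best, cost(i, i + block) + 1)
        return best

    return cost(0, len(moves))


print(description_length((1,) * 8))                     # octagon, [+1]^8
print(description_length((1, -1) * 4))                  # alternating back and forth
print(description_length((1, 3, -2, 0, 2, -1, 3, 1)))   # irregular
```

Regular sequences compress to a few symbols while an irregular one costs as much as listing every move, mirroring the link between MDL and error rates shown in the figures.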

Our « language of geometry » predicts errors at each step

A good framework: minimal description length (Kolmogorov complexity)

[figure: % correct at each step of the sequence, observed and predicted by the model, for adults, for children (same model with a reduced instruction set: recursion, +4), and for Mundurucu participants; sequences: Repeat, 2arcs, 2squares, 4seg, 4diag, 2rect, 2crosses, Irregular]

The internal representation of geometrical sequences takes the form of a language of thought with:
• A set of primitive instructions that combine into “mental programs”.
• Instructions for repeating regular series of operations: simple concatenation or repetition with variation.
• Application of Occam’s razor, by selecting the most parsimonious program that accounts for the observed sequence.

Eye-tracking anticipations betray an implicit understanding of sequences

The dorsal part of the IFG is active in proportion to minimal description length (K complexity)

[figure: Math > Non-Math and Non-Math > Math contrasts, compared with Xiang, Norris & Hagoort, Cerebral Cortex, 2009; label: “Mathematics?”]

A hypothesis: multiple parallel circuits for symbolic nested structures in the human species

« Broca’s area » may operate as a binding system that transiently merges or unifies the representations present in other brain regions in order to form symbolic nested structures.

Multiple parallel circuits may create nested structures in:
- phonology
- syntax
- semantics
- mathematics
etc.

Using fMRI to shed light on human uniqueness during sequence processing

Liping Wang and Bechir Jarraya, Current Biology, 2015

We have begun to perform studies of minimal sequence learning (“mini-languages”) in humans and in macaque monkeys.

Our strategy:
- Naïve monkeys, simply trained to fixate and remain quiet in the scanner.
- Compared to humans in the same task.
- Exposed to simple auditory rules.
- fMRI responses to novel deviant sequences reveal what monkeys understand.

Wim Vanduffel, Bechir Jarraya, Liping Wang

Can monkeys grasp abstract patterns such as aaaB: « 3 tones, then another »?

[figure: frequency against time. Habituation stimuli (example of rule AAAB; samples 1 to 4), tones at 500, 800, 1280, and 2048 Hz. Rare test stimuli (2 samples), tones at 700, 1120, and 1792 Hz: N- S- (new exemplars of the same rule), N- S+ (sequence deviants), N+ S- (number deviants), N+ S+ (double deviants).]

Do monkeys understand abstract auditory patterns?

[figure: fMRI in monkey and human]

Wang, Uhrig, Jarraya & Dehaene, Current Biology, 2015

Brain responses to Number Change in Monkeys

[figure: N+S- > N-S- (p < 0.005, corrected); *: main effect of number]

Neurons tuned to number in untrained monkeys (Viswanathan and Nieder, PNAS, 2014)

Brain responses to Sequence change in Monkeys

[figure: N-S+ > N-S- (p < 0.005, corrected); *: main effect of sequence]

Summary in monkeys

Bizley & Cohen, Nat Rev Neurosci, 2013

Dorsal and ventral auditory pathways in monkeys

Monkeys possess sophisticated capacities for representing the abstract properties of auditory sequences (independently of changes in pitch or tempo):

• The number of tones

• The sequential structure: whether the last tone differs from the previous ones
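The two features listed above are enough to sort test sequences into the four deviant categories used in the design. The sketch below computes them while ignoring absolute pitch, so a change in frequency alone never counts as a deviant. The habituation sample and the test sequences are hypothetical examples built from the frequencies printed on the stimulus slide; the actual deviant sequences in Wang et al. may differ, and the function names are hypothetical.

```python
def features(tones: list[float]) -> tuple[int, bool]:
    """Two abstract features: the number of tones, and whether the last tone
    differs from all the previous ones. Absolute pitch is ignored."""
    *previous, last = tones
    return len(tones), all(t != last for t in previous)


def classify(test: list[float], habituation: list[float]) -> str:
    """Label a test sequence as N+/N- (number change) and S+/S- (sequence change)."""
    n_hab, s_hab = features(habituation)
    n_test, s_test = features(test)
    number = "N+" if n_test != n_hab else "N-"
    sequence = "S+" if s_test != s_hab else "S-"
    return f"{number} {sequence}"


habituation = [500, 500, 500, 800]  # hypothetical AAAB sample (frequencies in Hz)
for test in ([700, 700, 700, 1120],        # new exemplar of the same rule
             [700, 700, 700, 700],         # same number, last tone no longer differs
             [700, 700, 700, 700, 1120],   # five tones, same structure
             [700, 700, 700, 700, 700]):   # both features change
    print(test, classify(test, habituation))
```

Detecting each feature separately does not require representing the whole pattern « 3 tones, then another »; that distinction is the point of the hypothesis stated below.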

What is special to humans? Number and sequence effects intersect in human IFG and pSTS

[figure: overlap of number and sequence effects in IFG and pSTS in humans; PFC and IFG / F5 in monkeys]

No such intersection in the monkey

Number and sequence: Overlap in humans, dissociation in monkeys


In prefrontal cortex:
- Humans show similar patterns of activity for number and sequence change.
- Monkeys show a negative correlation, indicating a segregation of number and sequence patterns.

Our hypothesis:
• Monkeys can detect abstract features such as « four tones » or « one item is different ».
• But only humans can conceive of an abstract representation of the entire pattern, such as « 3 tones, then another one ».

[figure: the pattern as a structure: Part A (three tones), then Part B (one other)]

Conclusions and new hypotheses

• Distinct parallel circuits seem to encode the complex structures underlying language and mathematics.

• The language circuit also responds to elementary auditory sequences.

• Both circuits may be uniquely developed in humans.

• Monkeys possess sophisticated capacities for representing the abstract numerical and sequence patterns of auditory series.

• Only humans possess cortical circuitry in the IFG and pSTS capable of integrating this information.

• Hypothesis: the human IFG may operate as a binding system that integrates representations from other brain areas and transiently « merges » them into complex, nested tree structures.

• In the future, this method should allow us to compare the neural codes for sequential and spatial structures in monkeys and humans.

[figure: networks for core syntax and semantics (Pallier, Devauchelle & Dehaene, PNAS 2011), math, and general knowledge]

Thank you for your attention!

Christophe Pallier, Liping Wang, Bechir Jarraya, Marie Amalric
