On the non-context-freeness of Romanian
Nicholas Longenbaugh
Harvard College
October 2, 2013
Nicholas Longenbaugh October 2, 2013 1 / 34
Overview
1 Formal language theory
    Complexity and the Chomsky hierarchy
    Why weak generative capacity?
    Weak generative capacity of natural language
2 Romanian is weakly non-context-free
    Wh-elements in Romanian
    Account
3 Weak non-context-freeness in perspective
    Swiss-German
    Swedish
    Bambara
    Romanian
4 Conclusion
Complexity and the Chomsky hierarchy.
Complexity: how sophisticated are the rules we need to generate or recognize the strings of a particular language?
We can characterize languages according to their complexity:
    Regular: {a^n : n ≥ 0}
    Context free: {a^n b^n : n ≥ 0}
    Context sensitive: {a^n b^n c^n : n ≥ 0}
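As a rough illustration of the three example languages above, membership can be checked as follows. This is only a sketch: the function names are ours, and the checks test membership directly rather than implementing the corresponding automata.

```python
import re


def in_regular_example(s: str) -> bool:
    """Membership in {a^n : n >= 0}; a finite-state pattern is enough."""
    return re.fullmatch(r"a*", s) is not None


def in_context_free_example(s: str) -> bool:
    """Membership in {a^n b^n : n >= 0}; requires matching two counts."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def in_context_sensitive_example(s: str) -> bool:
    """Membership in {a^n b^n c^n : n >= 0}; requires matching three counts."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n
```

The point of the hierarchy is not that these checks are hard to program, but that each language needs a strictly more powerful grammar formalism than the one before it.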
For each of these classes, there is an associated canonical formalism:
    Regular: finite state machines
    Context free: context-free grammars
    Context sensitive: linear bounded nondeterministic Turing machines
    Recursively enumerable: enumerative Turing machines
Generative capacity: for a particular grammar or other language-theoretic device, there are two relevant concepts:
    Strong generative capacity: the set of structures that can be generated
    Weak generative capacity: the set of strings that can be generated
Q: what is the generative capacity of natural language?
Focus on weak generative capacity
Why care about weak generative capacity?
Constrain our models & rule out approaches that aren’t formally complex enough
Gain insight into the formal properties of the language faculty
    Limits on the parsing/processing tools in the brain
    Limits on language production
Complexity parameter view (Deutscher (2010), Everett (2005), Givon (2009), Heine and Kuteva (2007), Wray and Grace (2007), Sauerland (to appear))
Weak generative capacity of natural language
Claim: natural language is weakly and strongly non-context-free
Various “proofs” over the years (Bar-Hillel and Shamir (1960), Chomsky (1959), Postal (1964), Elster (1978)), all invalid (Pullum and Gazdar, 1982)
Valid proofs of weak non-context-freeness in various languages:
    Shieber (1985): Swiss-German
    Culy (1985): Bambara
    Miller (1991): Swedish
    Higginbotham (1987): English (NOPE! Not a valid proof)
Claim: Romanian is also weakly non-context-free
Case morphology on Romanian wh-elements
Romanian overtly distinguishes accusative and dative case on wh-elements
(1)
            Accusative   Dative
    who     (pe) cine    cui
    which   (pe) care    carui
Verb subcategorization
We can classify verbs as to whether they select accusative or dative complements
ruga, to ask, and lasa, to allow, require accusative complements
(2) a. Pe cine  ai    rugat?
       who.acc  have  asked
       “Who did you ask?”
    b. *Cui      i-ai     rugat?
        who.dat  cl-have  asked
(3) a. Pe cine  ai    lasat    sa  viziteze  Bucurestiul?
       who.acc  have  allowed  to  visit     Bucharest
       “Who did you allow to visit Bucharest?”
    b. *Cui      i-ai     lasat    sa  viziteze  Bucurestiul?
        who.dat  cl-have  allowed  to  visit     Bucharest
spune, to tell, and scrie, to write, subcategorize for a dative complement
(4) a. Cui      i-ai     spus?
       who.dat  cl-have  told
       “Who did you tell?”
    b. *Pe cine  ai    spus?
        who.acc  have  told
(5) a. Cui      i-ai     scris?
       who.dat  cl-have  written
       “Who did you write to?”
    b. *Pe cine  ai    scris?
        who.acc  have  written
Multiple wh-questions
In a sentence with multiple wh-elements, all must be extracted to interrogative clause-initial position
True even when the wh-elements originate in different clauses!
(6) a. Pe cine_i  cui_j    ai    rugat  t_i  sa-i   spuna  t_j  povestea?
       who.acc    who.dat  have  asked       to-cl  tell        story
       “Who did you ask to tell who the story?”
    b. *Pe cine ai rugat sa-i spuna cui povestea?
    c. *Cui ai rugat pe cine sa-i spuna povestea?
    d. *Ai rugat pe cine sa-i spuna cui povestea?
(7) a. [Pe care   baiat]_i  [carui     fete]_j   l-ai     rugat  t_i  sa-i   spuna  t_j  povestea?
       which.acc  boy.acc   which.dat  girl.dat  cl-have  asked       to-cl  tell        story
       “Which boy did you ask to tell which girl the story?”
    b. Pe cine_i  cui_j    ai    lasat    t_i  sa-i   scrie  t_j  scrisoarea?
       who.acc    who.dat  have  allowed       to-cl  write       letter
       “Who did you allow to write who the letter?”
    c. [Pe care   fata]_i   [carui     baiat]_j  ai    lasat-o     t_i  sa-i   scrie  t_j  scrisoarea?
       which.acc  girl.acc  which.dat  boy.dat   have  allowed-cl       to-cl  write       letter
       “Which girl did you allow to write which boy the letter?”
We aren’t limited to two wh-elements either:
(8) Pe cine_i  cui_j    la    ce_k  ai    vrut    sa  lasi   t_i  sa-i   spuna  t_j  t_k?
    who.acc    who.dat  with  what  have  wanted  to  allow       to-cl  tell
    “Who did you want to allow to tell who what?”
(9) Pe cine_i  pe cine_j  cui_k    ai    vrut    sa  lasi   t_i  sa  lase   t_j  sa-i   spuna  t_k  povestea?
    who.acc    who.acc    who.dat  have  wanted  to  allow       to  allow       to-cl  tell        story
    “Who did you want to allow to allow who to tell who the story?”
(10) [Pe care   baiat]_i  [carei     fete]_j   [carei     femei]_k   ai    vrut    sa-l   lasi   t_i  sa-i   spuna  t_j  sa-i   scrie  t_k  scrisoarea?
     which.acc  boy.acc   which.dat  girl.dat  which.dat  woman.dat  have  wanted  to-cl  allow       to-cl  tell        to-cl  write       letter
     “Which boy have you wanted to allow to tell which girl to write which woman the letter?”
Summary
ruga, lasa and spune, scrie subcategorize for only accusative and only dative complements, respectively
All wh-elements must be extracted to clause-initial position
Conclusion: Romanian permits structures of the following form
(11) wh-element.acc^n wh-element.dat^m you wanted verb1.acc^n verb2.dat^m something.
Or, explicitly:
(12) (Pe cine)^n  (cui)^m    ai    vrut    (sa rogi)^n  (sa-i spuna)^m  povestea?
     who.acc^n    who.dat^m  have  wanted  (to ask)^n   (to-cl tell)^m  story
     “Who did you want to ask to ask who . . . to tell who to tell who the story?”
Strategy
Homomorphism: function from strings (words) to symbols (letters)
    f(romanian) = a
Intersection of language L and language L′
    L ∩ L′ = {w : w ∈ L and w ∈ L′}: everything in L and in L′
Context-free languages are closed under i) image under homomorphism, ii) intersection with regular languages
Apply a homomorphism to Romanian to simplify the representation (words ↦ symbols)
Intersect Romanian with a regular language to keep only the strings in (12)
Arrive at the non-context-free language a^n b^m c^n d^m
Conclusion: Romanian is weakly non-context-free
Account
Denote the set of all strings of Romanian as Romanian
Recall that all strings of the following form are in Romanian
(13) (Pe cine)^n  cui^m      ai    vrut    (sa rogi)^n  (sa-i spuna)^m  povestea?
     who.acc^n    who.dat^m  have  wanted  (to ask)^n   (to-cl tell)^m  story
     “Who did you want to ask to ask who . . . to tell who to tell who the story?”
Define the following homomorphism
(14) f(w) = a  if w = pe cine
            b  if w = cui
            c  if w = sa rogi
            d  if w = sa-i spuna
            ε  if w = anything else
Take the intersection of the image of Romanian under f with the regular language L = a*b*c*d*
f(Romanian) ∩ L = a^n b^m c^n d^m
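The homomorphism and the intersection step can be sketched in Python. This is an illustration only: it assumes each sentence is already segmented into the multi-word units used in (14), the helper names `sentence`, `f`, and `in_L` are ours, and `sentence` merely builds strings that follow the schema in (12) rather than modelling Romanian grammar.

```python
import re

# The mapping in (14); any unit not listed here is sent to the empty string.
F = {"pe cine": "a", "cui": "b", "sa rogi": "c", "sa-i spuna": "d"}

# The regular language L = a*b*c*d*.
L_PATTERN = re.compile(r"a*b*c*d*")


def sentence(n: int, m: int) -> list[str]:
    """Build the schema in (12) for given n and m (hypothetical helper)."""
    return (["pe cine"] * n + ["cui"] * m + ["ai", "vrut"]
            + ["sa rogi"] * n + ["sa-i spuna"] * m + ["povestea"])


def f(units: list[str]) -> str:
    """Apply the homomorphism unit by unit and concatenate the images."""
    return "".join(F.get(u, "") for u in units)


def in_L(s: str) -> bool:
    """Membership in the regular language a*b*c*d*."""
    return L_PATTERN.fullmatch(s) is not None


image = f(sentence(2, 3))
print(image, in_L(image))
```

For n = 2 and m = 3 the image has two a's, three b's, two c's, and three d's, in that order, which is the cross-serial count pattern the argument relies on.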
(15) Context-free pumping lemma: If L is a context-free language, there exists some length p such that for all w ∈ L, if |w| ≥ p, then w may be split into five pieces u, v, x, y, z such that w = uvxyz, |vxy| ≤ p, |vy| ≥ 1, and for all i ≥ 0, u v^i x y^i z ∈ L.
It can be proved that a^n b^m c^n d^m cannot be pumped ⇒ it is not context-free
Since context-free languages are closed under homomorphism and under intersection with regular languages, if Romanian is context-free then a^n b^m c^n d^m should be context-free
Conclusion: Romanian is non-context-free
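One standard way to fill in the "cannot be pumped" step is sketched below. It assumes the usual side conditions of the pumping lemma, $|vxy| \le p$ and $|vy| \ge 1$; the witness string and the name $L'$ are our choices, not taken from the slides.

```latex
Let $L' = \{a^n b^m c^n d^m\}$ and suppose $L'$ is context-free with pumping length $p$.
Take $w = a^p b^p c^p d^p \in L'$ and any split $w = uvxyz$ with
$|vxy| \le p$ and $|vy| \ge 1$.

Since $|vxy| \le p$, the substring $vxy$ touches at most two adjacent blocks of $w$.
Hence $vy$ cannot contain both an $a$ and a $c$, nor both a $b$ and a $d$.

Case 1: $vy$ contains an $a$ or a $c$. Then
$\#_a(uv^2xy^2z) \ne \#_c(uv^2xy^2z)$.

Case 2: otherwise $vy$ contains a $b$ or a $d$. Then
$\#_b(uv^2xy^2z) \ne \#_d(uv^2xy^2z)$.

In both cases $uv^2xy^2z \notin L'$, contradicting the pumping lemma,
so $L'$ is not context-free.
```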
Other weak non-context-freeness arguments
Q: why should we care about Romanian?
We don’t need special phonological or morphological processes
Just pure A′-movement
Claim: all the other arguments rely on special phonological or morphological processes
Swiss-German: reordering during the PF linearization process
Swedish: obligatory spelling out of wh-traces
Bambara: two morphological processes (noun reduplication, agglutinative agentive construction)
Swiss-German
Shieber (1985) proved Swiss-German to be non-context-free based on the following structures (proof details are almost identical to ours)
(16) . . . obj.acc^n obj.dat^m have wanted verb.acc^n verb.dat^m
(17) . . . mer  d’chind^n           em Hans^m   es   huss       haend  wele    laa^n  halfe^m  aastriiche
     . . . we   the children.acc^n  Hans.dat^m  the  house.acc  have   wanted  let^n  help^m   paint
     “. . . we have wanted to let the children help Hans let the children help Hans . . . paint the house” (Shieber, 1985)
This order of complements/verbs is only attested in Swiss-German and Dutch
Where does it come from?
Where does it come from?
West-Germanic is head final, at least at VP level
(20) . . . [VP OBJ1 [. . . [VP OBJ2 [. . . [VP OBJ3 V3 ] ] V2 ] ] V1 ]
But we need (1-2-3-1-2-3) not (1-2-3-3-2-1)
Head movement!
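The difference between the two orders can be made concrete with a small sketch. The function names are ours, and the output is only the linear order of objects and verbs from (20), not a syntactic analysis.

```python
def nested_order(n: int) -> list[str]:
    """Head-final embedding as in (20): objects 1..n, then verbs n..1."""
    objs = [f"OBJ{i}" for i in range(1, n + 1)]
    verbs = [f"V{i}" for i in range(n, 0, -1)]
    return objs + verbs


def cross_serial_order(n: int) -> list[str]:
    """The attested Swiss-German order: objects 1..n, then verbs 1..n."""
    objs = [f"OBJ{i}" for i in range(1, n + 1)]
    verbs = [f"V{i}" for i in range(1, n + 1)]
    return objs + verbs


print(nested_order(3))        # the (1-2-3-3-2-1) pattern
print(cross_serial_order(3))  # the (1-2-3-1-2-3) pattern
```

Nested (mirror-image) dependencies of the first kind are within reach of a context-free grammar; it is the cross-serial pattern of the second kind that the non-context-freeness argument exploits.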
Underlying (3-2-1) order can be changed: (1-3-2), (1-2-3), etc.
If this is feature-motivated head movement, some questions:
    Why does the movement not affect the semantics like other types of verb movement?
    Why aren’t there any phonological or morphological reflexes of the features?
    What are the features that motivate this movement and why are certain orders precluded (3-1-2)?
Wurmbrand (2004, 2006, 2012): verb reordering is not head movement and occurs after Spell Out
Reordering is post-syntactic (Wurmbrand, 2004)
Overview:
    In West-Germanic, there is a post-syntactic process, the infinitivus pro participio (IPP)
    Have + modal: modal appears as infinitive not participle
    IPP is distinctly post-syntactic
    Participles in German are ambiguous between simple past and perfective tenses, infinitives are not
    IPP modals are ambiguous ⇒ they are interpreted as participles
    Finally, IPP feeds reordering, not vice versa
Conclusion: Swiss-German data depends on a post-syntactic reordering operation
Swedish
Miller (1991) proved Swedish weakly non-context-free with the following structures
(21) Har   ar  {M,F,Pl}  som   jag  undrar
     here  is  {M,F,Pl}  that  I    wonder
     ({vilken  M,  vilken  F,  vilka  Pl}  Sg  undrar)+  vilken  pojke
     ({which   M,  which   F,  which  Pl}  Sg  wonder)+  which   boy
     {han,  hon,  de}    trodde   (att  {han,  hon,  de}    trodde)+
     {he,   she,  they}  thought  (to   {he,   she,  they}  thought)∗
     {han,  hon,  de}    hade  rekommenderat  till  studenterna.
     {he,   she,  they}  had   recommended    to    student
     “Here is the M/F/Pl that I wonder (which M/F/Pl Sg wonders)+ which boy he/she/they thought (that he/she/they thought)∗ he/she/they had recommended to the students.”
Each of han, hon, de is a resumptive pronoun that is obligatorily inserted
Condition on the resumption of gaps in Swedish:
(22) Given a gap G1 and its filler F1, G1 must be realized as a resumptive pronoun if there is a gap G2 following G1 such that the filler F2 of G2 follows F1. (Miller, 1991)
The han, hon, de are all mandatory in (21) (G2 here is the gap after rekommenderat)
Claim: Swedish resumptive pronouns are just spelled out wh-traces in some cases
Without this, (21) is the same as English
Engdahl (1985) demonstrates that resumptive pronouns in Swedish are just spelled out A′-traces
They can license parasitic gaps:
(23) Det  var  den   fangen_i  som   lakarna      inte  kunde  avgora
     it   was  that  prisoner  that  the-doctors  not   could  decide
     [C′ om  han_i  verkligen  var  sjuk]  [utan    att  tala  med   p  personligen].
         if  he     really     was  ill    without  to   talk  with     in person
     “This is the prisoner that the doctors couldn’t determine if he really was ill without talking to in person” (Engdahl, 1985)
Parasitic gaps are licensed by the following structural configuration (α is an element in A′-position, t is a variable bound by α, p is the parasitic gap, condition holds at Spell Out)
(24) . . . α . . . t . . . p (order irrelevant) (Chomsky, 1982)
Conclusion: Swedish resumptive pronouns must be A′-bound at Spell Out
Conclusion: Swedish resumptive pronouns are spelled out A′-traces
Further evidence: resumptive pronouns can co-occur with A′-traces in ATB situations
(26) [Det   finns  vissa    ord]_i  (som)  jag  ofta   traffar  pa t_i  men  inte  minns     hur  de_i  stavas.
     there  are    certain  words   that   I    often  meet             but  not   remember  how  they  are-spelled
     “There are certain words that I often come across but never remember how they are spelled.”
The Swedish argument depends on mandatory spell out of A′-traces, a post-syntactic process
Bambara
Bambara relies on two overtly morphological processes
“Noun o noun”: whichever noun
(27) a. wulu  o  wulu
        dog      dog
        “whichever dog”
     b. *malo  o  wulu
         rice     dog
        (Culy, 1985)
Agentive construction: (T)ransitive (V)erb + (N)oun = “one who TVs Ns”
(28) a. wulu  +  nyini       +  la  =  wulunyinina
        dog      search for
        “One who searches for dogs”
     b. malo  +  file   +  la  =  malofilela
        rice     watch
        “one who watches rice” (Culy, 1985)
Agentive construction is recursive:
(29) a. wulunyinina   +  nyini       +  la  =  wulunyininanyinina
        dog searcher     search for
        “one who searches for dog searchers”
     b. malofilela    +  file   +  la  =  malofilelafilela
        rice watcher     watch
        “One who watches rice watchers” (Culy, 1985)
Agentive construction can feed “noun o noun”
Non-context-freeness is based on structure below
(30) wulu(filela)^n(nyinina)^m   o  wulu(filela)^n(nyinina)^m
     dog(watcher)^n(searcher)^m     dog(watcher)^n(searcher)^m
Conclusion: Bambara weak non-context-freeness depends on morphological processes
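The shape of (30) can be illustrated with a short sketch. The names `bambara_form` and `is_noun_o_noun` are ours, and the code only handles the wulu/filela/nyinina forms shown above, not Bambara morphology in general.

```python
import re

# Each side of "o" taken on its own is a regular pattern; capture the repeated parts.
SIDE = r"wulu((?:filela)*)((?:nyinina)*)"
PATTERN = re.compile(SIDE + " o " + SIDE)


def bambara_form(n: int, m: int) -> str:
    """Build the string in (30) for given n and m."""
    noun = "wulu" + "filela" * n + "nyinina" * m
    return f"{noun} o {noun}"


def is_noun_o_noun(s: str) -> bool:
    """Accept only strings whose two nouns match suffix for suffix."""
    match = PATTERN.fullmatch(s)
    return (match is not None
            and match.group(1) == match.group(3)
            and match.group(2) == match.group(4))


print(bambara_form(1, 2))
```

The regular expression alone accepts mismatched pairs; it is the extra equality check on both repetition counts that goes beyond what a context-free grammar can enforce.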
Romanian
In Swiss-German, Swedish, and Bambara, data for weak non-context-freeness argument depends on post- or extra-syntactic processes
Q: How can we be sure that the Romanian wh-elements aren’treordered post-syntactically?
A: superiority
Romanian respects superiority constraints (Boskovic, 2002)
(31) a. pe cine  cui      ai    vrut    sa  lasi   sa-i  spuna  povestea?
        who.acc  who.dat  have  wanted  to  allow  to    tell   story
        “Who did you want to allow to tell who the story?”
     b. ?*cui pe cine ai vrut sa lasi sa-i spuna povestea?
Conclusion
Based on overt Case morphology & multiple wh-extraction, Romanian is weakly non-context-free
Romanian provides a fundamentally new way wherein natural language exceeds the generative power of context-free grammars
Romanian differs from previously attested examples in not relying on post- or extra-syntactic operations
References
Yehoshua Bar-Hillel and Eliyahu Shamir. Finite-state languages: Formal representations and adequacy problems. Weizmann Science Press of Israel, 1960.
Zeljko Boskovic. On multiple wh-fronting. Linguistic Inquiry, 33(3):351–383, 2002.
Noam Chomsky. On certain formal properties of grammars. Information and control, 2(2):137–167, 1959.
Noam Chomsky. Some concepts and consequences of the theory of government and binding, volume 6. MIT press, 1982.
Christopher Culy. The complexity of the vocabulary of Bambara. Linguistics and Philosophy, 8(3):345–351, 1985.
Guy Deutscher. The unfolding of language. Random House, 2010.
Jon Elster. Logic and society: Contradictions and possible worlds. Wiley New York, 1978.
Elisabet Engdahl. Parasitic gaps, resumptive pronouns, and subject extractions. Linguistics, 23(1):3–44, 1985.
Daniel L Everett. Cultural constraints on grammar and cognition in Piraha. Current Anthropology, 46(4):621–646, 2005.
Talmy Givon. The genesis of syntactic complexity: Diachrony, ontogeny, neuro-cognition, evolution. John Benjamins, 2009.
Bernd Heine and Tania Kuteva. The genesis of grammar. Newsletter, page 440, 2007.
James Higginbotham. English is not a context-free language. In The Formal Complexity of Natural Language, pages 335–348. Springer, 1987.
Philip H Miller. Scandinavian extraction phenomena revisited: Weak and strong generative capacity. Linguistics and Philosophy, 14(1):101–113, 1991.
Paul M Postal. Limitations of phrase structure grammars. The structure of language, pages 137–151, 1964.
Geoffrey K Pullum and Gerald Gazdar. Natural languages and context-free languages. Linguistics and Philosophy, 4(4):471–504, 1982.
Stuart M Shieber. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8(3):333–343, 1985.
Alison Wray and George W Grace. The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form. Lingua, 117(3):543–578, 2007.
Susi Wurmbrand. Syntactic vs. post-syntactic movement. In Sophie Burelle and Stanca Somesfalean, editors, Proceedings of the 2003 Annual Meeting of the Canadian Linguistic Association (CLA), pages 284–295, 2004.
Susi Wurmbrand. Verb clusters, verb raising, and restructuring. The Blackwell companion to syntax, pages 229–343, 2006.
Susi Wurmbrand. Parasitic participles: Evidence for the theory of verb clusters. Taal en Tongval, 2012.