Semantic Complexity and Corpus Analysis

Camilo Thorne

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

[email protected]

CoSaQ Workshop, Amsterdam, 28–29.9.17

Camilo Thorne Semantic Complexity and Corpus Analysis 1/42


Outline

Motivation
  Complexity in Language
  Research Question(s)

Quantifier Distribution
  Aristotelian, Counting and Proportional Quantifiers
  Corpus Analysis

Fragment Distribution
  The Fragments of English
  Corpus Analysis

Summary


Motivation


Motivation – Cognitive Complexity

“The mind is a neural computer, fitted by natural selection with combinatorial algorithms for causal and probabilistic reasoning about plants, animals, objects, and people.”

Steven Pinker, 1997, How the Mind Works

“(...) if we adopt a computational approach to the study of cognitive phenomena, (...) a notion of tractable competence can be developed, that is, a notion of competence constrained by considerations on computational tractability.”

Frixione, 2011, Tractable Competence


Motivation – Complexity in Language

• Cognitive complexity is mirrored by complexity in natural language(s)

1. surface form/syntactic complexity ⇒ length of utterance, size of parse tree, etc.

2. Kolmogorov complexity ⇒ size of smallest Turing machine

. . .

✓ semantic complexity ⇒ complexity of verifying truth conditions as a function of (finite) model size

→ Semantic complexity seems to mirror cognitive complexity best


Motivation – Picture Verification Task [Szy09]


Motivation – Picture Verification Task

[Figure: plot annotated with significance levels p < 0.05 and p < 0.001]

Parity: “exactly 2”; Cardinal: all the other counting quantifiers


Motivation – Hypothesis

We can approximate the distribution of linguistic structures by analyzing (large) corpora

Hypothesis 1: Linguistic structures are distributed w.r.t. semantic complexity

Hypothesis 2: The distribution is biased towards low-complexity structures


Quantifier Distribution


Aristotelian, Counting and Proportional Quantifiers

...

each wordle is a cloud

most wordles contain some kind of message

fewer than 1/2 of wordles suck

few wordles solve mathematical problems

exactly one wordle is contained in these slides

...


L-Expressibility and Complexity [Szy09]

Definition (L-Expressibility): A generalized quantifier Q over domain ∆ is expressible in logic L iff there exists a formula $Q(R_1, \dots, R_n)$ of L such that

$$(R_1, \dots, R_n) \in Q \;\Leftrightarrow\; (\Delta, \cdot^I) \models Q(R_1, \dots, R_n)$$

• The semantic complexity of generalized quantifier Q is the cost of computing

$$(\Delta, \cdot^I) \models Q(R_1, \dots, R_n)$$

• We measure computational cost in #(∆): data complexity

✓ Example: “each”

1. is FO-expressible as: $(A, B) \in \llbracket \text{each} \rrbracket \Leftrightarrow (\Delta, \cdot^I) \models \forall x (A(x) \rightarrow B(x))$
2. has $\mathrm{Space}(\log_2(\Delta))$ semantic complexity
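
To make the “each” example concrete, here is a minimal sketch of verifying ∀x(A(x) → B(x)) over a finite model. The function name `holds_each` and the toy sets are illustrative assumptions, not part of the talk. The check makes one pass over the domain and keeps no state beyond the current element, which is the intuition behind the logarithmic-space bound (Python sets themselves do not realize that bound).

```python
def holds_each(domain, A, B):
    """Check that every element of the domain in A is also in B,
    i.e. that the finite model satisfies  forall x (A(x) -> B(x))."""
    # One pass over the domain; no state is kept besides the current element.
    return all((x not in A) or (x in B) for x in domain)


# Toy finite model (hypothetical data).
domain = {"w1", "w2", "w3", "w4"}
wordle = {"w1", "w2"}
cloud = {"w1", "w2", "w3"}

print(holds_each(domain, wordle, cloud))  # each wordle is a cloud
print(holds_each(domain, cloud, wordle))  # each cloud is a wordle
```

The cost of the check grows with the size of the domain, not with the (fixed) sentence, which is what “data complexity” measures.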


Complexity Ranking [TS15, ST17]

Q       Semantics ⊆ P(∆) × P(∆)                  DC      Class  Example
some    {(A,B) | A ∩ B ≠ ∅}                      LSpace  ari    some men are happy
all     {(A,B) | A ⊆ B}                          LSpace  ari    all humans are mammals
no      {(A,B) | A ∩ B = ∅}                      LSpace  ari    no humans are spiders
≥ k     {(A,B) | #(A ∩ B) > k}                   LSpace  cnt    more than 5 men are happy
≤ k     {(A,B) | #(A ∩ B) < k}                   LSpace  cnt    fewer than 100 violins are Stradivari
most    {(A,B) | #(A ∩ B) > #(A \ B)}            P       pro    most trains are safe
few     {(A,B) | #(A ∩ B) < #(A \ B)}            P       pro    few people are trustworthy
≥ p/k   {(A,B) | #(A ∩ B) > p · (#(A)/k)}        P       pro    more than 2/3 of planets are lifeless
≤ p/k   {(A,B) | #(A ∩ B) < p · (#(A)/k)}        P       pro    less than 1/3 of Americans are rich

ari: Aristotelian quantifiers; cnt: counting quantifiers; pro: proportional quantifiers

Question 1: Does complexity influence quantifier distribution in (large) corpora?
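
The set-theoretic semantics in the table can be written down directly as relations between two finite sets. The following is a sketch only: the function names and the toy sets are assumptions introduced for illustration, and the inequalities follow the table as given.

```python
from fractions import Fraction

# Aristotelian quantifiers
def some(A, B):
    return len(A & B) > 0

def all_(A, B):
    return A <= B

def no(A, B):
    return len(A & B) == 0

# Counting quantifiers, parameterized by k
def more_than(k):
    return lambda A, B: len(A & B) > k

def fewer_than(k):
    return lambda A, B: len(A & B) < k

# Proportional quantifiers
def most(A, B):
    return len(A & B) > len(A - B)

def few(A, B):
    return len(A & B) < len(A - B)

def more_than_fraction(p, k):
    # #(A ∩ B) > p * (#(A) / k), computed exactly with rationals
    return lambda A, B: len(A & B) > Fraction(p, k) * len(A)

def less_than_fraction(p, k):
    return lambda A, B: len(A & B) < Fraction(p, k) * len(A)


# Hypothetical toy data
trains, safe = {"t1", "t2", "t3"}, {"t1", "t2"}
planets, lifeless = set(range(6)), set(range(5))

print(most(trains, safe))                           # 2 > 1
print(more_than_fraction(2, 3)(planets, lifeless))  # 5 > 4
```

The Aristotelian and counting cases only test emptiness, inclusion, or a count against a fixed k, whereas the proportional cases compare two cardinalities that both grow with the model.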


Complexity Class Hierarchy (Simplified)


The WaCkY Corpus [BBFZ09]

<s>
Flender       Flender       NP    1   3   VMOD
Werke         Werke         NP    2   3   SBJ
was           be            VBD   3   0   ROOT
a             a             DT    4   7   NMOD
German        German        JJ    5   7   NMOD
shipbuilding  shipbuilding  NN    6   7   NMOD
company       company       NN    7   3   PRD
,             ,             ,     8   7   P
located       locate        VVN   9   7   NMOD
in            in            IN    10  9   ADV
Lubeck        Lubeck        NP    11  10  PMOD
.             .             SENT  12  0   ROOT
</s>

              Sentences      Tokens          Source
WaCkY (Eng)   ∼ 43 million   ∼ 800 million   Wikipedia (EN, 2008)
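
A sentence in this format can be read with a few lines of code. The sketch below assumes one token per line with six whitespace-separated fields (form, lemma, POS tag, token index, head index, dependency label), as in the sample above; the `Token` type and `parse_sentence` helper are hypothetical names, not part of the corpus tooling.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str
    index: int
    head: int
    deprel: str


def parse_sentence(block):
    """Parse one <s>...</s> block into a list of Token records."""
    tokens = []
    for line in block.strip().splitlines():
        line = line.strip()
        if not line or line in ("<s>", "</s>"):
            continue  # skip sentence delimiters and blank lines
        form, lemma, pos, index, head, deprel = line.split()
        tokens.append(Token(form, lemma, pos, int(index), int(head), deprel))
    return tokens


sample = """<s>
Flender Flender NP 1 3 VMOD
Werke Werke NP 2 3 SBJ
was be VBD 3 0 ROOT
. . SENT 12 0 ROOT
</s>"""

tokens = parse_sentence(sample)
print([t.lemma for t in tokens if t.deprel == "ROOT"])
```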


Corpus Analysis

• We built a list of simple patterns to identify and count

1. Aristotelian quantifiers: all, some, no
2. counting quantifiers: less (more) than k
3. proportional quantifiers: most, few, less (more) than p/k

• Examples:

most = most/dt,
       most/jjs [a-z]{1,12}/nns,
       most/rbs [a-z]{1,12}/nns,
       more/rbr than/in half/nn,
       more/jjr than/in half/nn

some = some/det

• We sought to understand how much their frequency is influenced by semantic complexity (and other features)
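
As an illustration of how such patterns might be applied, the sketch below counts matches in text rendered as lowercase `word/tag` tokens separated by single spaces. That input format, the `count_matches` helper and the sample sentence are assumptions made for the example; only the pattern strings come from the slide.

```python
import re

MOST_PATTERNS = [
    r"most/dt",
    r"most/jjs [a-z]{1,12}/nns",
    r"most/rbs [a-z]{1,12}/nns",
    r"more/rbr than/in half/nn",
    r"more/jjr than/in half/nn",
]
SOME_PATTERNS = [r"some/det"]


def count_matches(patterns, tagged_text):
    """Count non-overlapping matches of any pattern, aligned to token boundaries."""
    # (?<!\S) and (?!\S) keep a match from starting or ending inside a token,
    # so "almost/dt" does not count as "most/dt".
    regex = re.compile(r"(?<!\S)(?:" + "|".join(patterns) + r")(?!\S)")
    return len(regex.findall(tagged_text.lower()))


tagged = (
    "most/jjs trains/nns are/vbp safe/jj ./sent "
    "more/jjr than/in half/nn of/in them/pp run/vvp ./sent "
    "some/det men/nns are/vbp happy/jj ./sent"
)

print(count_matches(MOST_PATTERNS, tagged))
print(count_matches(SOME_PATTERNS, tagged))
```

Counts obtained this way are only as good as the tagger and the patterns, so they approximate quantifier frequency rather than measure it exactly.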


Frequency Distribution by Feature [ST17]


GLMs

Definition (Generalized Linear Model (GLM)): A (multifactor) generalized linear model (GLM) has the form

$$f(y^{(j)}) = \theta_1 x_1^{(j)} + \cdots + \theta_k x_k^{(j)} + \theta_{k+1}$$

1. $f : \mathbb{R} \to \mathbb{R}$ is a link function (usually: $\ln(\cdot)$)

2. $Y \sim D$, with $D$ an arbitrary distribution

3. the $X_i$'s can be random effects (mixed model)

→ Negative binomial: assumes that $Y \sim NB(r, p)$, i.e., a negative binomial GLM will assume that frequency decreases geometrically
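
With the usual log link, the model above predicts a response by exponentiating the linear predictor. The sketch below shows only that step; it does not fit a model. The function name, the coefficient values and the two features are hypothetical and are not taken from the talk.

```python
import math


def glm_mean(theta, x):
    """Predicted response under a log link:
    ln(y) = theta_1*x_1 + ... + theta_k*x_k + theta_{k+1}."""
    if len(theta) != len(x) + 1:
        raise ValueError("theta needs one coefficient per feature plus an intercept")
    *weights, intercept = theta
    eta = sum(w * xi for w, xi in zip(weights, x)) + intercept  # linear predictor
    return math.exp(eta)  # inverse of the ln(.) link


# Hypothetical coefficients: negative weights mean longer or costlier
# quantifiers are predicted to be less frequent.
theta = [-0.5, -1.0, 3.0]   # [length weight, class weight, intercept]
short_cheap = [1, 0]        # 1 word, low-complexity class
long_costly = [2, 1]        # 2 words, high-complexity class

print(glm_mean(theta, short_cheap))
print(glm_mean(theta, long_costly))
```

Because the link is logarithmic, each unit increase in a feature multiplies the predicted frequency by a constant factor, which matches the geometric decrease mentioned above.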


Results [ST17]

Analysis of Deviance

Feature          Deviance   p-value
Length (words)   47.06%     3.47 · e−10
Class            27.29%     5.25 · e−7
Type             0.02%      0.97
Right mon.       25.65%     1.15 · e−6

• Semantic complexity (GQ class and monotonicity) explains > 50% of error deviance

• If we consider GQs of different lengths, length also has a significant impact

→ GQ distribution skewed towards cheap and short quantifiers


Discussion

• Preliminary study to determine if semantic complexity influences quantifier distribution

• Examined distribution on large encyclopedic corpora (Wikipedia)

• Results indicate that the cheaper a quantifier, the more frequent it is

• Used generalized regression analysis to quantify impact

• But: the class of quantifiers studied here was quite small


Fragment Distribution


Reasoning Complexity

• Suppose we came across the following argument in some text

Every Italian loves pasta and football
Silvio is Italian

that entails

Silvio loves pasta

• Common-sense reasoning

• Different constructs give rise to different semantic complexity

Question 2: Does complexity influence construct distribution in (large) corpora?


The Fragments of English [PHT06]

• The fragments of English are linguistically salient, ambiguity-free subsets of English [PHT06]

(a) polynomially translate to FO meaning representations ϕ

(b) basic machinery of formal semantics

• Defined by constraining syntax, semantics and vocabulary

→ Define semantic complexity as logical satisfiability!


The Fragments of English [PHT06]

Fragment          Coverage                                    FO Operators        Complexity
COP(¬)            Copula (“is a”), nouns (“man”),             {∀, ∃, (¬)}         P
                  intransitive verbs (“runs”), “every”,
                  “some”, names (“Joe”),
                  adjectives (“thin”) (+“not”)
COP(¬)+TV         COP(¬) + transitive verbs (“loves”)         {∀, ∃, (¬)}         P
COP(¬)+DTV        COP(¬) + ditransitive verbs (“gives”)       {∀, ∃, (¬)}         P
COP(¬)+TV+DTV     COP(¬)+TV + ditransitive verbs              {∀, ∃, (¬)}         P
COP¬+Rel          COP¬ + “who”, “that”, “which”, “and”,       {∀, ∃, ∧, ¬, ∨}     NP-hard
                  intersective adjectives (+“or”)
COP¬+Rel+TV       COP¬+Rel + transitive verbs                 {∀, ∃, ∧, ¬, ∨}     NP-hard
COP¬+Rel+DTV      COP¬+Rel + ditransitive verbs               {∀, ∃, ∧, ¬, ∨}     NP-hard
COP¬+Rel+TV+DTV   COP¬+Rel+TV + ditransitive verbs            {∀, ∃, ∧, ¬, ∨}     NP-hard


The Fragments of English (Examples)

Fragment      Example                                      FO
COP           Every politician cheats                      ∀x(Politician(x) → Cheat(x))
COP¬          Some philosopher is not trustworthy          ∃x(Philosopher(x) ∧ ¬Trusted(x))
COP¬+TV       John does not love Luke                      ¬Loves(John, Luke)
COP+TV+DTV    John gives a book to Jane                    ∃x(Book(x) ∧ Gives(John, x, Jane))
              Some man likes every candy                   ∃x(Man(x) ∧ ∀y(Candy(y) → Likes(x, y)))
COP¬+Rel      Some man who does not cheat is trustworthy   ∀x(Man(x) ∧ ¬Cheat(x) → Trusted(x))
...           ...                                          ...


Semantic Annotation - Boxer [Tho12]

Exploit deep semantic parsers, in particular Boxer 2.0 [Bos08]

• When parsing Wh-questions from the TREC 2008 corpus, such as

What is one common element of major religions?

Boxer outputs

∃y∃z∃e∃u(card(y, u) ∧ c1num(u) ∧ nnumeral1(u) ∧ acommon1(y) ∧ nelement1(y) ∧ amajor1(z) ∧ nreligions1(z) ∧ nevent1(e) ∧ rof1(y, z))

• ∧ and ∃ co-occur, but not ∨, ¬, or ∀


Classifying Sentences (Rules)

1. Annotate/parse the corpora

2. Consider sentences expressing operators from classes c ⊆ {∧,∃,∀,∨,¬}

3. Observe the frequency of

(a) Four P classes: {∃,∧}, {∃,∧,∀}, {∃,∧,∨} and {∃,∧,∀,∨}

(b) Four NP-hard classes: {∃,∧,¬}, {∃,∧,¬,∀}, {∃,∧,¬,∀,∨} and {¬,∀}

(each class approximates a fragment of English)

4. Study relationships between class frequency fr(c) and class rank or expressivity rk(c)
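
Steps 2 and 3 can be sketched as follows, assuming each sentence has already been parsed into a formula string such as the Boxer output shown earlier. Matching a sentence to a class by the exact set of operators it contains is an assumption of this sketch (the slides do not say whether matching is exact), and the helper names are hypothetical.

```python
from collections import Counter

OPERATORS = frozenset("∧∃∀∨¬")

# The eight operator classes listed in step 3.
P_CLASSES = [frozenset("∃∧"), frozenset("∃∧∀"),
             frozenset("∃∧∨"), frozenset("∃∧∀∨")]
NP_HARD_CLASSES = [frozenset("∃∧¬"), frozenset("∃∧¬∀"),
                   frozenset("∃∧¬∀∨"), frozenset("¬∀")]


def operator_class(formula):
    """Return the set of logical operators occurring in a formula string."""
    return frozenset(ch for ch in formula if ch in OPERATORS)


def complexity_label(formula):
    """Label a formula by the class its operator set belongs to (exact match)."""
    ops = operator_class(formula)
    if ops in P_CLASSES:
        return "P"
    if ops in NP_HARD_CLASSES:
        return "NP-hard"
    return "other"


formulas = [
    # Boxer output from the TREC 2008 example
    "∃y∃z∃e∃u(card(y, u) ∧ c1num(u) ∧ nnumeral1(u) ∧ acommon1(y) ∧ "
    "nelement1(y) ∧ amajor1(z) ∧ nreligions1(z) ∧ nevent1(e) ∧ rof1(y, z))",
    "∃x(Philosopher(x) ∧ ¬Trusted(x))",
]

# Class frequency fr(c), here over a two-sentence toy "corpus".
frequency = Counter(complexity_label(f) for f in formulas)
print(frequency)
```

Ranking the classes by these counts gives rk(c), which is the quantity related to fr(c) in step 4.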


Corpora

We considered:

• a subset (A: press articles) of the Brown corpus (http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml)

• a subset (Geoquery880) of the Geoquery corpus (http://www.cs.utexas.edu/users/ml/nldata/geoquery.html)

• a corpus of clinical questions (http://clinques.nlm.nih.gov)

• a sample from the TREC 2008 corpus (http://trec.nist.gov)

Corpus          Size          Domain        Type
Brown           19,741 sent.  Open (news)   Declarative
Geoquery        364 ques.     Geographical  Interrogative
Clinical ques.  12,189 ques.  Clinical      Interrogative
TREC 2008       436 ques.     Open          Interrogative


Motivation – Power Laws

Definition (Power law)
A random variable X follows a power law if the following relation holds:

\[ \mathrm{fr}(x^{(j)}) = \frac{\theta_0}{\mathrm{rk}(x^{(j)})^{\theta_1}} \]

• Zipf posited that power laws arise in natural language data due to the principle of least effort:

1. speakers maximize succinctness of utterances conveying a message
2. hearers have the dual goal

• N.B. Expressible as a linear model:

\[ \mathrm{fr}(x^{(j)}) = \frac{\theta_0}{\mathrm{rk}(x^{(j)})^{\theta_1}} \;\Leftrightarrow\; \ln(\mathrm{fr}(x^{(j)})) = \ln(\theta_0) - \theta_1 \ln(\mathrm{rk}(x^{(j)})) \]
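The linearization above suggests a simple way to estimate θ0 and θ1: regress ln(fr) on ln(rk) and read the parameters off the intercept and slope. The following is a minimal sketch under that assumption. The function name `fit_power_law` and the synthetic data are illustrative only and do not come from the slides.

```python
import math


def fit_power_law(ranks, freqs):
    """Estimate (theta0, theta1) in fr = theta0 / rk**theta1 by ordinary
    least squares on the log-log data. Ranks and frequencies must be
    positive, and at least two distinct ranks are needed."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the line, equals -theta1
    intercept = mean_y - slope * mean_x   # intercept, equals ln(theta0)
    return math.exp(intercept), -slope


# Synthetic data drawn from an exact power law (theta0 = 2, theta1 = 1.5).
ranks = [1, 2, 3, 4, 5]
freqs = [2.0 / r ** 1.5 for r in ranks]
theta0, theta1 = fit_power_law(ranks, freqs)
```

On noise-free data such as this, the fit recovers the generating parameters up to floating-point error. On real corpus counts the residuals of the log-log fit indicate how well a power law describes the data.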


Power Law Fitting [Tho12]

[Figure, first panel: "Distribution of FO fragments (Boxer)". Classes shown: some+and, some+and+or, some+and+all, some+and+not, some+and+or+all, not+all, some+and+not+all, some+and+not+or+all; axis ticks from 0.0 to 1.0; series: averages (inc.), averages (cum.), trec, clinical, geo, brown.]

[Figure, second panel: "log-log best fit (Boxer)". Best fit (cum.): y = 1.70 - 4.39x, r^2 = 0.96; best fit (incr.): y = 1.68 - 4.09x, r^2 = 0.92.]

        (power law)                   (R^2)
cum:    fr(c) = 5.47 / rk(c)^4.39     0.96
means:  fr(c) = 5.37 / rk(c)^4.09     0.92
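The power-law coefficients can be checked against the log-log fits reported in the plot. Assuming the fitted lines are in natural-log coordinates (as in the linear model on the power-law slide), the intercept a and slope b of y = a + bx map to θ0 = e^a and θ1 = -b. The short script below performs that conversion. The dictionary layout and the name `to_power_law` are illustrative.

```python
import math

# (intercept, slope) of the two log-log fits reported on the slide.
fits = {"cum": (1.70, -4.39), "incr": (1.68, -4.09)}


def to_power_law(intercept, slope):
    """Convert a log-log line y = intercept + slope * x into the
    (theta0, theta1) of fr = theta0 / rk**theta1 (natural log assumed)."""
    return math.exp(intercept), -slope


params = {name: to_power_law(a, b) for name, (a, b) in fits.items()}
for name, (theta0, theta1) in params.items():
    print(f"{name}: fr(c) = {theta0:.2f} / rk(c)^{theta1:.2f}")
# cum: fr(c) = 5.47 / rk(c)^4.39
# incr: fr(c) = 5.37 / rk(c)^4.09
```

The values agree with the coefficients 5.47 and 5.37 in the table, which is consistent with the natural-log reading.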


Discussion

• We have experimented with a methodology based on the Boxer semantic parser

• The distribution obtained appears to indicate that "non-Boolean-closed" (tractable) fragments occur more often than "Boolean-closed" (intractable) fragments

• Power laws with high R^2 (> 0.90) could be derived

• But: we worked under the assumption that Boxer returns reasonably accurate results, and

1. Boxer's precision and recall are not well known

2. Boxer blows up the number of existential quantifiers due to Davidsonian event semantics


Summary


Summarizing

• Semantic complexity provides a formal, logic-based model of a key cognitive dimension of natural language

• Cognitive experiments show that it influences language use

• Our work indicates that it also influences the distribution (frequency) of key language constructs, if

1. we approximate such distribution by analyzing corpora
2. we run regression analysis to quantify its impact

• The distributions observed are skewed towards low-complexity constructs

• These results are promising, but require methodological refinement (higher accuracy and coverage of semantic analysis)


Thanks!


References I

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009.

Johan Bos. Wide-coverage semantic analysis with Boxer. In Proceedings of the 2008 Conference on Semantics in Text Processing (STEP 2008), 2008.

Ian Pratt-Hartmann and Allan Third. More fragments of language. Notre Dame Journal of Formal Logic, 47(2):151–177, 2006.

Jakub Szymanik and Camilo Thorne. Exploring the relation between semantic complexity and quantifier distribution in large corpora. Language Sciences, 60:80–93, 2017.


References II

Jakub Szymanik. Quantifiers in Time and Space. Institute for Logic, Language and Computation, 2009.

Camilo Thorne. Studying the distribution of fragments of English using deep semantic annotation. In Proceedings of the ISA-9 Workshop, 2012.

Camilo Thorne and Jakub Szymanik. Semantic complexity of quantifiers and their distribution in corpora. In Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015), 2015.


Appendix: Distribution w.r.t. Length


Appendix: Linear Regression (Reminder)

A linear regression model has the form

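\[ y^{(j)} = \theta_0 x^{(j)} + \theta_1 \]

with parameters $\theta = (\theta_0, \theta_1)$ (a gradient and an intercept)

The least squares method infers from the training sample $S = \{(x^{(j)}, y^{(j)})\}_{j \in [1,n]}$ the model whose parameters $\theta^*$

\[ \theta^* = \arg\min_{\theta} J(\theta) = \arg\min_{\theta} \sum_{j=1}^{n} \left[ y^{(j)} - (\theta_0 x^{(j)} + \theta_1) \right]^2 \]

minimize the model's cost $J(\theta)$ (empirical error)

The $R^2$ coefficient provides a measure of confidence in $\theta^*$

\[ R^2 = \frac{\mathrm{Var}(X\theta)}{\mathrm{Var}(Y)} \]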
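For a single predictor, the minimizer of J(θ) has a well-known closed form, and R^2 can be computed as the ratio of the variance of the fitted values to the variance of the observations. The sketch below follows the slide's convention that θ0 is the gradient and θ1 the intercept. The function names and the sample data are illustrative assumptions.

```python
from statistics import pvariance


def least_squares(xs, ys):
    """Closed-form least squares fit of y = theta0 * x + theta1
    (theta0: gradient, theta1: intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta0 = sxy / sxx
    theta1 = mean_y - theta0 * mean_x
    return theta0, theta1


def r_squared(xs, ys, theta0, theta1):
    """R^2 as variance of the fitted values over variance of the observations."""
    fitted = [theta0 * x + theta1 for x in xs]
    return pvariance(fitted) / pvariance(ys)


# Hypothetical sample, used only to exercise the two functions.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
theta0, theta1 = least_squares(xs, ys)
r2 = r_squared(xs, ys, theta0, theta1)
```

For this sample the fitted gradient is 1.9 and the intercept is 0, with an R^2 of roughly 0.96.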


Appendix: Generalized Quantifiers

Given the domain of discourse $\Delta = \{d_i \mid i \in \mathbb{N}\}$, a generalized quantifier $Q$ of type $(k_1, \ldots, k_n)$ is an $n$-ary relation $Q \subseteq \mathcal{P}(\Delta^{k_1}) \times \cdots \times \mathcal{P}(\Delta^{k_n})$

[Parse tree: [S [NP [DT each] [NN wordle]] [VP is a cloud]], with DT "each" labeled ⟦Q⟧, NN "wordle" labeled ⟦restrictor⟧ and VP "is a cloud" labeled ⟦nuclear scope⟧]

(⟦wordle⟧, ⟦is a cloud⟧) ∈ ⟦each⟧ ⇔ ⟦wordle⟧ ⊆ ⟦is a cloud⟧

Generalized quantifiers describe conditions/constraints to be met by the denotations of sentence subjects (restrictor) and predicates (nuclear scope), viz., their "arguments"
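Over a finite domain, a type (1, 1) generalized quantifier can be sketched as a Boolean function of two sets, the restrictor and the nuclear scope. Only the condition for "each" is taken from the slide. The other definitions follow the standard textbook semantics, and the function names and the toy denotations are illustrative assumptions.

```python
def each(restrictor, scope):
    """'each A is B' holds iff A is a subset of B (as on the slide)."""
    return restrictor <= scope


def some(restrictor, scope):
    """'some A is B' holds iff A and B overlap."""
    return bool(restrictor & scope)


def no(restrictor, scope):
    """'no A is B' holds iff A and B are disjoint."""
    return not (restrictor & scope)


def most(restrictor, scope):
    """'most A are B' holds iff more A's are B than are not B."""
    return len(restrictor & scope) > len(restrictor - scope)


def at_least(k):
    """Counting quantifier: 'at least k A are B'."""
    def quantifier(restrictor, scope):
        return len(restrictor & scope) >= k
    return quantifier


# Toy denotations over a small finite domain.
wordles = {"w1", "w2"}
clouds = {"w1", "w2", "c1"}
each_wordle_is_a_cloud = each(wordles, clouds)  # True: wordles is a subset of clouds
```

Representing quantifiers as set relations keeps the restrictor and scope roles explicit, which is the distinction the monotonicity features rely on.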


Appendix: GQ Features/Predictors

1. Class: Factor encoding GQ class, with three values: "ari" (Aristotelian), "cnt" (counting) and "pro" (proportional)

2. Data complexity (DC): Factor encoding GQ data complexity, with two values: "L" (LSpace) and "P" (P)

3. Right monotonicity (MonR): Factor encoding the monotonicity of subject NP, with three values: "up" (upward monotonic), "down" (downward monotonic) and "neither" (non-monotonic)

4. Left monotonicity (MonL): Factor encoding the monotonicity properties of object NPs

5. Length in words (Lwords): Factor that clusters GQ patterns according to the minimum number of word tokens

6. Type (Typ): Factor encoding if GQ is superlative or comparative ("comp", "supe")


Appendix: GQ Raw Numbers

Q      Freq     Lwords  Class  MonL     MonR  DC      FRank
≥k     51117    3       cnt    ↑        ↑     LSpace  6
≤k     5953     3       cnt    ↑        ↑     LSpace  7
≥p/k   1608     4       pro    ↑        ↓     P       8
≤p/k   16       4       pro    ↑        ↓     P       13
few    209356   1       pro    neither  ↓     P       5
most   688502   1       pro    neither  ↑     P       3
no     464755   1       ari    ↓        ↓     LSpace  4
all    1325639  1       ari    ↓        ↓     LSpace  1
some   742134   1       ari    ↓        ↑     LSpace  2

• Quantifiers expressed by 32 distinct patterns

• In the analysis we split ≥k and ≥p/k into superlative (= "at least") and comparative quantifiers (= "more than")


Appendix: FOEs & Semantic Complexity [PHT06]

• Different fragments of English give rise to different semantic complexity

• Defined as the computational complexity of checking whether their FO meaning representations are satisfiable ("Entscheidungsproblem")

• It turns out that in general

1. fragments that contain either negation or relatives, but not both, have tractable (polynomial) complexity

2. fragments that cover both negation and relatives have intractable (exponential) complexity ⇒ encode Boolean satisfiability

• Semantic complexity: measured in terms of the signature of the expressed FO formulas (≈ names, adjectives, nouns and verbs present in the sentence)
