Linguistics 187/287 Week 2: Engineering and Linguistic Generalizations

Upload: odin

Post on 11-Feb-2016


Page 1: Linguistics 187/287  Week 2

Linguistics 187/287 Week 2

Engineering and Linguistic Generalizations

Page 2: Linguistics 187/287  Week 2

Homework:
– Due Friday
  » Can discuss in class or via email, or ask us for office hours
– Last assignment:
  » How much time?
  » Trouble: access, procedure?
  » Issues: XLE, LFG, grammar?

Page 3: Linguistics 187/287  Week 2

Topics for this week

Notation in LFG (more background)
Templates
Lexical rules
Configurations
Feature declaration
METARULEMACRO

Page 4: Linguistics 187/287  Week 2

Grammar engineering for deep processing

Draws on theoretical linguistics, software engineering
– Theoretical linguistics => papers
  » Generalizations, universality, idealization (competence)
– Software engineering => programs
  » Coverage, interface, QA, maintainability, efficiency, practicality
Grammar engineering
– Grammar::Theory = Program::Programming language
– Reflect linguistic generalizations
– Respect special cases of ordinary language
– Deal with large-scale interactions
– Theory/practice trade-offs

Page 5: Linguistics 187/287  Week 2

Grammar Engineering and Linguistic Theory

Description vs. representation
– Program vs. data
Expressiveness of notation
– Regular predicates for c-structure
– Boolean combinations (esp. disjunction)
– Equality, set-membership
Defaults and marking conventions
– Constraining vs. defining, existentials, defaults
Abbreviation and factoring
– Templates, macros, lexical rules
Configuration management
– Combining rules, templates, lexicons…
– Priority of core/specializations/extensions

Page 6: Linguistics 187/287  Week 2

Description vs. Representation

Complexity trades (program vs. data)
– Simplify descriptions but complicate representations
– Complicate descriptions but simplify representations
Example: Arguments and adjuncts
– Different behavior
  » Arguments selected by predicate, unique
  » Adjuncts modify predicate, multiple instances
– Similar behavior: Can both be questioned
– Representation solution (HPSG)
  ARG ADJ DEP = ARG ADJ (new type)
– Description solution (LFG)
  ARG ADJ ARG | ADJ

Page 7: Linguistics 187/287  Week 2

Description vs. Representation

External constraints on representation
– Linguistic theory
– Applications
– Multilingual/cross-grammar similarity

Page 8: Linguistics 187/287  Week 2

Expressiveness of notation: Regular predicates for c-structure

Simple context-free rules      Compact notation
NP --> N                       NP --> (Det) N               (optionality)
NP --> Det N

NP --> N                       NP --> { N | Pron }          (disjunction)
NP --> Pron

NP --> N                       NP --> { (Det) N | Pron }
NP --> Det N
NP --> Pron
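The compact notation is shorthand for a finite set of plain context-free rules. The following is a minimal sketch of that expansion. It assumes a toy encoding invented here (tuples for sequences, `Opt` for parentheses, `Alt` for braces). It does not cover the Kleene star and is not how XLE represents rules internally.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Opt:
    """Optional element, written (X) in the compact notation."""
    item: object

@dataclass(frozen=True)
class Alt:
    """Disjunction, written { A | B } in the compact notation."""
    choices: tuple

def expand(node):
    """Return the set of plain right-hand sides (tuples of categories)."""
    if isinstance(node, str):          # a single category such as "N"
        return {(node,)}
    if isinstance(node, Opt):          # either absent or present
        return {()} | expand(node.item)
    if isinstance(node, Alt):          # union over the alternatives
        return set().union(*(expand(c) for c in node.choices))
    # a tuple is a sequence: concatenate one expansion of each part
    parts = [expand(p) for p in node]
    return {sum(combo, ()) for combo in product(*parts)}

# NP --> { (Det) N | Pron }
np_rhs = Alt(((Opt("Det"), "N"), "Pron"))
for rhs in sorted(expand(np_rhs)):
    print("NP -->", " ".join(rhs))
```

The three printed rules are the same three simple rules listed in the left-hand column for the combined NP rule.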

Page 9: Linguistics 187/287  Week 2

Expressiveness of notation and Representation

Equality: attribute values
Set-membership: sets and elements
– Adjuncts:
  PP: (^ ADJUNCT)=!
  PP*: ! $ (^ ADJUNCT)
– Coordination (more next week)
  NP --> NP: ! $ ^; CONJ NP: ! $ ^.
Semantic forms
– (^ PRED)='kick<(^ SUBJ)(^ OBJ)>'
– Semantic relations, instantiation, subcategorization

Page 10: Linguistics 187/287  Week 2

Defaults and Marking Conventions

Constraining vs. defining
– Must be assigned nom: (^ SUBJ CASE)=c nom
– Is nom: (^ SUBJ CASE)=nom
Existentials
– Must have case: (^ CASE)
Defaults
– NTYPE: proper, pronoun, common
– { (^ NTYPE) (^ NTYPE)~=common | (^ NTYPE)=common } (make choices disjoint)
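The three kinds of statement can be contrasted with a small sketch. It assumes a toy f-structure encoded as nested Python dicts, and the function names are invented for this illustration. In a real solver the constraining and existential checks run only after all defining equations have been processed. Here the caller controls that order.

```python
def get_path(fstr, path):
    """Follow a path such as ("SUBJ", "CASE"); return None if it is absent."""
    for attr in path:
        if not isinstance(fstr, dict) or attr not in fstr:
            return None
        fstr = fstr[attr]
    return fstr

def define(fstr, path, value):
    """Defining equation (^ ...)=value: add the value; report a clash as False."""
    for attr in path[:-1]:
        fstr = fstr.setdefault(attr, {})
    old = fstr.setdefault(path[-1], value)
    return old == value

def constrain(fstr, path, value):
    """Constraining equation (^ ...)=c value: only check, never add."""
    return get_path(fstr, path) == value

def exists(fstr, path):
    """Existential constraint (^ ...): some value must be present."""
    return get_path(fstr, path) is not None

f = {}
print(constrain(f, ("SUBJ", "CASE"), "nom"))   # nothing has defined CASE yet
print(define(f, ("SUBJ", "CASE"), "nom"))      # the defining equation adds it
print(constrain(f, ("SUBJ", "CASE"), "nom"))   # now the check succeeds
print(exists(f, ("CASE",)))                    # top-level CASE is still absent
```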

Page 11: Linguistics 187/287  Week 2

Abbreviations and Factoring

Templates
– Capture generalizations of annotations
– Maintainability: changes, mistakes
– Compare: HPSG type hierarchy
Macros
– Capture generalizations of rules
Lexical Rules
– Theoretical proposal to manipulate predicates
– Implemented to expand lexicons consistently

Page 12: Linguistics 187/287  Week 2

Example: The verb bakes

Belongs to several classes
– Third-person, singular, present-tense verb
– Transitive or intransitive
Shares
– Some properties with falls
– Other properties with cooked

Page 13: Linguistics 187/287  Week 2

The lexicon à la Kiparsky

A dumping ground for exceptions

“A kind of appendix to the grammar, whose function is to list what is unpredictable and irregular about the words of a language”

Page 14: Linguistics 187/287  Week 2

The lexicon à la Bresnan

A repository of linguistic generalizations

Active and passive forms are related by lexical rules, not syntactic transformations

(^ SUBJ) --> (^ OBL-AG)
(^ OBJ) --> (^ SUBJ)

Rules relating lexical items are a prime locus of syntactic generalizations

Page 15: Linguistics 187/287  Week 2

The lexicon à la Flickinger

A hierarchical structure of classes
Each class represents some piece of syntactic information
bakes belongs to:
– the third-person singular present-tense class (like appears)
– the transitive/intransitive class (like cooked)
– and others
Classes may be subclasses of other classes
Classes may partition other classes along several dimensions

Page 16: Linguistics 187/287  Week 2

LFG: Relations between descriptions

An LFG functional description is a collection of equations
These can be named
This name can stand for those equations in linguistic descriptions
Named descriptions are referred to as templates
Interpretation: Simple substitution
– Template-description is substituted for template-name that appears in (is invoked by) another description
LFG can encode linguistic generalizations as relations between descriptions of structures

Page 17: Linguistics 187/287  Week 2

3SG and PRESENT templates

3SG = (^ SUBJ PERSON) = 3
      (^ SUBJ NUM) = SG.

“3SG names (^ SUBJ PERSON)=3 (^ SUBJ NUM)=SG”

PRESENT = (^ TENSE) = PRES.

@ marks invocation (in lexicon, rules, templates)
Substitute (^ TENSE)=PRES for @PRESENT in other descriptions

Page 18: Linguistics 187/287  Week 2

Templates enable hierarchical generalizations

Template definitions can refer to other templates by name
– E.g. further divide 3SG into:
  3PERS = (^ SUBJ PERSON) = 3.
  SING = (^ SUBJ NUM) = SG.
  then
  3SG = @3PERS @SING.
Hierarchy of references represents inclusion hierarchy of named descriptions
Frequently repeated subdescriptions
– specified in one place
– effective in many
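Interpretation by simple substitution can be sketched in a few lines. The dictionary encoding and the `expand` function are assumptions made for this illustration, not XLE's implementation. The sketch does not guard against cyclic definitions.

```python
import re

# Template definitions from the slides, stored as plain text.
TEMPLATES = {
    "3PERS": "(^ SUBJ PERSON)=3",
    "SING": "(^ SUBJ NUM)=SG",
    "3SG": "@3PERS @SING",
    "PRESENT": "(^ TENSE)=PRES",
}

def expand(description):
    """Replace every @NAME by its definition until no invocation is left."""
    def substitute(match):
        # Expand the definition itself, so nested invocations are resolved.
        return expand(TEMPLATES[match.group(1)])
    return re.sub(r"@([A-Za-z0-9-]+)", substitute, description)

print(expand("@3SG @PRESENT"))
```

Changing the definition of `SING` in one place changes every description that reaches it through `3SG`, which is the maintainability point made above.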

Page 19: Linguistics 187/287  Week 2

Hierarchy of template invocations: Sharing in verb agreement

[Diagram: PRES3SG and PRESNOT3SG both invoke PRESENT and 3SG; 3SG invokes SING and 3PERS]

PRESNOT3SG = ~@3SG @PRESENT.
  ⇒ ~[@SING @3PERS]
  ⇒ ~[(^ SUBJ NUM)=SG (^ SUBJ PERSON)=3]

• Boolean combinations of template references (just like ordinary descriptions)
• Sharing is distinct from mode of combination

Page 20: Linguistics 187/287  Week 2

Functional description for bakes

{ (^ PRED)='bake<SUBJ,OBJ>' | (^ PRED)='bake<SUBJ>' }

(

Page 21: Linguistics 187/287  Week 2

Templates with parameters: Valency

TRANS-OR-INTRANS(_p) = { (^ PRED) = '_p<SUBJ, OBJ>'
                       | (^ PRED) = '_p<SUBJ>' }.

PRED value as a parameter of the template
@TRANS-OR-INTRANS(bake)
  ⇒ { (^ PRED) = 'bake<SUBJ, OBJ>' | (^ PRED) = 'bake<SUBJ>' }
Arguments can substitute for any part of an f-description
– Attributes
– Values
– Semantic relation-names
– Descriptions
ParGram convention: Parameters begin with _
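Parameter passing is again plain textual substitution. The sketch below assumes a template stored as a pair of parameter names and body text. `invoke` is a hypothetical helper, and its naive string replacement would misfire if one parameter name were a substring of another.

```python
# One parameterized template: (parameter names, body text).
TRANS_OR_INTRANS = (
    ("_p",),
    "{ (^ PRED)='_p<SUBJ, OBJ>' | (^ PRED)='_p<SUBJ>' }",
)

def invoke(template, *args):
    """Substitute each argument for its parameter in the template body."""
    params, body = template
    if len(params) != len(args):
        raise ValueError("wrong number of arguments")
    for param, arg in zip(params, args):
        body = body.replace(param, arg)
    return body

print(invoke(TRANS_OR_INTRANS, "bake"))
```

The leading underscore keeps parameter names from colliding with ordinary attribute or value names in the body, which is one practical reason for the naming convention.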

Page 22: Linguistics 187/287  Week 2

Valency hierarchy

TRANS-OR-INTRANS(p) = { @INTRANSITIVE(p) | @TRANSITIVE(p) }.

INTRANSITIVE(p) = (^ PRED)='p<SUBJ>'.

[Diagram: TRANS-OR-INTRANS invokes INTRANSITIVE and TRANSITIVE]

Page 23: Linguistics 187/287  Week 2

Templates and generalizations: bakes

bakes: @TRANS-OR-INTRANS(bake) @PRES3SG

TRANS-OR-INTRANS(p): shared by eat, cooked, …
PRES3SG: shared by appears, goes, cooks, …
PRESENT:
– used by PRES3SG template
– shared by bake, laugh, etc.

Page 24: Linguistics 187/287  Week 2

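Lexical sharing

[Diagram: TRANS-OR-INTRANS invokes INTRANSITIVE and TRANSITIVE; PRES3SG invokes PRESENT and 3SG; 3SG invokes 3PERS and SING. The lexical entries bakes, cooked, and falls attach to the templates they share.]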

Page 25: Linguistics 187/287  Week 2

Type hierarchy vs. templates

Templates can play the same role as hierarchical type systems in theories like HPSG
A notational device for factoring descriptions
– Interpreted as simple substitution
– Not part of a formal ontology
– Do not require an elaborate mathematical characterization

Page 26: Linguistics 187/287  Week 2

Templates also invoked by Rules

Rule annotations can also call templates
– Global changes, typo prevention
Example: adjunct annotation
  PP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP
  ADVP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP

  ADJ(_T) = ! $ (^ ADJUNCT) (! ADJ-TYPE)=_T.

  PP: @(ADJ VP)       PP: @(ADJ NP)
  ADVP: @(ADJ VP)     ADVP: @(ADJ S)

Page 27: Linguistics 187/287  Week 2

Templates: Rules

Example: null pronouns
  Push it!
  They left (in order) to be on time.

NULL-PRON(_P) = (_P PRED)='pro'
                (_P PRON-TYPE)=null.

VPimp --> VP: @(NULL-PRON (^ SUBJ)).

VPimp --> VP: (^ SUBJ PRED)='pro' (^ SUBJ PRON-TYPE)=null.

Page 28: Linguistics 187/287  Week 2

Templates: Extend notation

DEFAULT(D V) = { D D~=V | D=V }.
  e.g. @(DEFAULT (^ NTYPE) common)

IF(P1 P2) = { ~P1 | P2 }.

IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }.
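These definitions behave like the familiar logical connectives. The sketch below checks that under two simplifying assumptions made for illustration: each description is reduced to a truth value (satisfied or not), and DEFAULT is read as a deterministic choice over a flat dict after all other equations have contributed.

```python
from itertools import product

def IF(p1, p2):
    """IF(P1 P2) = { ~P1 | P2 }: material implication."""
    return (not p1) or p2

def IFF(p1, p2):
    """IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }: both hold or neither holds."""
    return (p1 and p2) or ((not p1) and (not p2))

def DEFAULT(fstr, attr, value):
    """DEFAULT(D V) = { D D~=V | D=V }: keep another value, else use V."""
    if attr in fstr and fstr[attr] != value:
        return fstr                      # first disjunct: D exists, D ~= V
    return {**fstr, attr: value}         # second disjunct: D = V

# Print the truth table for IF and IFF.
for p1, p2 in product([True, False], repeat=2):
    print(p1, p2, IF(p1, p2), IFF(p1, p2))

print(DEFAULT({}, "NTYPE", "common"))
print(DEFAULT({"NTYPE": "proper"}, "NTYPE", "common"))
```

The two DEFAULT disjuncts cannot both succeed for the same f-structure, which is what "make choices disjoint" asks for.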

Page 29: Linguistics 187/287  Week 2

Templates and “Principles”

Subject principle: every verb has a subject.
Implementation:
  VERB = (^ SUBJ).
– Put @VERB in every verbal entry.
or
– Put @VERB in the templates called by the verbal entries.

Page 30: Linguistics 187/287  Week 2

Lexical Rules

Theoretical construct
Templates can often achieve the same result
– Disjunction of several templates
– Parameterization of a complex template

Page 31: Linguistics 187/287  Week 2

Lexical Rules: Example

Active: They ate the cake.
  (^ PRED)='eat<(^SUBJ)(^OBJ)>'
Passive: The cake was eaten.
  (^ PRED)='eat<NULL (^SUBJ)>'

Could have VTRANS have two disjuncts
Or: manipulate PRED with lexical rule

Page 32: Linguistics 187/287  Week 2

Lexical Rules: Example

Passive lexical rule
_SCHEMA is a subcategorization frame

PASSIVE(_SCHEMA) = { _SCHEMA
                     (^ PASSIVE)=-
                   | _SCHEMA
                     (^ SUBJ) --> NULL
                     (^ OBJ) --> (^ SUBJ)
                     (^ PASSIVE)=c + }.

Example calls
– TRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)>').
– DITRANS(_P) = @(PASSIVE (^ PRED)='_P<(^SUBJ)(^OBJ)(^OBJ2)>').
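The effect of the two rewrites on a semantic form can be sketched as a renaming of argument slots. The `passive` function below is an illustration written for these notes, not XLE's mechanism, and the ditransitive verb `give` is a hypothetical example. The renamings must be applied simultaneously. Otherwise the SUBJ produced from OBJ would itself be turned into NULL.

```python
import re

def passive(pred):
    """Apply (^ SUBJ) --> NULL and (^ OBJ) --> (^ SUBJ) to a semantic form."""
    name, args = re.fullmatch(r"(\w+)<(.*)>", pred).groups()
    roles = re.findall(r"\(\^\w+\)", args)
    renaming = {"(^SUBJ)": "NULL", "(^OBJ)": "(^SUBJ)"}
    # Each original role is looked up once, so the renamings do not chain.
    new_roles = [renaming.get(role, role) for role in roles]
    return f"{name}<{' '.join(new_roles)}>"

print(passive("eat<(^SUBJ)(^OBJ)>"))
print(passive("give<(^SUBJ)(^OBJ)(^OBJ2)>"))
```

Roles that the rule does not mention, such as OBJ2, pass through unchanged.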

Page 33: Linguistics 187/287  Week 2

Lexical Rules: Summary

Lexical rules manipulate arguments of predicates
– capture systematic alternations like active-passive
Rename and remove roles
No good implementation for adding roles
– causative
– complex predicates
– benefactives

Page 34: Linguistics 187/287  Week 2

Configuration Management

Combining rules, templates, lexicons, …
– System needs to know where everything is
– For large grammars, need modularization (multiple grammar rule files, multiple lexicons)
Priority of core/specializations/extensions
– Want to specialize a grammar
  » No questions in instruction manuals
  » Loosen subj-V agreement
– Have lexicons of varying quality

Page 35: Linguistics 187/287  Week 2

Combining Rules, Templates, Lexicons

XLE: configuration section
– Specify what files are called
– Specify which rule, template, and lexicon sections are used
  RULES (TOY ENGLISH).
  RULES (CORE ENGLISH) (SPECIAL ENGLISH).
– Other grammar information

Page 36: Linguistics 187/287  Week 2

Configurations and Declarations

Configurations
– File management
– Priority
Declarations
– Governable relations and semantics
– Features
Global Operators
– METARULEMACRO

Page 37: Linguistics 187/287  Week 2

Files

Priority ordered; rules/entries in later files override those in earlier ones

Example:
FILES standard-english-rules.lfg
      eureka-english-rules.lfg
      standard-english-lexicon.lfg
      eureka-english-lexicon.lfg.

Page 38: Linguistics 187/287  Week 2

Eureka vs. Standard rules

STANDARD ENGLISH RULES (1.0)
  N --> { @NOUN-COMMON | @NOUN-PROPER }.
  NOUN-COMMON --> …
  NOUN-PROPER --> …

EUREKA ENGLISH RULES (1.0)
  N --> { @NOUN-COMMON | @NOUN-PROPER | @NOUN-EUREKA | N PL }.
  NOUN-EUREKA --> { EUR-PART | EUR-NUM }.
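The override behaviour amounts to a priority-ordered merge. The sketch below assumes each section is a mapping from rule name to right-hand side. `combine` is a hypothetical helper that illustrates the idea, not XLE's loader.

```python
# Each section maps a rule name to its right-hand side, as plain text.
standard_rules = {
    "N": "{ @NOUN-COMMON | @NOUN-PROPER }",
    "NOUN-COMMON": "…",
    "NOUN-PROPER": "…",
}
eureka_rules = {
    "N": "{ @NOUN-COMMON | @NOUN-PROPER | @NOUN-EUREKA | N PL }",
    "NOUN-EUREKA": "{ EUR-PART | EUR-NUM }",
}

def combine(*sections):
    """Merge sections in priority order: later definitions replace earlier ones."""
    effective = {}
    for section in sections:
        effective.update(section)
    return effective

grammar = combine(standard_rules, eureka_rules)
print(grammar["N"])
```

The Eureka definition of N replaces the standard one as a whole, while NOUN-COMMON and NOUN-PROPER are still taken from the standard section.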

Page 39: Linguistics 187/287  Week 2

Sections Used

All lexicon, rule, and template sections have names and versions*.
These are called in priority order in the config.
Use with the file order to create overrides.

RULES (STANDARD RULES) (EUREKA RULES).
LEXENTRIES (all all).

*Versions allow for future XLE upgrades

Page 40: Linguistics 187/287  Week 2

Multiple Lexicon Sections

LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH).

AUTOMATIC ENGLISH LEXICON (1.0)
  appear V XLE { @(V-TRANS appear) | @(V-INTRANS appear) }.

CORRECTED ENGLISH LEXICON (1.0)
  appear V XLE { @(V-INTRANS appear) | @(V-SUBJ-XCOMP appear) }.

Page 41: Linguistics 187/287  Week 2

Other Configuration Information

ROOTCAT: default top level category
– Standard: ROOT, Eureka: FIELD
Nondistributives for coordination
External attributes for applications
Character encoding
Reparse category and Optimality order for robustness
See XLE documentation for complete list

Page 42: Linguistics 187/287  Week 2

Declarations

Must declare grammatical and semantic functions for each grammar.
– Used for completeness and coherence
GOVERNABLERELATIONS
– Functions (features) that must be subcategorized for in the PRED
– SUBJ OBJ OBL-?* ?COMP etc.
SEMANTICFUNCTIONS
– Functions that must have a PRED
– ADJUNCT NMOD
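How the governable-relations list feeds completeness and coherence can be sketched as two set comparisons. The encoding is an assumption made for illustration: a one-level f-structure as a dict, and a small illustrative subset of governable functions. XLE's own checks also handle embedded structures and wildcard patterns such as OBL-?*.

```python
import re

GOVERNABLE = {"SUBJ", "OBJ", "COMP", "XCOMP"}   # illustrative subset only

def check(fstr):
    """Return (complete, coherent) for one level of an f-structure."""
    # Functions named inside the angle brackets of the PRED value.
    selected = set(re.findall(r"\(\^ ?([\w-]+)\)", fstr.get("PRED", "")))
    present = {attr for attr in fstr if attr in GOVERNABLE}
    complete = selected <= present    # every selected function is present
    coherent = present <= selected    # no unselected governable function
    return complete, coherent

print(check({"PRED": "eat<(^SUBJ)(^OBJ)>", "SUBJ": {}, "OBJ": {}}))
print(check({"PRED": "eat<(^SUBJ)(^OBJ)>", "SUBJ": {}}))
print(check({"PRED": "fall<(^SUBJ)>", "SUBJ": {}, "OBJ": {}, "ADJUNCT": []}))
```

ADJUNCT is not governable, so its presence never affects coherence. That is why the two kinds of function are declared separately.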

Page 43: Linguistics 187/287  Week 2

Feature Declaration

List of all the features
– GGF and semantic functions need not be listed
– all other features must be listed
List of their possible values
– atomic
– f-structure
Multiple feature declarations
– multilingual setting
– grammar specialization

Page 44: Linguistics 187/287  Week 2

Why a feature declaration?

Good engineering practice
Catch typos and old analyses
Grammar easier to read

NB: Theory doesn’t have typos

Page 45: Linguistics 187/287  Week 2

Declaration format

STANDARD LANGUAGE FEATURES (1.0)
feature1: -> $ { val1 val2 val3 }.
feature2: -> $ { val4 val5 }.
feature3: -> << [ feature1 feature2 ].
feature4.
----

Page 46: Linguistics 187/287  Week 2

Sample feature declaration

TOY ENGLISH FEATURES (1.0)

NUM: -> $ { sg pl }.
PERS: -> $ { 1 2 3 }.
TNS-ASP: -> << [ TENSE MOOD ASPECT ].
TENSE.
MOOD: -> $ { indicative subjunctive }.
ASPECT: -> << [ PERF PROG ].
PERF: -> $ { + - }.
PROG: -> $ { + - }.

Page 47: Linguistics 187/287  Week 2

XLE and the feature declaration

XLE will not load a grammar with a violation of the feature declaration.
To catch violations in the lexicon, the generator must be loaded.
– regenerate "some-sentence-to-parse"
– parse, then choose “generate” in f-str window
– create-generator grammar-name.lfg
print-unused-feature-declarations

Page 48: Linguistics 187/287  Week 2

Multiple feature declarations

List in priority order in the configuration
– FEATURES (STANDARD COMMON) (STANDARD ENGLISH).
– New features are listed as usual
– Changes to features use edit operators
  + add a new value
  & intersect the values
  ! replace the feature entirely

Page 49: Linguistics 187/287  Week 2

Multiple feature declarations

STANDARD COMMON FEATURES (1.0)
  NUM: -> $ { sg pl dual }.
  CASE: -> $ { nom acc }.
  TENSE: -> << [ PAST FUTURE ].
  PAST: -> $ { + - }.
  FUTURE: -> $ { + - }.

STANDARD ENGLISH FEATURES (1.0)       (effective result)
  PERS: -> $ { 1 2 3 }.               PERS: -> $ { 1 2 3 }.
  &NUM: -> $ { sg pl }.               NUM: -> $ { sg pl }.
  +CASE: -> $ { gen }.                CASE: -> $ { nom acc gen }.
  !TENSE: -> $ { pres past fut }.     TENSE: -> $ { pres past fut }.
  !PAST: -> $ { }.
  !FUTURE: -> $ { }.
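The three edit operators correspond to set union, set intersection, and replacement. The sketch below assumes value sets encoded as Python sets. It ignores the difference between atomic values and sub-attributes, and `apply_edits` is a hypothetical helper written for this illustration.

```python
# STANDARD COMMON, with TENSE's sub-attributes treated like values for brevity.
common = {
    "NUM": {"sg", "pl", "dual"},
    "CASE": {"nom", "acc"},
    "TENSE": {"PAST", "FUTURE"},
}

# STANDARD ENGLISH as (operator, feature, values); "" is a plain declaration.
english = [
    ("", "PERS", {"1", "2", "3"}),
    ("&", "NUM", {"sg", "pl"}),
    ("+", "CASE", {"gen"}),
    ("!", "TENSE", {"pres", "past", "fut"}),
]

def apply_edits(base, edits):
    """Apply edit operators in order to a copy of the base declaration."""
    result = {feature: set(values) for feature, values in base.items()}
    for op, feature, values in edits:
        if op == "+":      # add new values
            result[feature] = result.get(feature, set()) | values
        elif op == "&":    # keep only the shared values
            result[feature] = result.get(feature, set()) & values
        else:              # "" declares a new feature, "!" replaces it
            result[feature] = set(values)
    return result

effective = apply_edits(common, english)
for feature in sorted(effective):
    print(feature, sorted(effective[feature]))
```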

Page 50: Linguistics 187/287  Week 2

Using Multiple Feature Decl.

Multilingual contexts
– Language universal features
– Customize to particular language
Grammar specialization
– Add new features for odd constructions
– Remove unused choices

Page 51: Linguistics 187/287  Week 2

Global Operations: METARULEMACRO

System defined function
– Operates on every category
Global statements
– Linguistic: subject condition, SUBJ < OBJ, coordination
– Engineering: quotes, bracketing

Page 52: Linguistics 187/287  Week 2

METARULEMACRO

Right-hand side of each grammar rule is the result of applying the macro to the rule

METARULEMACRO(_CAT _BASECAT _RHS) = _RHS.

Page 53: Linguistics 187/287  Week 2

Punctuation and METARULEMACRO

Surround any constituent with quotes
METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS
                                    | L-QT _CAT R-QT
                                    | L-DQT _CAT R-DQT }.
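The global effect of the macro can be sketched as a function applied once to every rule. The rule table and the PP rule are hypothetical, rules are plain strings, and `_BASECAT` is assumed here to equal `_CAT` for simple categories. This illustrates the idea and is not XLE's rule compiler.

```python
def metarulemacro(cat, basecat, rhs):
    """Quote macro: the original RHS, or the category in single or double quotes."""
    # basecat is accepted for parity with the three-argument signature; unused here.
    return f"{{ {rhs} | L-QT {cat} R-QT | L-DQT {cat} R-DQT }}"

# Rules as the grammar writer states them (PP is a hypothetical extra rule).
rules = {"NP": "(DET) AP* N PP*", "PP": "P NP"}

# The macro is applied to every rule; no rule has to call it itself.
effective = {cat: metarulemacro(cat, cat, rhs) for cat, rhs in rules.items()}
print(effective["NP"])
```

Because the wrapping happens in one place, a new category added to the rule table picks up the quote alternatives without any change to its own rule.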

Page 54: Linguistics 187/287  Week 2

Punctuation cont.

`Mary and John’ left them there.
We saw them “in the garden”.
They `appeared and then disappeared.'

[Tree: NP dominates L-QT NP R-QT; the inner NP dominates NP CONJ NP, spanning "Mary and John"]

Page 55: Linguistics 187/287  Week 2

Punctuation: Problem

Vacuous branching results in many analyses

[Trees: several analyses of the quoted word "bagels", in which L-QT and R-QT attach at different levels of the non-branching chain of NP, NPzero, Nzero, and N nodes, etc.]

Page 56: Linguistics 187/287  Week 2

Solution: PUSHUP

If non-branching, push up to highest node.

METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS
                                    | L-QT _CAT: @PUSHUP; R-QT }.

How to define PUSHUP?
– Need to test existence of sister nodes: * MOTHER SISTER

PUSHUP = { (* MOTHER LEFT_SISTER)
         | (* MOTHER RIGHT_SISTER) ~(* MOTHER LEFT_SISTER)
         | ~(* MOTHER MOTHER) }.

Page 57: Linguistics 187/287  Week 2

Summary

Lexical rules allow for generalizations over predicate alternations
Configurations and declarations allow management of large-scale grammars
– readability and consistency
– maintenance
– specialization
Global operators allow for cross-grammar generalizations
– coordination

Page 58: Linguistics 187/287  Week 2
Page 59: Linguistics 187/287  Week 2

The HPSG lexicon: a type hierarchy

More specific types inherit information from less specific
Types and subtypes:
– A mathematical relation between structures: AND/OR lattice
– Different subtypes represent alternatives/disjunction
– Multiple supertypes represent conjunction
… but type inheritance is not the only (best?) way to express generalizations
LFG does not use typed feature structures for lexical generalizations

[Diagram (Malouf): type hierarchy with head at the top, noun and relational below it, and c-noun, gerund, and verb at the bottom; edges labeled AND and OR]

Page 60: Linguistics 187/287  Week 2

Coordination without METARULEMACRO

Want to coordinate any constituent
Coordination macro
  SCCOORD(_CAT) = [ _CAT: ! $ ^; COMMA ]*
                  _CAT: ! $ ^;
                  CONJ
                  _CAT: ! $ ^.
Put call in each rule:
  NP: { (DET) AP* N PP* | @(SCCOORD NP) }.
Engineering problem:
– forget to call
– put in wrong category

Page 61: Linguistics 187/287  Week 2

Coordination with METARULEMACRO

Call SCCOORD as part of MRM
  METARULEMACRO(_CAT _BASECAT _RHS) = { _RHS | @(SCCOORD _CAT) }.

NP rule now:  NP: (DET) AP* N PP*.
Effectively:  NP: { (DET) AP* N PP* | @(SCCOORD NP) }.