Grammar for Fun: IT-based Grammar Teaching with VISL. Eckhard Bick, 2004


Page 1

Grammar for Fun: IT-based Grammar Teaching with VISL

Eckhard Bick, 2004

Page 2

Talk outline

• Background: VISL project activities

• A unified approach to grammar teaching

• Internet based teaching tools

• Grammar Games

• TextPainter: Visualising grammatical text properties

• Research corpora: A resource for teaching

• Slot filler exercises: Towards evaluation

Page 3

Teaching projects

• CTU 1996-99: Internet based grammar teaching software (research and development)

• ELU1 1998-2000: VISL tools for Danish universities and teacher seminaries

• VISL-HHX 2001-03: VISL tools for Danish business schools

• VISL-GYM 2001-02: VISL tools for Danish gymnasiums

• PaNoLa, GREI 2002-2004: Major Nordic languages

• VISL-SEM 2004-05: VISL didactics for teacher training colleges

• URKAS 2004-05: “Almen sprogforståelse” (1.g)

Page 4

Unity in diversity: A unified approach for 22 languages

Page 5

VISL research languages

revised syntactic trees (nodes) | morphological analysis | syntactic analysis | semantics
200.000*, 4 subcorpora | lexicon and rule based analyzer + CG | CG + tree-generator | semantic prototypes, Po->Da MT
40.400, 13 subcorpora | integrated TWOL/CG (lingsoft) + add-on | CG + PSG | WordNet based tagging
200.000*, 10 subcorpora | lexicon and rule based analyzer + CG | CG + PSG + topological | semantic prototypes, Da->Esp MT
8.400, 3 subcorpora | lexicon and rule based analyzer + CG | CG + tree-generator | -
16.000, 3 subcorpora | integrated TWOL/CG (lingsoft) + add-on | CG + PSG | -
20.000, 3 subcorpora | Decision Tree Tagger (H.Schmid & A.Stein) | CG + PSG | -
1.000, 2 subcorpora | Decision Tree Tagger (H.Schmid & A.Stein) | - | -
- | morpheme based analyzer + CG | CG (experimental) | Da->Esp MT

Page 6

The VISL teaching network

Page 7

Complexity progression

Page 8

Grammy i Klostermølleskoven

Story-line about grammar

Interactive exercises Book = IT

Comments for teachers

Explanations for students

Page 9

The Paintbox game

Page 10

ShootingGallery: Hit a noun!

Page 11

WordFall - Tetris for grammarians

Page 12

Labyrinth - a word class maze

Page 13

Post office - stamping syntactic function

Page 14

Syntris - syntax brick by brick

Page 15

SpaceRescue: Alien syntax

Page 16

Constituent trees

Page 17

Interactive syntactic trees

Page 18

Choose tool e.g. inspection, build tree or label tree

Choose complexity e.g. minor (dynamic sentence dependent reduction in category complexity) or major

Choose notation e.g. symbols or abbreviations and/or colors

Choose teaching environment e.g. latinate Danish gymnasium

Choose meta-language e.g. English

Choose visualisation e.g. graphical trees or field analysis

Choose level e.g. VISL-lite (for schools)

Choose subcorpus e.g. VISL-HHX (business gymnasium)

Choose target language e.g. German or Swedish

Teaching corpora of analyzed sentences

Page 19

Function categories

Page 20

BuildTree: Drag & drop constituents

Page 21

LabelTree: Drag & drop syntactic function

Page 22

Cross-language problems: Infinitive marker

Page 23

Cross-language problems: participial clauses

Page 24

Cross-language problems: Discontinuity

Page 25

VISL source notation

VISL lite vertical tree (non-graphical notation, filtered)

UTT:cl
S:prop VISL
P:v er
Cs:g
=D:art et
=H:n forskningsprojekt
=D:cl
==S:pron der
==P:v involverer
==Od:g
===D:pron mange
===D:adj forskellige
===H:n sprog

VISL vertical tree (non-graphical notation, incl. morphology)

STA:fcl
S:prop("VISL") VISL
P:v-fin("være",pr,akt) er
Cs:np
=DN:art("en",neu,sg,idf) et
=H:n("forskningsprojekt",neu,sg,idf,nom) forskningsprojekt
=DN:fcl
==S:pron-rel("der",nG,nN,nom) der
==P:v-fin("involvere",pr,akt) involverer
==Od:np
===DN:pron-indef("mange",nG,pl,nom) mange
===DN:adj("forskellig",nG,pl,nD,nom) forskellige
===H:n("sprog",neu,pl,idf,nom) sprog
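The vertical tree notation on this slide can be read mechanically. Below is a minimal Python sketch, not the VISL software itself. It assumes one node per line, that the number of leading "=" signs gives the depth, and that the first line is the root whose direct constituents carry no "=" (as in the example sentence, "VISL is a research project that involves many different languages"). The names Node, parse_vertical_tree and terminal_words are hypothetical.

```python
import re
from dataclasses import dataclass, field

# One node per line: optional '=' depth markers, FUNCTION:FORM,
# optional (morphology), then the word itself for terminal nodes.
NODE_LINE = re.compile(r'^(=*)([^:\s]+):([^\s(]+)(?:\(([^)]*)\))?\s*(.*)$')


@dataclass
class Node:
    function: str
    form: str
    morphology: str = ""
    word: str = ""
    children: list = field(default_factory=list)


def parse_vertical_tree(text):
    """Build a nested tree from VISL-style vertical notation."""
    stack = []  # stack[i] is the open node at level i; stack[0] is the root
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = NODE_LINE.match(line)
        if match is None:
            raise ValueError(f"not a node line: {line!r}")
        marks, function, form, morphology, word = match.groups()
        node = Node(function, form, morphology or "", word)
        if not stack:
            stack.append(node)  # the first line is taken to be the root
            continue
        level = len(marks) + 1  # assumption: root constituents carry no '='
        if level > len(stack):
            raise ValueError(f"depth jumps too far at: {line!r}")
        del stack[level:]  # close nodes that are deeper than the parent
        stack[level - 1].children.append(node)
        stack.append(node)
    if not stack:
        raise ValueError("empty tree")
    return stack[0]


def terminal_words(node):
    """Return the words of all terminal nodes, left to right."""
    if not node.children:
        return [node.word] if node.word else []
    words = []
    for child in node.children:
        words.extend(terminal_words(child))
    return words


SAMPLE = """
STA:fcl
S:prop("VISL") VISL
P:v-fin("være",pr,akt) er
Cs:np
=DN:art("en",neu,sg,idf) et
=H:n("forskningsprojekt",neu,sg,idf,nom) forskningsprojekt
=DN:fcl
==S:pron-rel("der",nG,nN,nom) der
==P:v-fin("involvere",pr,akt) involverer
==Od:np
===DN:pron-indef("mange",nG,pl,nom) mange
===DN:adj("forskellig",nG,pl,nD,nom) forskellige
===H:n("sprog",neu,pl,idf,nom) sprog
"""

tree = parse_vertical_tree(SAMPLE)
sentence = " ".join(terminal_words(tree))
```

The same function reads the filtered "VISL lite" variant, because the morphology in parentheses is optional in the line pattern.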

Page 26

CG source notation (function/dependency)

Page 27

Supported xml-formats

• TIGER-xml (constituents)

• TIGER-xml (dependency)

• MALT-xml

• VISL data file markers: pedagogical topic and chaptering attributes for dynamic html-layout

Page 28

Search interfaces for annotated corpora

Page 29

Menu-based searches

Page 30

Statistical tools

Page 31

Corpus annotation

Page 32

Annotated corpora

Morphosyntactically tagged

• Korpus90 and Korpus2000, mixed genre, 56M words

• DFK, mainly transcribed parliamentary discussions, 7M words

• CETEMPúblico, European Portuguese, news text, 180M words

• Folha de São Paulo, Brazilian news text, 90M words

• CORDIAL-SIN, dialectal Portuguese, 30K words

• NURC, transcribed Brazilian speech, 100K words

• Tycho Brahe, historical Portuguese, 50K words

Valency tagged

• NILC corpus, Brazilian Portuguese, journalistic and essays, 39M words

Treebanks

• Floresta Sintá(c)tica, European Portuguese, 1M words (35K revised)

• Arboretum, Danish, 50K words revised

Page 33

Integrating live NLP and language awareness teaching

Page 34

KillerFiller: Towards evaluation

Page 35

Performance statistics

Page 36

VISL

http://visl.sdu.dk

Eckhard Bick, [email protected]

**************

Page 37

The most common syntactic categories

@SUBJ  subject                             | @ADVL  free (adjunct) adverbial
@ACC   direct (accusative) object          | @PRED  free (adjunct) predicative
@DAT   indirect (dative) object            | @APP   apposition
@PIV   prepositional object                | @>N    prenominal dependent
@SC    subject complement                  | @N<    postnominal dependent
@OC    object complement                   | @>A    adverbial pre-dependent
@SA    subject related adverbial argument  | @A<    adverbial post-dependent
@OA    object related adverbial argument   | @P<    argument of preposition
@MV    main verb                           | @INFM  infinitive marker
@AUX   auxiliary                           | @VOK   vocative

Page 39

The DanGram system in current numbers

Lexemes in morphological base lexicon: 146.342 (equals about 1.000.000 full forms), of these:
- proper names: 44839 (experimental)
- polylexicals: 460 (+ names and certain number expressions)

Lexemes in the valency and semantic prototype lexicon: 95.308

Lexemes in the bilingual lexicon (Danish-Esperanto): 36.001

Danish CG-rules, in all: 6.233
- morphological CG disambiguation rules: 2.678
- syntactic mapping-rules: 1.701
- syntactic CG disambiguation rules: 1.854
(plus 429 bilingual rules in separate MT grammars, and a smaller number of semantic case-role and proper name-rules in the semantics and name grammars)

Danish PSG-rules: 490 (for generating syntactic tree structures)

Performance: At full disambiguation (i.e., maximal precision), the system has an average correctness of 99% for word class (PoS), and about 96% for syntactic tags (depending on how fine-grained an annotation scheme is used)

Speed:
- full CG-parse: ca. 400 words/sec for larger texts (start up time 3-6 sec)
- morphological analysis alone: ca. 1000 words/sec
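As a quick sanity check on these figures, the sketch below adds up the three CG rule counts and turns the quoted parsing speed into a rough running-time estimate. The linear extrapolation from "ca. 400 words/sec" and the function name estimate_parse_seconds are assumptions made for illustration, not measurements from the talk.

```python
# Rule counts as quoted on the slide.
CG_RULES = {
    "morphological disambiguation": 2678,
    "syntactic mapping": 1701,
    "syntactic disambiguation": 1854,
}
REPORTED_TOTAL = 6233

# The three rule types add up to the reported total.
rules_total = sum(CG_RULES.values())


def estimate_parse_seconds(n_words, words_per_sec=400.0, startup_sec=(3.0, 6.0)):
    """Return a (low, high) running-time estimate in seconds.

    Assumes throughput stays constant at words_per_sec (hypothetical
    linear model) and adds the quoted 3-6 second start-up time.
    """
    run = n_words / words_per_sec
    return (startup_sec[0] + run, startup_sec[1] + run)


low, high = estimate_parse_seconds(4000)
```

Under this model a 4000-word text would take somewhere between 13 and 16 seconds for a full CG parse, most of it actual parsing rather than start-up.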

Page 40

VISL parsing tools

• Preprocessing: word- and sentence boundaries, polylexicals

• Lexicon and rule based morphological analysis: Inflexion, derivation, compound recognition

• Postprocessing: Valency and semantic potential

• Morphological contextual disambiguation (CG)

• Syntactic mapping and disambiguation (CG)

• Names CG, feature propagation CG, Case role-CG

• PSG superstructure: Teaching, Arboretum, Floresta

Page 41

Research projects

• SHF 1999-2001: CG, syntax & semantics (da,en,po)

• AC/DC 1999-?: Portuguese CG-corpora

• Floresta 2000-?: Portuguese treebank

• DSL 2001-?: Korpus90/2000 (Danish CG-corpora)

• Arboretum 2002-?: Danish treebank

• PaNoLa 2002-2003: Integration of Nordic CG research

• Nomen Nescio: Automatic named entity recognition

Page 42

Running CG-annotation

Da [da] KS @SUB
den [den] ART UTR S DEF @>N
gamle [gammel] ADJ nG S DEF NOM @>N
sælger [sælger] N UTR S IDF NOM @SUBJ>
kørte [køre] <mv> V IMPF AKT @FS-ADVL>
hjem [hjem] N NEU P IDF NOM @<ACC
i [i] PRP @<ADVL
sin [sin] <poss> <refl> DET UTR S @>N
bil [bil] N UTR S IDF NOM @P<
,
så [se] <mv> V IMPF AKT @FMV
han [han] PERS UTR 3S NOM @<SUBJ
mange [mange] <quant> DET nG P NOM @>N
små [lille] ADJ nG P nD NOM @>N
dyr [dyr] N NEU P IDF NOM &ACI-SUBJ @<ACC
på [på] PRP @<OA
de [den] ART nG P DEF @>N
våde [våd] ADJ nG P nD NOM @>N
veje [vej] N UTR P IDF NOM @P<
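Each token in the running annotation carries a word form, a lemma in square brackets, optional secondary tags in angle brackets, a word class, inflexion features, and syntactic function tags starting with "@". The following Python sketch splits one such token line into those parts. It assumes one token per line and single-word lemmas. Token, parse_cg_line and head_direction are hypothetical names, and the left/right reading of "<" and ">" follows the pre-/post-dependent labels in the category list (e.g. @>N prenominal, @N< postnominal).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str
    lemma: str = ""
    pos: str = ""
    features: list = field(default_factory=list)
    secondary: list = field(default_factory=list)
    functions: list = field(default_factory=list)
    other: list = field(default_factory=list)


def parse_cg_line(line):
    """Split one annotated token line into its tag groups."""
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation line")
    token = Token(word=parts[0])
    for part in parts[1:]:
        if part.startswith("[") and part.endswith("]"):
            token.lemma = part[1:-1]            # base form
        elif part.startswith("@"):
            token.functions.append(part)        # syntactic function tag
        elif part.startswith("&"):
            token.other.append(part)            # additional marker, kept as-is
        elif part.startswith("<") and part.endswith(">"):
            token.secondary.append(part[1:-1])  # secondary tag such as <mv>
        elif not token.pos:
            token.pos = part                    # first plain tag: word class
        else:
            token.features.append(part)         # remaining inflexion features
    return token


def head_direction(function_tag):
    """Return 'right' or 'left' depending on where the arrow points, else None."""
    if ">" in function_tag:
        return "right"
    if "<" in function_tag:
        return "left"
    return None


adjective = parse_cg_line("gamle [gammel] ADJ nG S DEF NOM @>N")
verb = parse_cg_line("kørte [køre] <mv> V IMPF AKT @FS-ADVL>")
noun = parse_cg_line("dyr [dyr] N NEU P IDF NOM &ACI-SUBJ @<ACC")
```

Function tags are checked before angle-bracket tags so that a tag like @<ACC is not mistaken for a secondary tag.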