LING 581: Advanced Computational Linguistics Lecture Notes February 9th

Upload: morgan-goodman

Post on 17-Dec-2015


Page 1: LING 581: Advanced Computational Linguistics Lecture Notes February 9th

LING 581: Advanced Computational Linguistics

Lecture Notes
February 9th

Page 2

tregex

Pattern matching for passives: using variable names and regex group numbering for coindexation. The coindexed pair is NP-SBJ-i and the object of VP, [NP [-NONE- [*-i]]].
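The group-numbering idea can be tried out with an ordinary Tcl regular expression. This is only a sketch of the coindexation check on a flat string, not tregex syntax: the tree is a made-up, simplified bracketing, and the pattern ignores tree structure entirely. Group 1 captures the index on NP-SBJ, and the back reference \1 requires the same index on the trace.

```tcl
# Hypothetical one-line bracketed tree, simplified from Penn Treebank style.
set tree {(S (NP-SBJ-1 (DT the) (NN bill)) (VP (VBD was) (VP (VBN passed) (NP (-NONE- *-1)))) (. .))}

# Group 1 captures the subject index; \1 demands the same index on the trace.
set passive {\(NP-SBJ-([0-9]+) .*\(NP \(-NONE- \*-\1\)\)}

if {[regexp $passive $tree -> index]} {
    puts "subject and object trace are coindexed (index $index)"
} else {
    puts "no coindexed object trace found"
}
```

A string regex like this cannot check that the trace really is the object of the VP; that structural constraint is what tregex relations are for.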

Page 3

Homework Task Report

• Bracketing guide
  – TREEBANK_3/docs/prsguid1.pdf
• Pattern matching for selected constructions in
  – wsj-00-24-tregex.mrg

Page 4

Bikel Collins

From treebank search to stochastic parsers trained on the WSJ Penn Treebank

• Java re-implementation of Collins’ parser
• Paper
  – Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511.
  – http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf
• Software
  – http://www.cis.upenn.edu/~dbikel/software.html#stat-parser

Page 5

Bikel Collins

Some TCL/TK code (which I wrote for research use) makes it easy to work with the parser without memorizing the command-line options.

Page 6

Bikel Collins

The wrapper is syntactic sugar for various commands
• Scripting language is TCL/TK (“tickle T K”)
• Assume variables
  – set prefix "/Users/sandiway/research/"
  – set dbprefix "$prefix/dbparser"
  – set tbvprefix "/Applications/treebankviewer.app/Contents/MacOS"
• POS tagging (MXPOST, in directory jmx)
  – $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
  – $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
  – $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout

Page 7

Bikel Collins

• POS tagging (MXPOST, in directory jmx)
  – tagger_input
  – $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
  – set ddf "wsj-02-21.obj.gz"
  – set properties "collins.properties"
  – parser_input
  – $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
  – set mrg "wsj-02-21.mrg"
  – set properties "collins.properties"
  – $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
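As a rough sketch of how a wrapper might put the parsing command together, the snippet below builds it as a Tcl list from the variables on these slides. The helper name parse_command is made up for illustration and is not taken from wrapper2.tcl. The paths are the slide's values and will differ on your machine, so the command is only run if the parse script is actually present.

```tcl
# Values from the slides; adjust them for your own installation.
set prefix "/Users/sandiway/research"
set dbprefix [file join $prefix dbparser]
set properties "collins.properties"
set ddf "wsj-02-21.obj.gz"

# Hypothetical helper: returns the parse command as a list, one word per element.
proc parse_command {dbprefix properties ddf input} {
    return [list [file join $dbprefix bin parse] 400 \
        [file join $dbprefix settings $properties] \
        [file join $dbprefix bin $ddf] \
        $input]
}

set cmd [parse_command $dbprefix $properties $ddf /tmp/test2.txt]
puts $cmd

# Only run the command when the parse script exists and is executable.
if {[file executable [lindex $cmd 0]]} {
    puts [exec {*}$cmd 2>@ stdout]
}
```

Keeping the command as a list and expanding it with {*} (Tcl 8.5 or later) means a path containing spaces still reaches exec as a single argument.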

Unix file descriptors
0 Standard input (stdin)
1 Standard output (stdout)
2 Standard error (stderr)
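The descriptor numbers matter because Tcl's exec raises an error when a command writes to standard error and that stream is not redirected. This is presumably why the wrapper commands end in 2> /tmp/err.txt or 2>@ stdout. The sketch below assumes a Unix system with sh on the PATH and uses a tiny stand-in command rather than the tagger or parser. The scratch file name is made up.

```tcl
# Stand-in command that writes one line to stdout and one line to stderr.
set script {echo to-stdout; echo to-stderr >&2}

# 2>@1 folds stderr (descriptor 2) into the result that exec returns.
set both [exec sh -c $script 2>@1]
puts $both

# 2> FILE sends stderr to a file, so only stdout comes back.
set errfile "/tmp/wrapper-err.txt"   ;# hypothetical scratch file
set only_out [exec sh -c $script 2> $errfile]
puts $only_out

# With no redirection, output on stderr makes exec raise a Tcl error.
if {[catch {exec sh -c $script} msg]} {
    puts "exec raised an error: $msg"
}
```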

GUI components

frame .input
text .input.t -height 4 -yscrollcommand {.input.s set}
scrollbar .input.s -command {.input.t yview}

frame .tagged
text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
scrollbar .tagged.s -command {.tagged.t yview}

Code

# Save the contents of the input text widget as the tagger's input file.
proc tagger_input {} {
    set lines [.input.t get 1.0 end]
    set infile [open "/tmp/test.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}

# Save the contents of the tagged text widget as the parser's input file.
proc parser_input {} {
    set lines [.tagged.t get 1.0 end]
    set infile [open "/tmp/test2.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}
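Both procs follow the same pattern: take the widget text, trim it, and write it out with no final newline. A Tk text widget always returns a trailing newline from get 1.0 end, which is what string trimright removes. The sketch below shows the same pattern without Tk so it can be run from tclsh. The proc name write_input, the scratch file and the sample sentence are all made up for illustration.

```tcl
# Hypothetical stand-in for tagger_input/parser_input: the text is passed in
# as an argument instead of being read from a Tk text widget.
proc write_input {path lines} {
    set outfile [open $path w]
    # Drop trailing whitespace, including the newline a text widget appends.
    puts -nonewline $outfile [string trimright $lines]
    close $outfile
}

set demo "/tmp/wrapper-demo.txt"   ;# hypothetical scratch file
write_input $demo "John saw Mary .\n\n"

# Read the file back to confirm that nothing follows the final token.
set infile [open $demo r]
set contents [read $infile]
close $infile
puts "wrote: <$contents>"
```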

Page 8

Bikel Collins

There’s also a simple tree viewer I wrote but it may not run on your system…

Page 9

Bikel Collins

Relevant files and directories
• bikeldemo
  – wrapper2.tcl (prefix set to /Users/sandiway)
• jmx
  – mxpost (shell script)
  – mxpost.jar (Java code)
• dbparser
  – dbparser/bin/parse (shell script)
  – dbparser/bin/train (shell script)
  – dbparser/dbparser.jar (Java code)
  – dbparser/userguide/guide.pdf