LING 581: Advanced Computational Linguistics
Lecture Notes
February 9th
tregex
Pattern matching for passives: using variable names and regex group numbering to match the coindexation in passives (NP-SBJ-i and the object of VP, [NP [-NONE- [*-i]]])
Homework Task Report
• Bracketing guide
– TREEBANK_3/docs/prsguid1.pdf
• Pattern matching for selected constructions in
– wsj-00-24-tregex.mrg
Bikel Collins
From treebank search to stochastic parsers trained on the WSJ Penn Treebank
• Java re-implementation of Collins’ parser
• Paper
– Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511.
– http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf
• Software
– http://www.cis.upenn.edu/~dbikel/software.html#stat-parser
Bikel Collins
Some Tcl/Tk code (which I wrote for research use) makes it easy to work with the parser without memorizing the command-line options
Bikel Collins
The wrapper is syntactic sugar for various commands
• Scripting language is Tcl/Tk (“tickle T K”)
• Assume variables
– set prefix "/Users/sandiway/research/"
– set dbprefix "$prefix/dbparser"
– set tbvprefix "/Applications/treebankviewer.app/Contents/MacOS"
• POS tagging (MXPOST, in directory jmx)
– $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
Bikel Collins
• POS tagging (MXPOST, in directory jmx)
– tagger_input
– $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
– set ddf "wsj-02-21.obj.gz"
– set properties "collins.properties"
– parser_input
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
– set mrg "wsj-02-21.mrg"
– set properties "collins.properties"
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
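The tagging step above can be wrapped in a procedure so that a failed run does not abort the whole script. This is only a sketch: the name run_tagger is hypothetical and not taken from the original wrapper, and it assumes the input has already been written to /tmp/test.txt.

```tcl
# Assumed variable, as on the slide.
set prefix "/Users/sandiway/research/"

# Hypothetical helper (not part of the original wrapper):
# run MXPOST on /tmp/test.txt and return the tagged text.
# exec raises an error if the program is missing or exits with a
# non-zero status, so the call is wrapped in catch.
proc run_tagger {} {
    global prefix
    if {[catch {
        exec $prefix/jmx/mxpost $prefix/jmx/tagger.project \
            < /tmp/test.txt 2> /tmp/err.txt
    } result]} {
        return "tagger failed: $result"
    }
    return $result
}
```

The string returned by run_tagger could then be inserted into the tagged-text widget, or inspected for the "tagger failed:" prefix.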
Unix file descriptors
0 Standard input (stdin)
1 Standard output (stdout)
2 Standard error (stderr)
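The descriptor numbers explain the redirections in the commands: "< file" feeds descriptor 0, "2> file" sends descriptor 2 to a file, and "2>@ stdout" sends descriptor 2 to Tcl's own stdout channel. By default Tcl's exec treats anything written to standard error as a failure, so redirecting descriptor 2 matters for tools that print progress messages there. A small sketch, using sh as a stand-in for the tagger or parser (the command string is made up for illustration):

```tcl
# Stand-in child process: one line to stdout (fd 1), one to stderr (fd 2).
set cmd {echo result; echo warning 1>&2}

# 2> FILE: stderr goes to a file; exec returns what was written to stdout.
set out [exec sh -c $cmd 2> /tmp/err.txt]
puts "captured stdout: $out"

# Read back what the child wrote to stderr.
set fh [open /tmp/err.txt r]
set err [string trim [read $fh]]
close $fh
puts "stderr from file: $err"

# 2>@ stdout: stderr is passed through to Tcl's stdout channel,
# and exec still returns only the child's stdout.
set out2 [exec sh -c $cmd 2>@ stdout]
puts "captured stdout: $out2"
```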
GUI components
frame .input
text .input.t -height 4 -yscrollcommand {.input.s set}
scrollbar .input.s -command {.input.t yview}
frame .tagged
text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
scrollbar .tagged.s -command {.tagged.t yview}
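Creating the widgets does not display them; a geometry manager has to place them. The slide does not show which one the wrapper uses, so the pack calls below are an assumption, given as a minimal sketch to run under wish.

```tcl
package require Tk

# Input area: 4-line text box linked to a scrollbar.
frame .input
text .input.t -height 4 -yscrollcommand {.input.s set}
scrollbar .input.s -command {.input.t yview}

# Tagged-output area: 9-line text box linked to a scrollbar.
frame .tagged
text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
scrollbar .tagged.s -command {.tagged.t yview}

# Assumed layout (not shown in the original): scrollbar on the right,
# text filling the rest of each frame, frames stacked vertically.
foreach f {.input .tagged} {
    pack $f.s -side right -fill y
    pack $f.t -side left -fill both -expand 1
    pack $f -side top -fill both -expand 1
}
```

The -yscrollcommand and -command options tie each text widget and its scrollbar together in both directions: scrolling the text updates the scrollbar, and dragging the scrollbar scrolls the text.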
Code
proc tagger_input {} {
    set lines [.input.t get 1.0 end]
    set infile [open "/tmp/test.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}
proc parser_input {} {
    set lines [.tagged.t get 1.0 end]
    set infile [open "/tmp/test2.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}
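The two procedures differ only in the widget they read and the file they write, so they could share one helper. The name save_widget_text below is hypothetical; this is a possible refactoring, not the wrapper's actual code.

```tcl
# Hypothetical helper: write the contents of a text widget to a file.
# A Tk text widget always ends with a newline, so "get 1.0 end" returns
# one extra trailing newline; string trimright removes it (along with
# any other trailing whitespace).
proc save_widget_text {widget path} {
    set lines [$widget get 1.0 end]
    set outfile [open $path w]
    puts -nonewline $outfile [string trimright $lines]
    close $outfile
}

# Same behavior as the original procedures, expressed via the helper.
proc tagger_input {} { save_widget_text .input.t /tmp/test.txt }
proc parser_input {} { save_widget_text .tagged.t /tmp/test2.txt }
```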
Bikel Collins
There’s also a simple tree viewer I wrote but it may not run on your system…
Bikel Collins
Relevant files and directories
• bikeldemo
– wrapper2.tcl (prefix set to /Users/sandiway)
• jmx
– mxpost (shell script)
– mxpost.jar (Java code)
• dbparser
– dbparser/bin/parse (shell script)
– dbparser/bin/train (shell script)
– dbparser/dbparser.jar (Java code)
– dbparser/userguide/guide.pdf