
Post on 17-Dec-2015


LING 581: Advanced Computational Linguistics

Lecture Notes

February 9th

tregex

Pattern matching for passives: using variable names and regex group numbering for coindexation, i.e. matching the subject NP-SBJ-i against the object of VP, [NP [-NONE- [*-i]]].
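The coindexation idea can be illustrated outside tregex with an ordinary regular-expression back-reference. The following is a minimal Tcl sketch, not the tregex pattern itself: it treats a bracketed tree as a flat string, so it only approximates a structural match. The proc name coindexed_passive and the two example trees are made up for illustration.

```tcl
# Hypothetical helper (not part of tregex): returns 1 if the tree string has
# a subject NP-SBJ-i followed later by an empty object (NP (-NONE- *-i))
# carrying the same index i, and 0 otherwise.
proc coindexed_passive {tree} {
    # Group 1 captures the subject's index; \1 forces the trace to repeat it.
    set pattern {\(NP-SBJ-([0-9]+) .*\(NP \(-NONE- \*-\1\)\)}
    return [regexp -- $pattern $tree]
}

# Made-up example trees in Penn Treebank bracketing.
set passive  {(S (NP-SBJ-1 (DT The) (NN ball)) (VP (VBD was) (VP (VBN kicked) (NP (-NONE- *-1)))))}
set mismatch {(S (NP-SBJ-1 (DT The) (NN ball)) (VP (VBD was) (VP (VBN kicked) (NP (-NONE- *-2)))))}

puts [coindexed_passive $passive]   ;# indices agree
puts [coindexed_passive $mismatch]  ;# indices differ
```

Because the match is on the flat string, it does not check that the empty NP is really the object of the VP; a tree-aware pattern is needed for that.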

Homework Task Report

• Bracketing guide
  – TREEBANK_3/docs/prsguid1.pdf
• Pattern matching for selected constructions in
  – wsj-00-24-tregex.mrg

Bikel Collins

From treebank search to stochastic parsers trained on the WSJ Penn Treebank

• Java re-implementation of Collins’ parser
• Paper
  – Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511.
  – http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf
• Software
  – http://www.cis.upenn.edu/~dbikel/software.html#stat-parser

Bikel Collins

Some Tcl/Tk code (which I wrote for research use) makes it easy to work with the parser without memorizing the command-line options.

Bikel Collins

The wrapper is syntactic sugar for various commands
• Scripting language is Tcl/Tk (“tickle T K”)
• Assume variables
  – set prefix "/Users/sandiway/research/"
  – set dbprefix "$prefix/dbparser"
  – set tbvprefix "/Applications/treebankviewer.app/Contents/MacOS"
• POS tagging (MXPOST, in directory jmx)
  – $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
  – $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
  – $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout

Bikel Collins

• POS tagging (MXPOST, in directory jmx)
  – tagger_input
  – $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
• Parsing
  – set ddf "wsj-02-21.obj.gz"
  – set properties "collins.properties"
  – parser_input
  – $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
• Training
  – set mrg "wsj-02-21.mrg"
  – set properties "collins.properties"
  – $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
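The 2>@ stdout redirection is Tcl exec syntax, which suggests the wrapper launches these commands with exec. Here is a minimal sketch of how the parsing command might be assembled as a Tcl list, assuming Tcl 8.5 or later for the {*} expansion. The proc name parse_command is hypothetical and is not taken from wrapper2.tcl.

```tcl
# Hypothetical helper: build the parser command line as a Tcl list so that
# each argument stays a single word even if a path contains spaces.
proc parse_command {dbprefix properties ddf input} {
    return [list $dbprefix/bin/parse 400 \
                 $dbprefix/settings/$properties \
                 $dbprefix/bin/$ddf \
                 $input]
}

set cmd [parse_command /Users/sandiway/research/dbparser \
             collins.properties wsj-02-21.obj.gz /tmp/test2.txt]
puts $cmd

# To actually run it (requires the Bikel parser to be installed):
#   set result [exec {*}$cmd 2>@ stdout]
```

Building the command as a list and expanding it with {*} avoids re-parsing the string, which is the usual way to keep file names intact when passing them to exec.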

Unix file descriptors

0 Standard input (stdin)
1 Standard output (stdout)
2 Standard error (stderr)
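The descriptor numbers explain the redirections in the commands above: 2> sends descriptor 2 (stderr) to a file, and 2>@ sends it to an already open Tcl channel. The sketch below shows the effect with a made-up child command rather than the tagger or parser. It assumes Tcl 8.5 or later and a POSIX sh, and it uses 2>@1 (merge stderr into the result of exec) in place of 2>@ stdout so that the merged output can be inspected.

```tcl
# A made-up child command that writes one line to fd 1 and one line to fd 2.
set child {echo to-stdout; echo to-stderr 1>&2}

# "2> file": stderr goes to a file, so exec returns only what came on stdout.
set errfile /tmp/fd-demo-err.txt
set out [exec sh -c $child 2> $errfile]

# Read back what the child wrote to stderr, then clean up.
set f [open $errfile r]
set err [string trimright [read $f]]
close $f
file delete $errfile

# "2>@1": stderr is merged into the result that exec returns.
set both [exec sh -c $child 2>@1]

puts "stdout only: $out"
puts "stderr file: $err"
puts "merged:      [split $both \n]"
```

Without any stderr redirection, exec treats output on descriptor 2 as an error, which is one practical reason for the wrapper commands to redirect it.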

GUI components

frame .input
text .input.t -height 4 -yscrollcommand {.input.s set}
scrollbar .input.s -command {.input.t yview}

frame .tagged
text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
scrollbar .tagged.s -command {.tagged.t yview}

Code

proc tagger_input {} {
    set lines [.input.t get 1.0 end]
    set infile [open "/tmp/test.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}

proc parser_input {} {
    set lines [.tagged.t get 1.0 end]
    set infile [open "/tmp/test2.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}
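The two procs differ only in the widget they read and the file they write. Since a Tk text widget always returns a trailing newline from get 1.0 end, both trim it before writing. One possible refactoring, sketched here with a hypothetical helper named save_trimmed (not part of the original wrapper), moves the shared file-writing step into one place and can be tried without Tk:

```tcl
# Hypothetical refactoring: one helper replaces the duplicated file-writing code.
proc save_trimmed {path text} {
    set out [open $path w]
    puts -nonewline $out [string trimright $text]
    close $out
}

# In the wrapper the two procs would then reduce to (needs Tk, so commented out):
#   proc tagger_input {} { save_trimmed /tmp/test.txt  [.input.t get 1.0 end] }
#   proc parser_input {} { save_trimmed /tmp/test2.txt [.tagged.t get 1.0 end] }

# Stand-alone check with a made-up file name and sentence.
save_trimmed /tmp/save-trimmed-demo.txt "The ball was kicked .\n\n"
set in [open /tmp/save-trimmed-demo.txt r]
set contents [read $in]
close $in
file delete /tmp/save-trimmed-demo.txt
puts "<$contents>"
```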

Bikel Collins

There’s also a simple tree viewer I wrote but it may not run on your system…

Bikel Collins

Relevant files and directories
• bikeldemo
  – wrapper2.tcl (prefix set to /Users/sandiway)
• jmx
  – mxpost (shell script)
  – mxpost.jar (Java code)
• dbparser
  – dbparser/bin/parse (shell script)
  – dbparser/bin/train (shell script)
  – dbparser/dbparser.jar (Java code)
  – dbparser/userguide/guide.pdf
