News blurb o’ the day
Allied armed forces in Iraq are using machine translation plus AIM to communicate
Many MT techniques are possible; some are based on Bayesian statistical techniques
Ex: see “le chat noir” <-> “the black cat”; estimate Pr[“black cat” | “chat noir”]
When you next see “chat”, pick the word with the maximum estimated probability of being associated with it
Much more difficult than your spam filters -- need to handle entire phrases, words out of order, idiom, etc.
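To make the "estimate, then pick the max" idea concrete, here is a minimal sketch that assumes nothing more than co-occurrence counting over aligned sentence pairs. The class and method names are hypothetical, and this is a crude relative-frequency estimate, not a description of what any fielded MT system does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: estimate Pr[english word | foreign word]
// from how often the two words appear in the same aligned sentence pair.
class WordTranslationCounts {
    // counts.get(f).get(e) = number of times e was seen alongside f
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one aligned sentence pair, e.g. "le chat noir" <-> "the black cat".
    void observe(String[] foreign, String[] english) {
        for (String f : foreign) {
            Map<String, Integer> row = counts.computeIfAbsent(f, k -> new HashMap<>());
            for (String e : english) {
                row.merge(e, 1, Integer::sum);
            }
        }
    }

    // Relative frequency of e among all words seen alongside f.
    double probability(String e, String f) {
        Map<String, Integer> row = counts.get(f);
        if (row == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : row.values()) {
            total += c;
        }
        return row.getOrDefault(e, 0) / (double) total;
    }

    // The English word with the highest estimated probability, or null if f is unseen.
    String bestTranslation(String f) {
        Map<String, Integer> row = counts.get(f);
        if (row == null) {
            return null;
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : row.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WordTranslationCounts m = new WordTranslationCounts();
        m.observe(new String[] {"le", "chat", "noir"}, new String[] {"the", "black", "cat"});
        m.observe(new String[] {"le", "chat"}, new String[] {"the", "cat"});
        m.observe(new String[] {"un", "chat", "blanc"}, new String[] {"a", "white", "cat"});
        System.out.println(m.bestTranslation("chat") + " " + m.probability("cat", "chat"));
    }
}
```

With only one sentence pair every English word ties; it takes several pairs before “cat” stands out for “chat”. Word order, phrases, and idiom are all ignored here, which is exactly why the real problem is harder.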
Recursive Descent Parsing
Or: Before you can understand this sentence, first, you must understand this sentence...
Recursive Descent Parsing
A translation between streams of tokens and complex structures like trees (or tree-like data structs)
One step beyond lexing
Requires more sophisticated structures
Lexical analysis, revisited
Rules equivalent to regular expressions
Can only represent sequences, indefinite repetition (i.e., “*” or “+” operators), and finite cases (“[]” and “|” operators)
Can be recognized in linear time
Equivalent to a finite state machine
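As a rough illustration of lexing with nothing but regular expressions, the sketch below splits input into words, integers, and single punctuation characters using java.util.regex. The token classes and the class name are assumptions chosen for this example, not the course's actual Lexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-only lexer: no nesting, no recursion, just one pattern.
class RegexLexerSketch {
    // Optional leading whitespace, then one token:
    // a word, an unsigned integer, or one punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*([a-zA-Z]+|[0-9]+|[(){}\\[\\],=])");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        String text = input.trim();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        // Single left-to-right pass over the input
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Cannot tokenize input near: " + text.substring(pos));
            }
            tokens.add(m.group(1));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("StartState = [[1,2],[3,0]]"));
    }
}
```

Note that the lexer happily emits the "[" and "]" tokens without caring whether they balance; checking that is the parser's job.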
R.D. Parsing and CFGs
Rules can be recursive
Technically, based on “context free grammars”
Needs a full stack machine, not just a state machine
Stack can be unboundedly deep
Needs more than a finite number of states to run
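One way to see why a stack is needed: checking that brackets nest correctly means remembering every bracket that is still open, and there is no bound on how many that can be. The sketch below (a hypothetical helper, not part of the course code) uses an explicit stack for exactly that.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of the "stack machine" idea.
class BracketMatcher {
    // Returns true if every (, [, { is closed by the matching bracket
    // in the right order. All other characters are ignored.
    static boolean balanced(String s) {
        // Holds the closing bracket we expect for each bracket still open
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': stack.push(')'); break;
                case '[': stack.push(']'); break;
                case '{': stack.push('}'); break;
                case ')':
                case ']':
                case '}':
                    if (stack.isEmpty() || stack.pop() != c) {
                        return false;
                    }
                    break;
                default:
                    break;
            }
        }
        // Anything left on the stack was opened but never closed
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(balanced("[[1,2],[3,0]]"));
        System.out.println(balanced("[[1,2]"));
    }
}
```

The stack grows with the nesting depth of the input, so no fixed, finite set of states could stand in for it.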
CFGs and BNF
Write our rules in “Backus-Naur Form” (BNF)
Rules made up of two elements:
Terminals: actual tokens that could be found in the data -- “dog”, “127”, “{”, [a-zA-Z]+
Non-terminals: names of rules
Rules must be of form:
LHS := term1 op1 term2 op2 ... termN opN
LHS is a non-terminal
term_i is a terminal or non-terminal
op_i is one of the operators we’ve met before -- +, *, |, ()
BNF from P2
FILE := ( CONTROL | PUZZLEDEF )*
CONTROL := ( OUTFILE |
LOGFILE |
ERRFILE |
RESULTS |
STATS |
SEARCH-CTRL |
"Run" |
"Reset" )
Recursion...
N2KPUZZLE := "NToTheKPuzzle" "(" HNAME ")"
             "=" "{"
             "StartState" "=" NKPUZSTATE
             "GoalState" "=" NKPUZSTATE
             "}"
NKPUZSTATE := "["
              ( NUMLIST |
                NKPUZSTATE ( "," NKPUZSTATE )* )
              "]"
NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
HNAME := [a-zA-Z]+
POS-INTEGER := [1-9][0-9]*
NON-NEG-INTEGER := [0-9]+
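The recursive NKPUZSTATE rule maps almost line for line onto a recursive method. The sketch below is a minimal, self-contained version that works directly on characters rather than on the course's Lexer and Token classes; the class name and the nested-List result are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive descent parser for the NKPUZSTATE rule only.
class NkPuzStateParser {
    private final String src;
    private int pos = 0;

    private NkPuzStateParser(String src) {
        this.src = src;
    }

    // Parses text such as "[[1,2],[3,0]]" into nested lists of Integers.
    static List<Object> parse(String text) {
        NkPuzStateParser p = new NkPuzStateParser(text);
        List<Object> state = p.parseState();
        if (p.peek() != '\0') {
            throw new IllegalArgumentException("Trailing input at offset " + p.pos);
        }
        return state;
    }

    // NKPUZSTATE := "[" ( NUMLIST | NKPUZSTATE ( "," NKPUZSTATE )* ) "]"
    private List<Object> parseState() {
        expect('[');
        List<Object> items = new ArrayList<>();
        if (peek() == '[') {
            // One token of lookahead picks the nested alternative: recurse
            items.add(parseState());
            while (peek() == ',') {
                pos++;
                items.add(parseState());
            }
        } else {
            // NUMLIST := NON-NEG-INTEGER ( "," NON-NEG-INTEGER )*
            items.add(parseNumber());
            while (peek() == ',') {
                pos++;
                items.add(parseNumber());
            }
        }
        expect(']');
        return items;
    }

    // NON-NEG-INTEGER := [0-9]+
    private Integer parseNumber() {
        peek(); // skip leading whitespace
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at offset " + pos);
        }
        return Integer.valueOf(src.substring(start, pos));
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("[[1,2],[3,0]]"));
    }
}
```

The Java call stack plays the role of the parser's stack: each open "[" is one pending call to parseState, so the nesting depth is limited only by the input.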
Turning it into code
public PuzState parseNKPuzzle(Lexer l) {
    // the definition must open with the literal keyword
    Token t = l.next();
    if (!t.tokStr().equals("NToTheKPuzzle")) {
        throw new ParseException("Unexpected token " + t.tokStr()
            + " found when expecting N^k-1 puzzle state");
    }
    t = l.next();
    if (!t.tokStr().equals("(")) {
        // ...
    }
    // the heuristic name sits between the parentheses
    t = l.next();
    if (t.getType() != TT_HNAME) {
        // ...
    }
    String heuristic = t.tokStr();
Turning it into code
    // parse ")", "=", "{", "StartState", "=". Now ready for NKPUZSTATE
    NkPuzStateRep sRep = parseNKPuzState(l);
    // now parse "GoalState", "="
    NkPuzStateRep gRep = parseNKPuzState(l);
    // parse "}" and you know you're done with NKPUZ

    // now construct the actual puzzle object
    if (heuristic.equals("Manhattan")) {
        NkPuz p = new NkManhattanPuz(sRep, gRep);
        return p;
    }
    if (heuristic.equals("TileCount")) {
        NkPuz p = new NkTileCountPuz(sRep, gRep);
        return p;
    }
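The "get next token, compare, throw" pattern repeats for every literal in the rule, so one option is to factor it into a small helper. The sketch below assumes stand-in Lexer and ParseException classes defined inline (the real course classes are not shown in these slides, and this Lexer returns plain strings rather than Token objects); expect and parseHeader are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of an expect() helper for literal tokens.
class ExpectSketch {
    // Stand-in lexer: hands out pre-split token strings, then "<EOF>".
    static class Lexer {
        private final Iterator<String> it;

        Lexer(List<String> tokens) {
            this.it = tokens.iterator();
        }

        String next() {
            return it.hasNext() ? it.next() : "<EOF>";
        }
    }

    // Stand-in for the parser's own exception type.
    static class ParseException extends RuntimeException {
        ParseException(String message) {
            super(message);
        }
    }

    // Consume one token and fail unless it is exactly the wanted literal.
    static void expect(Lexer l, String wanted) {
        String got = l.next();
        if (!got.equals(wanted)) {
            throw new ParseException("Unexpected token " + got
                + " found when expecting " + wanted);
        }
    }

    // Parses: "NToTheKPuzzle" "(" HNAME ")" "=" "{" and returns the HNAME text.
    // (Checking that the name matches [a-zA-Z]+ is left out of this sketch.)
    static String parseHeader(Lexer l) {
        expect(l, "NToTheKPuzzle");
        expect(l, "(");
        String heuristic = l.next();
        expect(l, ")");
        expect(l, "=");
        expect(l, "{");
        return heuristic;
    }

    public static void main(String[] args) {
        Lexer l = new Lexer(Arrays.asList("NToTheKPuzzle", "(", "Manhattan", ")", "=", "{"));
        System.out.println(parseHeader(l));
    }
}
```

Each literal in the BNF rule becomes one expect call, so the method body reads in the same order as the rule it implements.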