TRANSCRIPT
New Resources and Ideas for Semantic Parsing
Kyle Richardson
Institute for Natural Language Processing (IMS)
School of Computer Science, Electrical Engineering and Information Technology, University of Stuttgart
May 28, 2018
Collaborators: Jonas Kuhn (advisor, Stuttgart) and Jonathan Berant (work on "polyglot semantic parsing", Tel Aviv)
Main Topic: Semantic Parsing
- Task: mapping text to formal meaning representations (e.g., from Herzig and Berant (2017)).

  Text: Find an article with no more than two authors.
  LF:   Type.Article ⊓ R[λx.count(AuthorOf.x)] ≤ 2

  "Machines and programs which attempt to answer English questions have existed for only about five years.... Attempts to build machines to test logical consistency date back to at least Roman Lull in the thirteenth century... Only in recent years have attempts been made to translate mechanically from English into logical formalisms..."

  R.F. Simmons. 1965. Answering English Questions by Computer: A Survey. Communications of the ACM.
Classical Natural Language Understanding

- Conventional pipeline model: focus on capturing deep inference and entailment.

  input:  List samples that contain every major element
     |   1. Semantic Parsing
     v
  sem:   (FOR EVERY X / MAJORELT : T ;
           (FOR EVERY Y / SAMPLE : (CONTAINS Y X) ;
            (PRINTOUT Y)))            [2. Knowledge Representation]
     |   3. Reasoning
     v
  database:  ⟦sem⟧ = {S10019, S10059, ...}

  (Lunar QA system of Woods (1973))
Classical Natural Language Understanding

  Sub-problem                     Problem Description
  1. Semantic Parsing             Translating input to sem, input → sem
  2. Knowledge Representation     Defining a sufficiently expressive sem language
  3. Reasoning/Execution          Going from sem to denotations in the real world
Why and How? Analogy with Compiler Design

  NL text  --Translation-->  Syntax  -->  Logic/Semantics  --Interpretation-->  Model
  (compare: Programs --Translation--> Syntax --> Semantics --Generation--> Code)

  Classic compiler example for the statement: pos := i + rate * 60

    lex:        id1 := id2 + id3 * 60
    Syntax:     :=(id1, +(id2, *(id3, 60)))
    Semantics:  :=(id1, +(id2, *(id3, int2real(60))))
    Code:       MOVF id3, R2
                MULF #60.0, R2
                MOVF id2, R1
                ADDF R2, R1
                MOVF R1, id1
- An NLU model is a kind of compiler: it involves a transduction from NL to a formal (usually logical) language.
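The compiler analogy above can be made concrete. The following is a minimal sketch (not part of the talk) that lexes and parses the slide's running example `pos := i + rate * 60`, then applies the int2real coercion from the semantics step. The tiny grammar and tuple-based AST are illustrative assumptions.

```python
# Minimal compiler front-end sketch for: pos := i + rate * 60
import re

def lex(src):
    # Tokenize identifiers, numbers, ':=', '+', '*'
    return re.findall(r"[A-Za-z_]\w*|\d+|:=|\+|\*", src)

def parse(tokens):
    # Tiny grammar: assign := ID ':=' expr ; expr := term ('+' expr)? ;
    # term := factor ('*' term)? -- '*' binds tighter than '+'
    target, assign, *rest = tokens
    assert assign == ":="
    def term(toks):
        left, toks = toks[0], toks[1:]
        if toks and toks[0] == "*":
            right, toks = term(toks[1:])
            return ("*", left, right), toks
        return left, toks
    def expr(toks):
        left, toks = term(toks)
        if toks and toks[0] == "+":
            right, toks = expr(toks[1:])
            return ("+", left, right), toks
        return left, toks
    tree, remaining = expr(rest)
    assert not remaining
    return (":=", target, tree)

def annotate(node):
    # Semantic pass: coerce bare integer literals to real, as on the slide.
    if isinstance(node, str):
        return ("int2real", node) if node.isdigit() else node
    return tuple(annotate(n) for n in node)

tree = parse(lex("pos := i + rate * 60"))
# (':=', 'pos', ('+', 'i', ('*', 'rate', '60')))
typed = annotate(tree)
# (':=', 'pos', ('+', 'i', ('*', 'rate', ('int2real', '60'))))
```

The "Code" generation step is omitted; the point is only that translation to a semantic representation mirrors a compiler front end.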
Data-driven Semantic Parsing and NLU

- Data-driven NLU asks an empirical question: can we learn NLU models from examples? Building an NL compiler by hand is hard....
Data-driven Semantic Parsing and NLU

  (Pipeline as above, with machine learning replacing the hand-built semantic parser.)

- Semantic Parser Induction: learn a semantic parser (a weighted transduction) from parallel text/meaning data; a constrained SMT task.
- Resource Problem: where does the parallel data come from, and what do we learn from? It does not occur "in the wild".
Talk Overview

  Sub-problem                     Problem Description
  1. Semantic Parsing             Translating input to sem, input → sem
  2. Knowledge Representation     Defining a sufficiently expressive sem language
  3. Reasoning/Execution          Going from sem to denotations in the real world
  4. Resource Problem             Finding/building parallel data for SP induction

- Resource Problem: using technical documentation as a parallel corpus (problems 1, 4), from Richardson and Kuhn (2017b,a)
- Polyglot Modeling: building semantic parsers from multiple datasets, "polyglot decoding" (problems 1, 4), from Richardson et al. (2018)
- Learning from Entailment: integrating entailment into a semantic parsing pipeline (problems 1, 2, 3), from Richardson and Kuhn (2016).
Semantic Parsing and Parallel Data

  input x:  What state has the largest population?
  sem z:    (argmax (λx.(state x)) (λx.(population x)))

- Learning from LFs: pairs of text x and logical forms z, D = {(x_i, z_i)}_{i=1}^{n}; learn sem : x → z
- Modularity: study the translation independent of other semantic issues.
- Resource issue: finding parallel data; current lack of resources.
Source Code and API Documentation

  /**
   * Returns the greater of two long values
   *
   * @param a an argument
   * @param b another argument
   * @return the larger of a and b
   * @see java.lang.Long#MAX_VALUE
   */
  public static long max(long a, long b)

- Source Code Documentation: high-level descriptions of internal software functionality paired with code.
- Idea: treat as a parallel corpus (Allamanis et al., 2015; Gu et al., 2016; Iyer et al., 2016), or a synthetic semantic parsing dataset.
Source Code as a Parallel Corpus

- Observation 1: tight coupling between high-level text and code.

  /**
   * Returns the greater of two long values
   *
   * @param a an argument
   * @param b another argument
   * @return the larger of a and b
   * @see java.lang.Long#MAX_VALUE
   */
  public static long max(long a, long b)

  (ns ... clojure.core)
  (defn random-sample
    "Returns items from coll with random probability of prob (0.0 - 1.0)"
    ([prob] ...)
    ([prob coll] ...))

- Function signatures: header-like representations containing the function name, (optionally typed) arguments, an (optional) return value, and a namespace.

  Returns the greater of two long values  →  math.util Long max(long a, long b)
  Returns items from coll with random...  →  (core.take-nth n coll)-style pairs, e.g. (core.random-sample prob coll)
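The "parallel corpus" observation can be simulated in a few lines. This hedged sketch (a toy stand-in for the talk's extraction pipeline) mines (description, signature) pairs from any documented Python module using the standard `inspect` module; the dict keys `x`/`z` just mirror the slide's notation.

```python
# Mine (text, signature) pairs from a documented Python module.
import inspect
import json  # example module to mine; any documented module works

def mine_pairs(module):
    pairs = []
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        doc = inspect.getdoc(fn)
        if not doc:
            continue  # undocumented functions yield no pair
        text = doc.splitlines()[0]  # first docstring line = high-level description
        sig = f"{module.__name__}.{name}{inspect.signature(fn)}"
        pairs.append({"x": text, "z": sig})
    return pairs

pairs = mine_pairs(json)  # e.g. one pair maps the docstring of json.dumps to its signature
```

Run over a whole project, this is essentially the raw material behind the Py27 resource described later.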
Source Code: Many Formal Languages

- Observation 2: there are many languages, hence many datasets.

  (Java)
  /**
   * Returns the greater of two long values
   *
   * @param a an argument
   * @param b another argument
   * @return the larger of a and b
   * @see java.lang.Long#MAX_VALUE
   */
  public static long max(long a, long b)

  (Clojure)
  (ns ... clojure.core)
  (defn random-sample
    "Returns items from coll with random probability of prob (0.0 - 1.0)"
    ([prob] ...)
    ([prob coll] ...))

  (Python)
  # zipfile.py
  """Read and write ZIP files"""
  class ZipFile(object):
      """Class to open ... zip files."""
      def write(filename, arcname, ....):
          """Put the bytes from filename into the archive under the name.."""

  (Haskell)
  -- | Mostly functions for reading and showing RealFloat-like values
  module Numeric

  -- | Show non-negative Integral numbers in base 10.
  showInt :: Integral a => a -> ShowS
Source Code: Multilingual

- Observation 3: many NLs, hence many multilingual datasets.

  (English)
  namespace ArrayIterator;
  /**
   * Appends values as the last element
   *
   * @param value The value to append
   * @see ArrayIterator::next()
   */
  public void append(mixed $value)

  (French)
  namespace ArrayIterator;
  /**
   * Ajoute une valeur comme dernier élément
   *   ("Appends a value as the last element")
   * @param value La valeur à ajouter
   * @see ArrayIterator::next()
   */
  public void append(mixed $value)

  (Russian, transliterated)
  namespace ArrayIterator;
  /**
   * Dobavlyaet znachenie value, kak posledni element massiva.
   *   ("Appends the value `value` as the last element of the array.")
   * @param value znachenie, kotoroe nuzhno dobavit'.
   * @see ArrayIterator::next()
   */
  public void append(mixed $value)

  (Spanish)
  namespace ArrayIterator;
  /**
   * Añade el valor como el último elemento.
   *   ("Appends the value as the last element.")
   * @param value El valor a añadir.
   * @see ArrayIterator::next()
   */
  public void append(mixed $value)
Beyond Raw Pairs: Background Information

- Observation 4: code collections contain a rich amount of background info.

  NAME
      dappprof - profile user and lib function usage.
  SYNOPSIS
      dappprof [-ac..] .. -p PID | command
  DESCRIPTION
      -a       print all data
      -p PID   examine the PID
  EXAMPLES
      Run and examine the ``df -h'' command
          dappprof command=``df -h''
      Print elapsed time for PID 1871
          dappprof -p PID=1871
  SEE ALSO
      dapptrace(1M), dtrace(1M), ...

- Descriptions: textual descriptions of parameters, return values, ...
- Cluster information: pointers to related functions/utilities, ...
- Syntactic information: function/code syntax
Resource 1: Standard Library Documentation

  Dataset    #Pairs  #Descr.  Symbols  #Words  Vocab.   Example pair (x, z); goal: learn x → z
  Java        7,183    4,804    4,072  82,696   3,721   x: Compares this Calendar to the specified Object.
                                                        z: boolean util.Calendar.equals(Object obj)
  Ruby        6,885    1,849    3,803  67,274   5,131   x: Computes the arc tangent given y and x.
                                                        z: Math.atan2(y,x) -> Float
  PHP (en)    6,611   13,943    8,308  68,921   4,874   x: Delete an entry in the archive using its name.
                                                        z: bool ZipArchive::deleteName(string $name)
  Python      3,085      429    3,991  27,012   2,768   x: Remove the specific filter from this handler.
                                                        z: logging.Filterer.removeFilter(filter)
  Elisp       2,089    1,365    1,883  30,248   2,644   x: Returns the total height of the window.
                                                        z: (window-total-height window round)
  Haskell     1,633      255    1,604  19,242   2,192   x: Extract the second component of a pair.
                                                        z: Data.Tuple.snd :: (a, b) -> b
  Clojure     1,739       –     2,569  17,568   2,233   x: Returns a lazy seq of every nth item in coll.
                                                        z: (core.take-nth n coll)
  C           1,436    1,478    1,452  12,811   1,835   x: Returns current file position of the stream.
                                                        z: long int ftell(FILE *stream)
  Scheme      1,301      376    1,343  15,574   1,756   x: Returns a new port and the given state.
                                                        z: (make-port port-type state)
  Geoquery      880       –       167   6,663     279   x: What is the tallest mountain in America?
                                                        z: (highest(mountain(loc_2(countryid usa))))

- Standard library documentation for 9+ programming languages and 7 natural languages, from Richardson and Kuhn (2017b).
Resource 2: Open Source Python Projects

  Project       #Pairs  #Symbols  #Words  Vocab.
  scapy            757     1,029   7,839   1,576
  zipline          753     1,122   8,184   1,517
  biopython      2,496     2,224  20,532   2,586
  renpy            912       889  10,183   1,540
  pyglet         1,400     1,354  12,218   2,181
  kivy             820       861   7,621   1,456
  pip            1,292     1,359  13,011   2,201
  twisted        5,137     3,129  49,457   4,830
  vispy          1,094     1,026   9,744   1,740
  orange         1,392     1,125  11,596   1,761
  tensorflow     5,724     4,321  45,006   4,672
  pandas         1,969     1,517  17,816   2,371
  sqlalchemy     1,737     1,374  15,606   2,039
  pyspark        1,851     1,276  18,775   2,200
  nupic          1,663     1,533  16,750   2,135
  astropy        2,325     2,054  24,567   3,007
  sympy          5,523     3,201  52,236   4,777
  ipython        1,034     1,115   9,114   1,771
  orator           817       499   6,511     670
  obspy          1,577     1,861  14,847   2,169
  rdkit          1,006     1,380   9,758   1,739
  django         2,790     2,026  31,531   3,484
  ansible        2,124     1,884  20,677   2,593
  statsmodels    2,357     2,352  21,716   2,733
  theano         1,223     1,364  12,018   2,152
  nltk           2,383     2,324  25,823   3,151
  sklearn        1,532     1,519  13,897   2,115

- 27 Python projects from GitHub, from Richardson and Kuhn (2017a), similar to Barone and Sennrich (2017).
Summary of Current Resources

- API Datasets: the stdlib collection and Py27, consisting of 45 APIs across 11 programming languages and 8 natural languages.
- Other Resources: Function Assistant, a tool for extracting parallel datasets from Python projects.
- Forthcoming: around 460 Python/Java API datasets for data-to-text generation (Richardson et al., 2017).

  https://github.com/yakazimir/Code-Datasets
Text to Signature Translation: How Hard Is It?

- Task: for each API dataset of text/signature pairs (each within a finite signature space), learn sp : text → signature.
- Question: can background info from the API help?
- SMT Baseline: (Deng and Chrupała, 2014), a sequence prediction model.

  Input: Gets the total cache size
      |
  SMT Model → k-best signature translation list:
      ✗ string APCIterator::key(void)
      ✗ int APCIterator::getTotalHits(void)
      ✗ int APCIterator::getSize(void)
      ✓ int APCIterator::getTotalSize(void)
      ✗ int Memcached::append(string $key)
      ...                                      → Evaluation (Accuracy@i)
      |
  Discriminative Model (task-specific decoder) → reranked k-best:
      ✓ int APCIterator::getTotalSize(void)
      ✗ int APCIterator::getSize(void)
      ✗ string APCIterator::key(void)
      ✗ int Memcached::append(string $key)
      ✗ int APCIterator::getTotalHits(void)
      ...                                      → Evaluation
Background API Information

- Reranker Model: see-also annotations, abstract syntax info, parameter descriptions, ...

  x: Returns the hyperbolic cosine of arg
  z: function float cosh(float $arg)
  See also: {cos, acosh, sinh}

  φ(x, z) =
    Model score: is it in the top 5..10?
    Alignments: (hyperbolic, cosh), (cosine, cosh), ...
    Phrases: (hyperbolic cosine, cosh), (of arg, float $arg), ...
    See-also classes: (hyperbolic, {cos, acosh, sinh, ..}), ...
    In descriptions: (arg, $arg)
    Matches/Tree position: ...
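The feature map φ(x, z) above can be sketched as a small Python function. This is a hedged illustration, not the paper's feature extractor: the feature names and the crude prefix-overlap "alignment" are my own stand-ins for the learned alignment and phrase features.

```python
# Toy sketch of a reranker feature map phi(x, z) with background API info.
def phi(x, z, see_also=(), param_descr=()):
    feats = {}
    words = x.lower().split()
    toks = z.lower().replace("(", " ").replace(")", " ").split()
    for w in words:
        for t in toks:
            if w[:4] in t:                 # crude prefix-overlap alignment feature
                feats[f"align:{w}:{t}"] = 1.0
    for w in words:
        if w in see_also:                  # word names a see-also cluster member
            feats[f"cluster:{w}"] = 1.0
        if w in param_descr:               # word appears in a parameter description
            feats[f"descr:{w}"] = 1.0
    return feats

f = phi("Returns the hyperbolic cosine of arg",
        "float cosh(float $arg)",
        see_also={"cos", "acosh", "sinh"},
        param_descr={"arg"})
# f contains e.g. "align:arg:$arg" and "descr:arg"
```

A linear reranker then scores each k-best candidate as a dot product of these features with learned weights.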
Text to Signature Translation: How Hard Is It?

  (Pipeline as above: SMT model → k-best signature translation list →
   discriminative reranker → reranked k-best → evaluation.)

  Results (columns per method: accuracy@1, accuracy@10, MRR):

  Dataset (Avg.)            Term Matching       SMT                 SMT + Reranker
  Standard Library Docs.    12.7  32.0  19.2    28.9  67.7  41.9    31.1  71.1  44.5
  Py27                      22.9  50.6  32.4    29.3  67.4  42.5    32.4  73.5  46.5
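For concreteness, the two metrics in the table can be sketched in a few lines; this is a standard formulation, with toy candidate lists standing in for model output.

```python
# Accuracy@k and mean reciprocal rank (MRR) over ranked candidate lists.
def accuracy_at_k(ranked, gold, k):
    # Fraction of examples whose gold signature appears in the top k candidates.
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)

def mrr(ranked, gold):
    # Mean of 1/rank of the gold signature (0 if absent from the list).
    total = 0.0
    for cands, g in zip(ranked, gold):
        if g in cands:
            total += 1.0 / (cands.index(g) + 1)
    return total / len(gold)

ranked = [["f(a)", "g(b)", "h(c)"], ["g(b)", "f(a)"]]
gold = ["g(b)", "g(b)"]
# accuracy_at_k(ranked, gold, 1) == 0.5 ; mrr(ranked, gold) == 0.75
```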
What do these results mean?
Semantic Parsing in Tech Docs: General Findings

- How hard is it? Certainly not trivial: simple SMT models do alright, but there is lots of room for improvement.
- New Challenges: a highly sparse vocabulary makes it very hard to apply existing semantic parsing and MT methods (more on this next).
The Semantics of Function Signatures

  Text:                  Returns the greater of two long values
  Signature (informal):  lang.Math long max(long a, long b)
  Normalized:            java lang Math::max(long:a,long:b) -> long

  Expansion to logic:

  ⟦java lang Math::max(long:a,long:b) -> long⟧_m =
    λx1 λx2 ∃v ∃f ∃n ∃c.
        eq(v, max(x1, x2)) ∧ fun(f, max) ∧ type(v, long) ∧ lang(f, java)
      ∧ var(x1, a) ∧ param(x1, f, 1) ∧ type(x1, long)
      ∧ var(x2, b) ∧ param(x2, f, 2) ∧ type(x2, long)
      ∧ namespace(n, lang) ∧ in_namespace(f, n)
      ∧ class(c, Math) ∧ in_class(f, c)

- Disclaimer: these are not real logical forms, but we can formalize the function signature languages and define a translation to logic (Richardson, 2018).
- Reasoning: a lot of declarative knowledge can be extracted from libraries directly, and via natural language.
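The expansion step can be sketched mechanically. The following is a hedged illustration (not the formalization in Richardson (2018)): it parses the normalized signature format shown above and emits flat facts whose predicate spellings follow the slide.

```python
# Expand a normalized signature into flat logical facts.
import re

def expand(sig):
    # Expected shape: "<lang> <namespace> <Class>::<fun>(<type>:<name>,...) -> <ret>"
    m = re.match(r"(\w+) (\w+) (\w+)::(\w+)\((.*)\) -> (\w+)", sig)
    lang, ns, cls, fun, params, ret = m.groups()
    facts = [f"fun(f,{fun})", f"lang(f,{lang})", f"type(v,{ret})",
             f"namespace(n,{ns})", "in_namespace(f,n)",
             f"class(c,{cls})", "in_class(f,c)"]
    for i, p in enumerate(params.split(","), 1):
        ptype, pname = p.split(":")
        facts += [f"var(x{i},{pname})", f"param(x{i},f,{i})", f"type(x{i},{ptype})"]
    return facts

facts = expand("java lang Math::max(long:a,long:b) -> long")
# includes "fun(f,max)", "var(x1,a)", "param(x2,f,2)", "type(x2,long)", ...
```

Such facts are one form the "declarative knowledge" mentioned above could take when extracted from a whole library.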
Other Resource Issues

  (en, PHP)   (en, Lisp)   (de, PHP)   (ja, Python)   (en, Haskell)   ...
      |            |           |            |              |
  θ_en→PHP    θ_en→Lisp   θ_de→PHP    θ_ja→Python    θ_en→Haskell

- Traditional approaches to semantic parsing train individual models for each available parallel dataset.
- Resource Problem: datasets tend to be small, and it is hard (and unlikely) to get certain types of parallel data, e.g., (de, Haskell).
Code Domain: Projects Often Lack Documentation

- Ideally, we want each dataset to have tens of thousands of documented functions.
- In practice, most projects have 500 or fewer documented functions.
Polyglot Models: Training on Multiple Datasets

  (en, PHP)   (en, Lisp)   (de, PHP)   (ja, Python)   (en, Haskell)   ...
       \           \           |           /               /
                          θ_polyglot

- Idea: concatenate all datasets into one and build a single model with shared parameters, capturing redundancy (Herzig and Berant, 2017).
- Polyglot Translator: translates from any input language to any output (programming) language.

  1. Multiple Datasets: does this help learn better translators?
  2. Zero-Shot Translation (Johnson et al., 2016): can we translate between different APIs and unobserved language pairs?
Polyglot Models: Training on Multiple Datasets

- Challenge: building a polyglot decoder, i.e., a translation mechanism that facilitates crossing between (potentially unobserved) language pairs.
  - Constraint 1: ensure well-formed code output (not guaranteed in ordinary MT; cf. Cheng et al. (2017); Krishnamurthy et al. (2017)).
  - Constraint 2: must be able to translate to target APIs/programming languages on demand.
Graph Based Approach

  [Figure: decoding DAG over states s0..s11, with source node s0 (cost 0.0),
   all other node costs initialized to ∞, and edges labeled with signature
   tokens, e.g. 2Clojure, 2C, numeric, algo, math, ceil, atan2, x, arg, y.]

- Idea: exploit the finiteness of the target translation space; represent the full search space as a directed acyclic graph (DAG).
- Trick: prepend to each signature an artificial token that identifies the API project or programming language (Johnson et al., 2016).
- Decoding: reduces to finding a path through the graph given an input x:

  x: The ceiling of a number

  This can be solved using a variant of the single-source shortest path (SSSP) problem (Cormen et al., 2009), extensible to k-best SSSP paths.
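The two ideas on this slide, prepending an artificial language token and packing the finite signature space into a graph, can be sketched together. Here the DAG is built as a prefix trie (a special case), with illustrative toy signatures.

```python
# Pack a finite set of signatures into a prefix trie whose paths are
# exactly the well-formed outputs; prepend an artificial language token.
def build_trie(signatures):
    root = {}
    for lang, toks in signatures:
        node = root
        for tok in [f"2{lang}"] + toks:   # artificial token, e.g. "2Clojure"
            node = node.setdefault(tok, {})
        node["<end>"] = {}                # mark a complete signature
    return root

sigs = [("Clojure", ["numeric", "ceil", "x"]),
        ("Clojure", ["math", "atan2", "y", "x"]),
        ("C", ["math", "ceil", "arg"])]
trie = build_trie(sigs)
# every well-formed output is a root-to-<end> path,
# e.g. 2Clojure -> numeric -> ceil -> x -> <end>
```

A real implementation would merge shared suffixes to get a DAG rather than a trie, but the decoding story is the same.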
Graph Decoder: Shortest Path Decoding

  [Figure: same decoding DAG as on the previous slide.]

- Standard SSSP: assumes a DAG G = (V, E), a weight function w : E → R, an (initialized) vector d ∈ ∞^|V|, and a unique source node b.

  0: d[b] ← 0.0
  1: for each vertex u ∈ V in topologically sorted order
  2:     d(v) = min_{(u,v,z) ∈ E} { d(u) + w(u, v, z) }
  3: return min_{v ∈ V} { d(v) }

- Variant: replace w(..) with a translation model that dynamically generates weights corresponding to translation scores for x and the edge labels during the SSSP search.
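The SSSP recurrence above can be sketched directly, with the weight function left pluggable so any translation model can supply the edge scores. The toy lexical-overlap score and the graph below are illustrative assumptions, not the paper's model.

```python
# DAG shortest-path decoding with a pluggable edge-weight (translation) model.
import math

def sssp(vertices, edges, source, weight):
    """vertices must be topologically ordered; edges: list of (u, v, label)."""
    out = {}
    for u, v, z in edges:
        out.setdefault(u, []).append((v, z))
    d = {v: math.inf for v in vertices}
    back = {}                              # backpointers to recover the path
    d[source] = 0.0
    for u in vertices:
        if d[u] == math.inf:
            continue
        for v, z in out.get(u, []):
            if d[u] + weight(u, v, z) < d[v]:
                d[v] = d[u] + weight(u, v, z)
                back[v] = (u, z)
    return d, back

def best_path(back, sink):
    labels, v = [], sink
    while v in back:
        v, z = back[v]
        labels.append(z)
    return list(reversed(labels))

# Toy graph in the spirit of the slide: paths spell signatures.
V = ["s0", "s1", "s2", "s3"]
E = [("s0", "s1", "2Clojure"), ("s1", "s2", "math"), ("s1", "s2", "numeric"),
     ("s2", "s3", "ceil"), ("s2", "s3", "atan2")]
x = "the ceiling of a number"
w = lambda u, v, z: 0.1 if z[:4] in x else 1.0   # toy lexical score against x
d, back = sssp(V, E, "s0", w)
best_path(back, "s3")  # -> ['2Clojure', 'math', 'ceil']
```

Ties between equal-weight edges are broken by edge order here; a real decoder would also keep k-best backpointers for the k-SSSP extension.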
Neural Sequence to Sequence Models

  encoder: x1 x2 x3 ... x|x|   →   decoder: c, z1 z2 z3 ... z|z|

- Encoder Model: a neural sequence model that builds a distributed representation of the source sentence and its words, x = (h1, h2, .., h|x|).
- Decoder Model: an RNN language model additionally conditioned on the input x / encoder states:

  p(z | x) = ∏_{i=1}^{|z|} p_Θ(z_i | z_{<i}, x)

- Modification (at decode/test time): constrain the search (each new z_i) to allowable transitions and paths in the graph.
Graph Decoder: Shortest Path Decoding

- Standard SSSP: assumes a DAG G = (V, E), a weight function w : E → R, an (initialized) vector d ∈ ∞^|V|, and a unique source node b.

  0: d[b] ← 0.0
  1: for each vertex u ∈ V in topologically sorted order
  2:     d(v) = min_{(u,v,z) ∈ E} { d(u) + w(u, v, z) }
  3: return min_{v ∈ V} { d(v) }

- Translation models: any model can be used; we experiment with lexical translation models (see paper) and attentive encoder-decoder models.
- Neural Variant: assumes input x, G, neural decoder parameters Θ (trained normally), d, and a state map s:

  0: d[b] ← 0.0
  1: for each vertex u ∈ V in topologically sorted order
  2:     d(v) = min_{(u,v,z) ∈ E} { -log p_Θ(z | z_{<i}, x) + d(u) }
  3:     s[v] ← RNN state for the min edge
  4: return min_{v ∈ V} { d(v) }
Polyglot vs. Monolingual Decoding

- The difference is the type of input data and the starting point (i.e., source node) in the graph search.
- Any-Language Decoding: letting the decoder decide.

  1. Source API (stdlib): (es, PHP)
     Input: Devuelve el mensaje asociado al objeto lanzado.
            ("Returns the message associated with the thrown object.")
     Output:  Language: PHP      Translation: public string Throwable::getMessage ( void )
              Language: Java     Translation: public String lang.getMessage( void )
              Language: Clojure  Translation: (tools.logging.fatal throwable message & more)

  2. Source API (stdlib): (ru, PHP)
     Input: konvertiruet stroku iz formata UTF-32 v format UTF-16.
            ("Converts a string from UTF-32 format to UTF-16 format.")
     Output:  Language: PHP      Translation: string PDF_utf32_to_utf16 ( ... )
              Language: Ruby     Translation: String#toutf16 => string
              Language: Haskell  Translation: Encoding.encodeUtf16LE :: Text -> ByteString

  3. Source API (py): (en, stats)
     Input: Compute the Moore-Penrose pseudo-inverse of a matrix.
     Output:  Project: sympy     Translation: matrices.matrix.base.pinv_solve( B, ... )
              Project: sklearn   Translation: utils.pinvh( a, cond=None,rcond=None,... )
              Project: stats     Translation: tools.pinv2( a,cond=None,rcond=None )

- This can be used for extracting declarative knowledge about function equivalences (e.g., for the logical approach introduced earlier).
Polyglot vs. Monolingual Decoding: Tech Doc Task

- Our Focus: does training on multiple datasets (i.e., polyglot models) improve monolingual decoding?
- Monolingual models: current best models, primarily SMT-based.

          Method                         Acc@1   Acc@10   MRR
  stdlib  mono.  Best Monolingual Model   29.9    69.2    43.1
          poly.  Lexical SMT SSSP         33.2    70.7    45.9
          poly.  Best Seq2Seq SSSP        13.9    36.5    21.5
  py27    mono.  Best Monolingual Model   32.4    73.5    46.5
          poly.  Lexical SMT SSSP         41.3    77.7    54.1
          poly.  Best Seq2Seq SSSP         9.0    26.9    15.1

- Findings: polyglot models can improve performance with SMT models; they do not work for Seq2Seq models.
- A standard set of tricks was applied: copying, lexical biasing (Arthur et al., 2016).
Polyglot vs. Monolingual Decoding: Tech Doc Task

▸ Our Focus: Does training on multiple datasets (i.e., polyglot models) improve monolingual decoding?
▸ Monolingual models: current best models, primarily SMT based.

          Method                     Acc@1  Acc@10  MRR
  stdlib  mono.  Best Monolingual Model  29.9   69.2  43.1
          poly.  Lexical SMT SSSP        33.2   70.7  45.9
          poly.  Best Seq2Seq SSSP       13.9   36.5  21.5
  py27    mono.  Best Monolingual Model  32.4   73.5  46.5
          poly.  Lexical SMT SSSP        41.3   77.7  54.1
          poly.  Best Seq2Seq SSSP        9.0   26.9  15.1

▸ Findings: Polyglot models can improve performance with SMT models, but do not work for the Seq2Seq models.
▸ Standard set of tricks: copying, lexical biasing (Arthur et al., 2016).
33
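The table above reports Acc@1, Acc@10, and MRR. As a minimal sketch (with hypothetical toy data, not the real stdlib/py27 signatures), these ranking metrics can be computed from each example's k-best candidate list as follows:

```python
# Minimal sketch: Acc@k and MRR over ranked candidate lists.
# Each example pairs a gold representation with the parser's ranked outputs.

def acc_at_k(ranked_lists, golds, k):
    """Fraction of examples whose gold item appears in the top k candidates."""
    hits = sum(1 for cands, gold in zip(ranked_lists, golds) if gold in cands[:k])
    return hits / len(golds)

def mrr(ranked_lists, golds):
    """Mean reciprocal rank of the gold item (0 if absent from the list)."""
    total = 0.0
    for cands, gold in zip(ranked_lists, golds):
        if gold in cands:
            total += 1.0 / (cands.index(gold) + 1)
    return total / len(golds)

# Toy example (hypothetical signature strings):
preds = [["f(x)", "g(x)", "h(x)"], ["g(y)", "f(y)"]]
golds = ["f(x)", "f(y)"]
print(acc_at_k(preds, golds, 1))  # 0.5: gold is ranked first in one of two lists
print(mrr(preds, golds))          # 0.75: (1/1 + 1/2) / 2
```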
Polyglot Modeling on Benchmark SP Tasks

▸ Our Focus: Does this help on benchmark semantic parsing tasks?

  Multilingual Geoquery   Method                           Acc@1 (averaged)
  mono.                   UBL (Kwiatkowski et al., 2010)        74.2
                          TreeTrans (Jones et al., 2012)        76.8
                          Lexical SMT SSSP                      68.6
                          Best Seq2Seq SSSP                     78.0
  poly.                   Lexical SMT SSSP                      67.3
                          Best Seq2Seq SSSP                     79.6

▸ Multilingual Geoquery: monolingual/polyglot models on Geoquery in en, de, gr, th; the polyglot setting improves accuracy, and neural Seq2Seq models perform best (consistent with recent findings, Dong and Lapata (2016)).
▸ Recall that these same Seq2Seq models do not work in the technical documentation tasks.
34
Benchmark SP Tasks: Mixed Language Decoding

▸ Introduced a new mixed language Geoquery test set; each sentence contains NPs from two or more languages.

  Mixed Lang. Input: Wie hoch liegt der höchstgelegene Punkt in Αλαμπάμα?
  LF: answer(elevation_1(highest(place(loc_2(stateid('alabama'))))))

  Method                    Acc@1 (averaged)  Acc@10 (averaged)
  Best Monolingual Seq2Seq        4.2               18.2
  Polyglot Seq2Seq               75.2               90.0

35
Learning from multiple datasets: Summary

▸ Take Away: polyglot modeling can be a useful technique for improving semantic parsing and transfer learning.
▸ New Ideas: translating between datasets and languages, mixed language parsing, hardness of classical SMT decoding?
▸ Technical Docs: has the features of a low-resource translation task, and is difficult especially for neural modeling.
36
Semantic Parsing and Entailment

▸ Entailment: one of the basic aims of semantics (Montague (1970)).
▸ Representations should be grounded in judgements about entailment.

  input:  All samples that contain a major element
          → Some sample that contains a major element
  sem:    (FOR EVERY X / MAJORELT : T ;
            (FOR EVERY Y / SAMPLE : (CONTAINS Y X) ; (PRINTOUT Y)))
  database: ⟦sem⟧ = {S10019, S10059, ...} ⊇ {S10019}

  [Pipeline diagram: input → Semantic Parsing → sem → Reasoning over the
   Knowledge Representation → database denotation]

38
Unit Testing our Semantic Parsers

▸ Minimal requirement: a semantic parser should be able to recognize certain types of entailments.
▸ RTE: Would a human reading t infer h? (Dagan et al. (2005))

  sentence                          Logical Form (from corpus)
  t  Pink3 passes to Pink7         pass(pink3,pink7)
  h  Pink3 quickly passes to Pink7 pass(pink3,pink7)
  Inference t → h: Uncertain (human), Entail (LF only)

  sentence                          Logical Form (from corpus)
  t  Pink3 passes to Pink7         pass(pink3,pink7)
  h  Pink3 kicks the ball          kick(pink3)
  Inference t → h: Entail (human), Contradict (LF only)

  Inference Prediction     Accuracy
  Majority Baseline          0.33
  Logical Form Matching      0.60

Sportscaster corpus (Chen and Mooney (2008))
39
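The "Logical Form Matching" baseline above can be sketched as a crude comparison of the corpus LFs, which is exactly why it diverges from human judgements in both examples. This is my reading of the baseline, not the paper's implementation; the helper names are illustrative:

```python
# Sketch of a logical-form matching baseline for the RTE-style unit test:
# identical LFs -> Entail, otherwise -> Contradict. The rule is deliberately
# crude, so it misses human labels such as Uncertain.

def parse_lf(lf):
    """Split 'pred(arg1,arg2)' into (pred, [args])."""
    pred, rest = lf.split("(", 1)
    return pred, rest.rstrip(")").split(",")

def lf_match(t_lf, h_lf):
    if parse_lf(t_lf) == parse_lf(h_lf):
        return "Entail"
    return "Contradict"

# The two Sportscaster examples from the slide:
print(lf_match("pass(pink3,pink7)", "pass(pink3,pink7)"))  # Entail (human: Uncertain)
print(lf_match("pass(pink3,pink7)", "kick(pink3)"))        # Contradict (human: Entail)
```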
How to improve this? Test Driven Learning...

▸ General Problem: semantic representations are underspecified, fail to capture entailments, and background knowledge is missing.
▸ Goal: capture the missing knowledge and inferential properties of text; incorporate entailment information into learning.
▸ Solution: use entailment information (EI) and logical inference as a weak signal to train the parser (Richardson and Kuhn (2016)).

  Paradigm                  Supervision Dataset D =                    Learning Goal
  Learning from LFs         {(input_i, LF_i)}_{i=1}^N                  input --Trans.--> LF
  Learning from Entailment  {((input_t, input'_h)_i, EI_i)}_{i=1}^N    (input_t, input'_h) --Proof--> EI

40
Learning from Entailment: Illustration

▸ Entailments are used to reason about target symbols and find holes in the analyses.

  input (t, h):  t: pink3 λ passes to pink1
                 h: pink3 quickly kicks λ
  latent proof y, over the aligned edits pink3/pink3, λ/quickly, pass/kick, pink1/λ:
    pink3 ≡ pink3;  λ ⊒ quickly;  pass ⊑ kick, pink1 ⊑ λ
    ⋈  passes to pink1 ⊑ kicks
    ⋈  passes to pink1 # quickly kicks
    ⋈  pink3 passes to pink1 # pink3 quickly kicks
  world EI z: Uncertain

Data: D = {((t, h)_i, z_i)}_{i=1}^N, generic logical calculus. Task: learn the (latent) proof y.
41
Grammar Approach: Sentences to Logical Form

▸ Use a semantic CFG; rules are constructed from the target representations using a small set of templates (Borschinger et al. (2011)).

  (x: purple 10 quickly kicks,  z: {kick(purple10), block(purple7), ...})
        ↓ (rule extraction)
  [Four candidate derivation trees over x, each pairing word spans with
   semantic categories (e.g., kick_c, purple10_c, arg1, a λ_c slot for
   "quickly"); the trees derive kick(purple10) (marked ✓, including one
   with crossed argument order), block(purple7), and block(purple9)
   (marked ✗).]

42
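The rule-extraction step above can be sketched as follows. This is a simplified illustration of template-based extraction in the spirit of Borschinger et al. (2011), not the actual template set: the category suffixes (`_c`, `_w`) and the `extract_rules` helper are hypothetical names chosen to mirror the tree labels on the slide.

```python
# Sketch of template-based CFG rule extraction: each predicate-argument LF
# in the candidate world yields rules pairing semantic categories with
# word-level categories. Names (_c for concept, _w for word) are illustrative.

def extract_rules(lf_pred, lf_args):
    """Build CFG rules for one candidate LF, e.g. ('kick', ['purple10'])."""
    rules = []
    # Top-level template: a representation = argument phrase + predicate phrase.
    rules.append(("Rep", ["arg1", f"{lf_pred}_c"]))
    rules.append(("arg1", [f"{lf_args[0]}_c"]))
    # Concept categories rewrite to word categories that can match spans of x.
    rules.append((f"{lf_args[0]}_c", [f"{lf_args[0]}_w"]))
    rules.append((f"{lf_pred}_c", [f"{lf_pred}_w"]))
    return rules

world = [("kick", ["purple10"]), ("block", ["purple7"])]
for pred, args in world:
    for lhs, rhs in extract_rules(pred, args):
        print(f"{lhs} -> {' '.join(rhs)}")
```

Each candidate LF in the world contributes its own rule fragment; the derivations that compete in the k-best list are built from these rules.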
Semantic Parsing as Grammatical Inference

▸ The rules are used to define a PCFG G_θ; learn the correct derivations.
▸ Learning: EM bootstrapping approach (Angeli et al. (2012)).

  input:  Purple 7 kicks to Purple 4
  world:  z = {pass(purple7,purple4)}

  Beam Parser (θ_t) → k-best list d_1, ..., d_k of candidate derivations,
  varying the predicate (pass_r vs. turnover_r vs. kick_r) and the player
  arguments; each derivation is interpreted against the world, and the
  derivations consistent with z are used to update θ_{t+1}.

43
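The parse-interpret-update loop above can be sketched as an EM-style bootstrapping procedure. This is a schematic reconstruction, not the paper's code: `beam_parse`, the `Deriv` class, and `toy_parser` are stand-ins for a real k-best PCFG parser and its derivation objects.

```python
# Sketch of EM bootstrapping: parse with current weights, keep k-best
# derivations whose interpretation matches the observed world, and
# re-estimate rule probabilities from the surviving derivations.
from collections import defaultdict

def em_bootstrap(data, theta, beam_parse, iterations=5, k=10):
    for _ in range(iterations):
        counts = defaultdict(float)
        for sentence, world in data:
            kbest = beam_parse(sentence, theta, k)          # [(deriv, score), ...]
            good = [(d, s) for d, s in kbest if d.interpretation in world]
            if not good:
                continue
            z = sum(s for _, s in good)
            for deriv, score in good:                        # E-step: fractional counts
                for rule in deriv.rules:
                    counts[rule] += score / z
        # M-step: normalize counts per left-hand side into new probabilities.
        lhs_totals = defaultdict(float)
        for (lhs, _), c in counts.items():
            lhs_totals[lhs] += c
        theta = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
    return theta

# Toy demonstration with a stubbed two-derivation parser:
class Deriv:
    def __init__(self, interp, rules):
        self.interpretation, self.rules = interp, rules

def toy_parser(sentence, theta, k):
    d1 = Deriv("pass(purple7,purple4)", [("V", "pass")])
    d2 = Deriv("turnover(purple7)", [("V", "turnover")])
    return [(d1, 0.6), (d2, 0.4)]

theta = em_bootstrap([("Purple 7 kicks to Purple 4", {"pass(purple7,purple4)"})],
                     {}, toy_parser, iterations=1)
print(theta)  # {('V', 'pass'): 1.0}: only the world-consistent derivation survives
```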
Learning Entailment Rules

▸ The rules define an inference PCFG G′_θ; learn the correct proofs, using the natural logic calculus from MacCartney and Manning (2009).
▸ Learning: a grammatical inference problem as before, with EM bootstrapping.

  input d:  t: pink 1 kicks
            h: pink 1 quickly passes to pink 2
  world:    z = Uncertain

  Beam Parser (θ_t) → k-best list d_1, ..., d_k of candidate proofs over the
  aligned edits (pink 1/pink 1 as ≡, λ/quickly, kicks/passes to, λ/pink 2),
  differing in the relation assigned to each edit (⊑, ⊒, ≡, |); the proofs
  consistent with z are used to update θ_{t+1}.

44
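The proofs above compose per-edit relations into a single entailment label. As a minimal sketch of relation composition (join) in a MacCartney-and-Manning-style natural logic calculus, restricted to four relations: the full calculus has seven relations and more informative join results; this sketch conservatively collapses every unlisted composition to independence (#), which is then read off as Uncertain.

```python
# Sketch of natural-logic relation composition over four relations.
# Unlisted compositions conservatively collapse to independence (#),
# losing information but never claiming an unsupported entailment.

EQ, FWD, REV, INDEP = "≡", "⊑", "⊒", "#"

def join(r1, r2):
    if r1 == EQ:
        return r2
    if r2 == EQ:
        return r1
    if r1 == r2 and r1 in (FWD, REV):
        return r1
    return INDEP

def prove(edit_relations):
    """Fold join over the per-edit relations of an aligned (t, h) pair."""
    rel = EQ
    for r in edit_relations:
        rel = join(rel, r)
    return rel

# The earlier slide's example: pink3 ≡ pink3, a deleted modifier "quickly"
# contributes ⊒, and pass → kick plus the dropped argument contributes ⊑;
# ⊑ joined with ⊒ collapses to #, read off as the Uncertain label.
print(prove([EQ, REV, FWD]))  # prints "#"
```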
Reasoning about Entailment

▸ Improving the internal representations (before: a, after: b).

  a. (before)  "purple 6 under pressure" is parsed with "under pressure"
               absorbed into the player constant purple6.
  b. (after)   "under pressure" is analyzed as a separate ⊑ modifier, and
               "purple 6" alone maps to purple6; the rest of the analysis
               (purple 9 passes to ...) is unchanged.

45
Reasoning about Entailment

▸ Learned modifiers from example proof trees.

  (t, h): (a beautiful pass to, passes to)          "a beautiful"/λ is a ⊑ edit
  (t, h): (gets a free kick, freekick from the)     "gets a"/λ is an ≡ edit
  generalization:  beautiful(X) ⊑ X                 get(X) ≡ X

  (t, h): (yet again passes to, kicks to)           "yet again"/λ is a ⊑ edit
  (t, h): (purple 10, purple 10 who is out front)   λ/"who is out front" is a ⊒ edit
  generalization:  yet-again(X) ⊑ X                 X ⊒ out-front(X)

45
Reasoning about Entailment

▸ Learned lexical relations from example proof trees.

  (t, h): (pink team is offsides, purple 9 passes)          "pink team"/"purple 9" substitution
  (t, h): (bad pass ... picked off by, loses the ball to)   "bad pass"/"turnover" substitution
  relations:  pink team | purple9                           bad pass ⊑ turnover

  (t, h): (free kick for, steals the ball from)             "free kick"/"steal" substitution
  (t, h): (purple 6 kicks to, purple 6 kicks)               "kicks to"/"kicks" substitution
  relations:  free kick | steal                             pass ⊑ kick

45
Reasoning about Entailment: Summary

▸ New Ideas: evaluating semantic parsers on RTE.

  Inference Prediction       Accuracy
  Majority Baseline            0.33
  Logical Form Matching        0.60
  Learning from Entailment     0.73

▸ Take Away: using entailment as a weak signal can help improve the representations being learned and help theory construction.
▸ Conceptually: train our semantic parsers to be more like a semanticist, working backwards from entailments to representations.
46
Conclusions

▸ Natural language understanding: pursue a data-driven approach centering around semantic parser induction.
▸ Explored many areas of the problem and developed new resources and techniques, looking at the problem holistically.
▸ It is easy to get stuck on the translation problem; it must be integrated within a broader theory of KR and reasoning.
▸ Question: To what extent can we use source code in the wild as a KR for deep NLU?
48
References I

Allamanis, M., Tarlow, D., Gordon, A., and Wei, Y. (2015). Bimodal modelling of source code and natural language. In International Conference on Machine Learning, pages 2123–2132.
Angeli, G., Manning, C. D., and Jurafsky, D. (2012). Parsing time: Learning to interpret time expressions. In Proceedings of NAACL-2012, pages 446–455.
Arthur, P., Neubig, G., and Nakamura, S. (2016). Incorporating discrete translation lexicons into neural machine translation. arXiv preprint arXiv:1606.02006.
Barone, A. V. M. and Sennrich, R. (2017). A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. arXiv preprint arXiv:1707.02275.
Borschinger, B., Jones, B. K., and Johnson, M. (2011). Reducing grounded learning tasks to grammatical inference. In Proceedings of EMNLP-2011, pages 1416–1425.
Chen, D. L. and Mooney, R. J. (2008). Learning to sportscast: A test of grounded language acquisition. In Proceedings of ICML-2008, pages 128–135.
Cheng, J., Reddy, S., Saraswat, V., and Lapata, M. (2017). Learning an executable neural semantic parser. arXiv preprint arXiv:1711.05066.
Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (2009). Introduction to Algorithms. MIT Press.
Dagan, I., Glickman, O., and Magnini, B. (2005). The PASCAL recognizing textual entailment challenge. In Proceedings of the PASCAL Challenges Workshop on Recognizing Textual Entailment.
50
References II

Deng, H. and Chrupała, G. (2014). Semantic approaches to software component retrieval with English queries. In Proceedings of LREC-14, pages 441–450.
Dong, L. and Lapata, M. (2016). Language to logical form with neural attention. arXiv preprint arXiv:1601.01280.
Gu, X., Zhang, H., Zhang, D., and Kim, S. (2016). Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 631–642. ACM.
Herzig, J. and Berant, J. (2017). Neural semantic parsing over multiple knowledge-bases. In Proceedings of ACL.
Iyer, S., Konstas, I., Cheung, A., and Zettlemoyer, L. (2016). Summarizing source code using a neural attention model. In Proceedings of ACL, volume 1.
Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viegas, F., Wattenberg, M., Corrado, G., et al. (2016). Google's multilingual neural machine translation system: Enabling zero-shot translation. arXiv preprint arXiv:1611.04558.
Jones, B. K., Johnson, M., and Goldwater, S. (2012). Semantic parsing with Bayesian tree transducers. In Proceedings of ACL-2012, pages 488–496.
Krishnamurthy, J., Dasigi, P., and Gardner, M. (2017). Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of EMNLP.
51
References III

Kwiatkowski, T., Zettlemoyer, L., Goldwater, S., and Steedman, M. (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of EMNLP-2010, pages 1223–1233.
MacCartney, B. and Manning, C. D. (2009). An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics, pages 140–156.
Montague, R. (1970). Universal grammar. Theoria, 36(3):373–398.
Richardson, K. (2018). A language for function signature representations. arXiv preprint arXiv:1804.00987.
Richardson, K., Berant, J., and Kuhn, J. (2018). Polyglot semantic parsing in APIs. In Proceedings of NAACL.
Richardson, K. and Kuhn, J. (2016). Learning to make inferences in a semantic parsing task. Transactions of the Association for Computational Linguistics, 4:155–168.
Richardson, K. and Kuhn, J. (2017a). Function Assistant: A tool for NL querying of APIs. In Proceedings of EMNLP-17.
Richardson, K. and Kuhn, J. (2017b). Learning semantic correspondences in technical documentation. In Proceedings of ACL-17.
Richardson, K., Zarrieß, S., and Kuhn, J. (2017). The Code2Text challenge: Text generation in source code libraries. In Proceedings of INLG.
Woods, W. A. (1973). Progress in natural language understanding: An application to lunar geology. In Proceedings of the June 4–8, 1973, National Computer Conference and Exposition, pages 441–450.
52