Efficient kernels for sentence pair classification
Post on 23-Feb-2016
Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”, Roma, Italy
Efficient kernels for sentence pair classification
Motivation
• Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
  – Textual Entailment Recognition
  – Machine Translation
  – Question Answering
• Classifiers need suitable feature spaces
Motivation
For example, in textual entailment…
Training examples:
• P1: T1 = “Farmers feed cows animal extracts”, H1 = “Cows eat animal extracts”
• P2: T2 = “They feed dolphins fish”, H2 = “Fish eat dolphins”
• P3: T3 = “Mothers feed babies milk”, H3 = “Babies eat milk”
Classification relies on relevant features such as the first-order rule “feed X Y → X eat Y”.
In this talk…
• First-order rule (FOR) feature spaces: a challenge
• Tripartite directed acyclic graphs (tDAGs) as a solution:
  – for modelling FOR feature spaces
  – for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
• An efficient algorithm for computing kernels in FOR spaces
• Experimental and comparative assessment of the computational efficiency of the proposed algorithm
First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function
K(P1, P2) = |S(P1) ∩ S(P2)|
that computes how many common first-order rules are activated by P1 and P2.
Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
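The intersection-counting view of the kernel can be sketched directly with set operations, assuming each activated rule in S(P) is encoded as a canonical string (the rule strings below are illustrative, not the talk's actual encoding):

```python
# A minimal sketch of the implicit kernel K(P1, P2) = |S(P1) ∩ S(P2)|,
# where each first-order rule activated by a sentence pair is represented
# as a canonical string. The rule encodings here are hypothetical.

def for_kernel(s_p1, s_p2):
    """Count the first-order rules activated by both sentence pairs."""
    return len(set(s_p1) & set(s_p2))

# Hypothetical rule sets for two sentence pairs:
S_P1 = {"feed X Y -> X eat Y", "VP(VB NP-1) -> VP(VB NP-1)"}
S_P2 = {"feed X Y -> X eat Y", "S(NP-1 VP) -> S(NP-1 VP)"}

print(for_kernel(S_P1, S_P2))  # 1: only the shared rule is counted
```

The challenge addressed in the rest of the talk is computing this count without enumerating the (exponentially large) rule sets explicitly.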
Observations
• … using the kernel trick:
  – define the distance K(P1, P2)
  – instead of defining the features
e.g. K((T1, H1), (T1, H2))
First-order rule (FOR) feature spaces: challenges
[Figure: Pa = (T1, H1), with T1 = “Farmers feed cows animal extracts” and H1 = “Cows eat animal extracts”, drawn as syntactic parse trees whose constituents carry co-indexed placeholders (1, 2, 3): placeholders are first added at lexical anchors (e.g. cows/Cows) and then propagated up the trees. S(Pa) is the resulting set of activated first-order rules, i.e. pairs of tree fragments sharing placeholders.]
Adding placeholders; propagating placeholders
First-order rule (FOR) feature spaces: challenges
[Figure: Pb = (T3, H3), with T3 = “Mothers feed babies milk” and H3 = “Babies eat milk”, drawn as parse trees with placeholders 1 and 2, together with the resulting set S(Pb) of activated first-order rules.]
First-order rule (FOR) feature spaces: challenges
[Figure: computing K(Pa, Pb) = |S(Pa) ∩ S(Pb)|. The same first-order rule (with variables X and Y over the feed/eat fragments) appears in both S(Pa) and S(Pb), but under different placeholder names (3 in Pa corresponds to 2 in Pb): matching fragments therefore also requires matching the placeholders.]
A step back…
• FOR feature spaces can be modelled with particular graphs
• We call these graphs tripartite directed acyclic graphs (tDAGs)
• Observations:
  – tDAGs are not trees
  – tDAGs can be used to model both rules and sentence pairs
  – unifying rules in sentences is a graph matching problem
As for Feature Structures…
Tripartite Directed Acyclic Graphs (tDAG)
[Figure: a first-order rule with variables X and Y, and the pair Pa, both drawn as tDAGs: two parse trees whose nodes are connected through a shared set of placeholder nodes.]
Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph G = (N, E) where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges E is partitioned into four sets Et, Eg, EA(t), and EA(g)
where t = (Nt, Et) and g = (Ng, Eg) are two trees,
EA(t) = {(x, y) | x ∈ Nt and y ∈ A}, and
EA(g) = {(x, y) | x ∈ Ng and y ∈ A}
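The definition above translates naturally into a data structure; a minimal sketch, assuming integer node ids and the illustrative names Tree and TDag (not from the talk):

```python
# A sketch of the tDAG definition G = (N, E): two trees t and g plus a set
# of placeholder nodes A, linked by the edge sets EA(t) and EA(g).

from dataclasses import dataclass, field

@dataclass
class Tree:
    nodes: set   # Nt (or Ng)
    edges: set   # Et (or Eg), as (parent, child) pairs

@dataclass
class TDag:
    t: Tree              # the "text" tree t = (Nt, Et)
    g: Tree              # the "hypothesis" tree g = (Ng, Eg)
    placeholders: set    # A
    ea_t: set = field(default_factory=set)  # EA(t) ⊆ Nt × A
    ea_g: set = field(default_factory=set)  # EA(g) ⊆ Ng × A

    def well_formed(self):
        # Every placeholder edge must link a tree node to a placeholder.
        return (all(x in self.t.nodes and a in self.placeholders
                    for x, a in self.ea_t)
                and all(x in self.g.nodes and a in self.placeholders
                        for x, a in self.ea_g))
```

This mirrors the partition in the definition: N = Nt ∪ Ng ∪ A and E = Et ∪ Eg ∪ EA(t) ∪ EA(g).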
Tripartite Directed Acyclic Graphs (tDAGs)
Alternative definition: a tDAG is a pair of extended trees G = (t, g) where
t = (Nt ∪ At, Et ∪ EA(t)) and g = (Ng ∪ Ag, Eg ∪ EA(g)).
[Figure: the same rule drawn both as a single tDAG and as a pair of extended trees sharing the placeholder nodes X and Y.]
Again challenges
Computing the implicit kernel function
K(P1, P2) = |S(P1) ∩ S(P2)|
involves general graph matching, which is an exponential problem.
Yet tDAGs are particular graphs, and we can define an efficient algorithm:
we will analyze the isomorphism among tDAGs and derive from it an algorithm for computing the kernel.
Isomorphism between tDAGs
Isomorphism between graphs:
G1 = (N1, E1) and G2 = (N2, E2) are isomorphic if:
– |N1| = |N2| and |E1| = |E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
  • for each n1 in N1, Label(n1) = Label(f(n1))
  • for each (na, nb) in E1, (f(na), f(nb)) is in E2
Isomorphism between tDAGs
Isomorphism adapted to tDAGs:
G1 = (t1, g1) and G2 = (t2, g2) are isomorphic if these two properties hold:
– Partial isomorphism
  • g1 and g2 are isomorphic
  • t1 and t2 are isomorphic
  • this property generates two functions fg and ft
– Constraint compatibility
  • fg and ft are compatible on the sets of placeholder nodes A1 and A2 if, for each n ∈ A1, fg(n) = ft(n)
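The compatibility test reduces to checking that the two mappings agree on every placeholder; a minimal sketch, assuming fg and ft are given as dictionaries restricted to the placeholder sets (an illustrative representation, not the talk's):

```python
# Constraint compatibility for tDAG isomorphism: the placeholder mappings
# induced by the two partial isomorphisms must agree on every placeholder.

def compatible(fg, ft, a1):
    """fg, ft: dicts mapping placeholders of A1 to placeholders of A2."""
    return all(fg[n] == ft[n] for n in a1)

# Example: both mappings send 1 -> 1 and 3 -> 2, so they are compatible.
fg = {1: 1, 3: 2}
ft = {1: 1, 3: 2}
print(compatible(fg, ft, {1, 3}))  # True
```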
Isomorphism between tDAGs
[Figure: Pa = (ta, ga) and Pb = (tb, gb) as pairs of rule fragments. Partial isomorphism maps the fragments of ta onto tb and those of ga onto gb, inducing the placeholder mappings Ct = {(1,1), (3,2)} and Cg = {(1,1), (3,2)}. Since Ct = Cg, the constraints are compatible.]
Ideas for building the kernel
We define
K(P1, P2) = |S(P1) ∩ S(P2)|
using the isomorphism between tDAGs.
The idea: reverse the order of isomorphism detection.
• First, constraint compatibility:
  – building a set C of all the relevant alternative constraints
  – finding the subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C
• Second, partial isomorphism detection
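One way to picture the constraint set C is as the collection of possible placeholder correspondences between the two pairs. The brute-force enumeration below is purely illustrative and assumes the two placeholder sets have the same size; the talk's algorithm builds only the relevant constraints rather than all of them:

```python
# A sketch of the alternative constraints C: each constraint is a possible
# correspondence (an injective mapping) between the placeholders of P1 and
# the placeholders of P2. Brute force, for illustration only.

from itertools import permutations

def alternative_constraints(a1, a2):
    """All injective mappings from placeholders a1 onto placeholders a2."""
    a1, a2 = sorted(a1), sorted(a2)
    k = min(len(a1), len(a2))
    constraints = set()
    for targets in permutations(a2, k):
        constraints.add(frozenset(zip(a1[:k], targets)))
    return constraints

# Two placeholders on each side yield two alternative constraints,
# e.g. {(1,2), (2,3)} and {(1,3), (2,2)}:
print(alternative_constraints({1, 2}, {2, 3}))
```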
Ideas for building the kernel
[Figure: Pa = (ta, ga) and Pb = (tb, gb) drawn with abstract node labels. Comparing their placeholders yields the set of alternative constraints C = {c1, c2} = { {(1,1), (2,2)}, {(1,1), (2,3)} } for computing K(Pa, Pb) = |S(Pa) ∩ S(Pb)|.]
Ideas for building the kernel
[Figure: under constraint c1 = {(1,1), (2,2)}, the subset (S(Pa) ∩ S(Pb))c1 collects the common fragment pairs whose placeholders respect c1.]
K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2|
Ideas for building the kernel
[Figure: under constraint c2 = {(1,1), (2,3)}, the subset (S(Pa) ∩ S(Pb))c2 collects the common fragment pairs whose placeholders respect c2.]
K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))c1 ∪ (S(Pa) ∩ S(Pb))c2|
Ideas for building the kernel
[Figure: each constrained subset factorizes into a “text” part and a “hypothesis” part:
(S(Pa) ∩ S(Pb))c1 = (S(ta) ∩ S(tb))c1 × (S(ga) ∩ S(gb))c1.]
K(Pa, Pb) = |∪c∈C (S(Pa) ∩ S(Pb))c| = |∪c∈C (S(ta) ∩ S(tb))c × (S(ga) ∩ S(gb))c|
Kernel on FOR feature spaces
The general equation
K(P1, P2) = |∪c∈C (S(t1) ∩ S(t2))c × (S(g1) ∩ S(g2))c|
can be computed using:
1) KS, the kernel function for trees introduced in (Duffy & Collins, 2001) and refined in (Moschitti & Zanzotto, 2007)
2) the inclusion-exclusion principle
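The role of the inclusion-exclusion principle can be sketched on explicit sets, assuming each constrained subset Xc is materialized; in the real algorithm the sizes |Xc| come from the tree kernel KS rather than from explicit sets:

```python
# Inclusion-exclusion for the size of a union of sets:
# |X1 ∪ … ∪ Xn| = Σ over non-empty groups J of (-1)^(|J|+1) |∩_{c∈J} Xc|.
# Here each set stands for one constrained subset (S(Pa) ∩ S(Pb))_c.

from itertools import combinations
from functools import reduce

def union_size_by_inclusion_exclusion(sets):
    total = 0
    for k in range(1, len(sets) + 1):
        for group in combinations(sets, k):
            inter = reduce(set.intersection, group)
            total += (-1) ** (k + 1) * len(inter)
    return total

# Illustrative constrained subsets sharing one common fragment pair:
X_c1 = {"f1", "f2", "f3"}
X_c2 = {"f2", "f4"}
print(union_size_by_inclusion_exclusion([X_c1, X_c2]))  # 4 = 3 + 2 - 1
```

Because each |∩ Xc| factorizes into a product of tree-kernel values, the whole sum can be evaluated without enumerating any fragment set.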
Computational Efficiency Analysis
• Comparison kernel: (Zanzotto & Moschitti, Coling-ACL 2006), (Moschitti & Zanzotto, ICML 2007)
• Test-bed corpus: Recognizing Textual Entailment challenge data
Computational Efficiency Analysis
[Figure: execution time in seconds (s) over the whole RTE2 data set for different numbers of allowed placeholders.]
Accuracy Comparison
• Training: RTE 1, 2, 3
• Testing: RTE 4
Conclusions
• We reduced kernels in first-order rule feature spaces to graph-matching problems
• We defined a new class of graphs: tDAGs
• We presented an efficient algorithm for computing kernels in FOR feature spaces