Natural Language Question Answering Over Knowledge Graph: A Data-Driven Approach
Lei Zou
Peking University, Institute of Computer Science and Technology
Joint work with Jeffrey Xu Yu (CUHK), Haixun Wang (Google), Tamer Özsu (UW), Lei Chen (HKUST), Ruizhe Huang, Shuo Han, Youhuan Li, Dongyan Zhao (PKU)
RDF and Semantic Web
- RDF is a language for the conceptual modeling of information about web resources
- A building block of the semantic web
  - Facilitates exchange of information
  - Search engines can retrieve more relevant information
  - Facilitates data integration (mashups)
- Machine understandable
  - Understand the information on the web and the interrelationships among them
Outline
- RDF Introduction
- gAnswer: Natural Language Question Answering over RDF, A Data-Driven Approach [Zou et al., SIGMOD 2014; Zheng et al., SIGMOD 2015]
RDF Introduction
- Everything is a uniquely named resource
- Namespaces can be used to scope the names
- Properties of resources can be defined
- Relationships with other resources can be defined
- Resources can be contributed by different people/groups and can be located anywhere on the web
- Integrated web “database”

Example resource: http://en.wikipedia.org/wiki/Abraham_Lincoln
With the namespace xmlns:y=http://en.wikipedia.org/wiki, the resource is named y:Abraham_Lincoln.
  Abraham_Lincoln:hasName “Abraham Lincoln”
  Abraham_Lincoln:BornOnDate “1809-02-12”
  Abraham_Lincoln:DiedOnDate “1865-04-15”
  Abraham_Lincoln:DiedIn y:Washington_DC
RDF Data Model
- Triple: Subject, Predicate (Property), Object (s, p, o)
  - Subject: the entity that is described (URI or blank node)
  - Predicate: a feature of the entity (URI)
  - Object: value of the feature (URI, blank node or literal)
- (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L)
  - U: set of URIs
  - B: set of blank nodes
  - L: set of literals
- A set of RDF triples is called an RDF graph

Subject          Predicate   Object
Abraham_Lincoln  hasName     “Abraham Lincoln”
Abraham_Lincoln  BornOnDate  “1809-02-12”
Abraham_Lincoln  DiedOnDate  “1865-04-15”
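The membership condition (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L) can be checked mechanically. The following is a minimal sketch; the class names `URI`, `BlankNode` and `Literal` and the helper `is_valid_triple` are illustrative assumptions, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(s, p, o) -> bool:
    """Check (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))                # subject: URI or blank node
        and isinstance(p, URI)                         # predicate: URI only
        and isinstance(o, (URI, BlankNode, Literal))   # object: anything
    )


lincoln = URI("y:Abraham_Lincoln")
ok = is_valid_triple(lincoln, URI("hasName"), Literal("Abraham Lincoln"))
# A literal is not allowed in the subject position.
bad = is_valid_triple(Literal("Abraham Lincoln"), URI("hasName"), lincoln)
```

Here `ok` is true and `bad` is false, because literals may appear only as objects.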
RDF Example Instance
Prefix: y=http://en.wikipedia.org/wiki

Subject               Predicate     Object
y:Abraham_Lincoln     hasName       “Abraham Lincoln”
y:Abraham_Lincoln     bornOnDate    “1809-02-12”
y:Abraham_Lincoln     diedOnDate    “1865-04-15”
y:Abraham_Lincoln     bornIn        y:Hodgenville_KY
y:Abraham_Lincoln     diedIn        y:Washington_DC
y:Abraham_Lincoln     title         “President”
y:Abraham_Lincoln     gender        “Male”
y:Washington_DC       hasName       “Washington D.C.”
y:Washington_DC       foundingYear  “1790”
y:Hodgenville_KY      hasName       “Hodgenville”
y:United_States       hasName       “United States”
y:United_States       hasCapital    y:Washington_DC
y:United_States       foundingYear  “1776”
y:Reese_Witherspoon   bornOnDate    “1976-03-22”
y:Reese_Witherspoon   bornIn        y:New_Orleans_LA
y:Reese_Witherspoon   hasName       “Reese Witherspoon”
y:Reese_Witherspoon   gender        “Female”
y:Reese_Witherspoon   title         “Actress”
y:New_Orleans_LA      foundingYear  “1718”
y:New_Orleans_LA      locatedIn     y:United_States
y:Franklin_Roosevelt  hasName       “Franklin D. Roosevelt”
y:Franklin_Roosevelt  bornIn        y:Hyde_Park_NY
y:Franklin_Roosevelt  title         “President”
y:Franklin_Roosevelt  gender        “Male”
y:Hyde_Park_NY        foundingYear  “1810”
y:Hyde_Park_NY        locatedIn     y:United_States
y:Marilyn_Monroe      gender        “Female”
y:Marilyn_Monroe      hasName       “Marilyn Monroe”
y:Marilyn_Monroe      bornOnDate    “1926-07-01”
y:Marilyn_Monroe      diedOnDate    “1962-08-05”

Subjects and predicates are URIs; objects are either URIs or literals.
RDF Graph
[Figure: the example triples drawn as a graph. The entity vertices (y:Abraham_Lincoln, y:Washington_DC, y:Hodgenville_KY, y:United_States, y:Reese_Witherspoon, y:New_Orleans_LA, y:Franklin_Roosevelt, y:Hyde_Park_NY, y:Marilyn_Monroe) are linked to each other by bornIn, diedIn, hasCapital and locatedIn edges, and to their literal values by hasName, bornOnDate, diedOnDate, title, gender and foundingYear edges.]
RDF Query Model
- Query model: SPARQL (SPARQL Protocol and RDF Query Language)
- Given U (set of URIs), L (set of literals), and V (set of variables), a SPARQL expression is defined recursively:
  - An atomic triple pattern, which is an element of (U ∪ V) × (U ∪ V) × (U ∪ V ∪ L)
    - Example: ?x hasName “Abraham Lincoln”
  - P FILTER R, where P is a graph pattern expression and R is a built-in SPARQL condition (i.e., analogous to a SQL predicate)
    - Example: ?x price ?p FILTER(?p < 30)
  - P1 AND/OPT/UNION P2, where P1 and P2 are graph pattern expressions
- Example:

    SELECT ?name
    WHERE {
      ?m <bornIn> ?city .
      ?m <hasName> ?name .
      ?m <bornOnDate> ?bd .
      ?city <foundingYear> "1718" .
      FILTER(regex(str(?bd), "1976"))
    }
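To make the semantics of triple patterns concrete, here is a minimal sketch of evaluating the example query over a handful of the example triples by extending variable bindings one pattern at a time. The functions `match_pattern` and `evaluate` are hypothetical helpers written for this illustration; this is not how a SPARQL engine is implemented, and the FILTER is applied with a plain Python regex.

```python
import re

# A few triples from the example instance (underscored names are assumed).
TRIPLES = [
    ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Reese_Witherspoon", "bornOnDate", "1976-03-22"),
    ("y:New_Orleans_LA", "foundingYear", "1718"),
    ("y:Abraham_Lincoln", "bornIn", "y:Hodgenville_KY"),
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "bornOnDate", "1809-02-12"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):               # variable: bind or check
            if new.setdefault(term, value) != value:
                return None
        elif term != value:                    # constant: must be equal
            return None
    return new


def evaluate(patterns, triples):
    """Join all triple patterns by threading bindings through them."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


patterns = [
    ("?m", "bornIn", "?city"),
    ("?m", "hasName", "?name"),
    ("?m", "bornOnDate", "?bd"),
    ("?city", "foundingYear", "1718"),
]
# FILTER(regex(str(?bd), "1976")) applied after pattern matching.
names = [b["?name"] for b in evaluate(patterns, TRIPLES)
         if re.search("1976", b["?bd"])]
```

On this data, `names` contains only "Reese Witherspoon": she was born in a city founded in 1718, and her birth date contains 1976.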
SPARQL Queries
[Figure: the example query drawn as a query graph. ?m has a bornIn edge to ?city, a hasName edge to ?name and a bornOnDate edge to ?bd; ?city has a foundingYear edge to “1718”; the condition FILTER(regex(str(?bd), “1976”)) is attached to ?bd.]
Naïve Triple Store Design
All triples of the example instance are stored in a single three-column table T(Subject, Property, Object). The example SPARQL query

    SELECT ?name
    WHERE {
      ?m <bornIn> ?city .
      ?m <hasName> ?name .
      ?m <bornOnDate> ?bd .
      ?city <foundingYear> "1718" .
      FILTER(regex(str(?bd), "1976"))
    }

then translates into the following SQL query over T, with one copy of T per triple pattern:

    SELECT T2.object
    FROM T AS T1, T AS T2, T AS T3, T AS T4
    WHERE T1.property = 'bornIn'
      AND T2.property = 'hasName'
      AND T3.property = 'bornOnDate'
      AND T1.subject = T2.subject
      AND T2.subject = T3.subject
      AND T4.property = 'foundingYear'
      AND T1.object = T4.subject
      AND T4.object = '1718'
      AND T3.object LIKE '%1976%'

Too many self-joins!
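The self-join translation can be tried out directly. The sketch below loads a subset of the example triples into an in-memory SQLite table named T (the choice of SQLite and the underscored entity names are assumptions made for the illustration) and runs the four-way self-join.

```python
import sqlite3

triples = [
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("y:Abraham_Lincoln", "bornOnDate", "1809-02-12"),
    ("y:Abraham_Lincoln", "bornIn", "y:Hodgenville_KY"),
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Reese_Witherspoon", "bornOnDate", "1976-03-22"),
    ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
    ("y:New_Orleans_LA", "foundingYear", "1718"),
    ("y:Franklin_Roosevelt", "hasName", "Franklin D. Roosevelt"),
    ("y:Franklin_Roosevelt", "bornIn", "y:Hyde_Park_NY"),
    ("y:Hyde_Park_NY", "foundingYear", "1810"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (subject TEXT, property TEXT, object TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", triples)

# One alias of T per triple pattern: four patterns, four copies of T.
rows = conn.execute(
    """
    SELECT T2.object
    FROM T AS T1, T AS T2, T AS T3, T AS T4
    WHERE T1.property = 'bornIn'
      AND T2.property = 'hasName'
      AND T3.property = 'bornOnDate'
      AND T1.subject = T2.subject
      AND T2.subject = T3.subject
      AND T4.property = 'foundingYear'
      AND T1.object = T4.subject
      AND T4.object = '1718'
      AND T3.object LIKE '%1976%'
    """
).fetchall()
conn.close()
```

Every additional triple pattern in the SPARQL query adds another alias of T and another join condition, which is the cost the slide points at.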
Existing Solutions
1. Property table
   - Each class of objects goes to a different table ⇒ similar to normalized relations
   - Eliminates some of the joins
2. Vertically partitioned tables
   - For each property, build a two-column table, containing both subject and object, ordered by subjects
   - Can use merge join (faster)
   - Good for subject-subject joins but does not help with subject-object joins
3. Exhaustive indexing
   - Create indexes for each permutation of the three columns
   - Query components become range queries over individual relations with merge-join to combine
   - Excessive space usage
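As an illustration of the second option, the following sketch builds one two-column table per property, sorted by subject, and combines two of them with a merge join on the subject column. The function names `vertical_partition` and `merge_join_on_subject` are hypothetical, and the code is a toy model of the idea rather than any system's implementation.

```python
from collections import defaultdict

TRIPLES = [
    ("y:Reese_Witherspoon", "hasName", "Reese Witherspoon"),
    ("y:Abraham_Lincoln", "bornIn", "y:Hodgenville_KY"),
    ("y:Marilyn_Monroe", "hasName", "Marilyn Monroe"),
    ("y:Reese_Witherspoon", "bornIn", "y:New_Orleans_LA"),
    ("y:Abraham_Lincoln", "hasName", "Abraham Lincoln"),
]


def vertical_partition(triples):
    """One (subject, object) table per property, ordered by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    for rows in tables.values():
        rows.sort()
    return tables


def merge_join_on_subject(left, right):
    """Merge join of two subject-sorted tables; returns (s, o_left, o_right)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            s = left[i][0]
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == s:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == s:
                j_end += 1
            # Emit the cross product of the two runs sharing subject s.
            out.extend((s, lo, ro)
                       for _, lo in left[i:i_end]
                       for _, ro in right[j:j_end])
            i, j = i_end, j_end
    return out


tables = vertical_partition(TRIPLES)
joined = merge_join_on_subject(tables["bornIn"], tables["hasName"])
```

Because both tables are sorted on subject, the subject-subject join is a single linear pass. A subject-object join (for example ?m bornIn ?city with ?city foundingYear “1718”) gets no such benefit, since the object column is not the sort key.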
gStore [Zou et al., VLDB 11], [Zou et al., VLDB J 14]: General Idea
- We work directly on the RDF graph and the SPARQL query graph
- Answering SPARQL query ≡ subgraph matching
  - Subgraph matching is computationally expensive
- Use a signature-based encoding of each entity and class vertex to speed up matching
- Filter-and-evaluate
  - Use a filter that admits false positives to prune nodes and obtain a set of candidates; then do more detailed evaluation on those
- We develop an index (VS*-tree) over the data signature graph (has light maintenance load) for efficient pruning
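The filter-and-evaluate idea can be sketched with simple bit signatures. The encoding below (one hashed bit per predicate and per predicate/object pair, 64 bits, CRC32 as the hash) is an assumption chosen for brevity and is not gStore's actual encoding or its VS*-tree index. It only shows why the filter can return false positives but never misses a true match.

```python
import zlib

BITS = 64  # signature width (assumed for this sketch)

TRIPLES = [
    ("y:New_Orleans_LA", "foundingYear", "1718"),
    ("y:New_Orleans_LA", "locatedIn", "y:United_States"),
    ("y:Hyde_Park_NY", "foundingYear", "1810"),
    ("y:Hyde_Park_NY", "locatedIn", "y:United_States"),
    ("y:Washington_DC", "foundingYear", "1790"),
    ("y:Washington_DC", "hasName", "Washington D.C."),
]


def bit(token):
    """Map a token to one bit position with a deterministic hash."""
    return 1 << (zlib.crc32(token.encode("utf-8")) % BITS)


def signature(edges):
    """OR together one bit per predicate and one per (predicate, object)."""
    sig = 0
    for p, o in edges:
        sig |= bit(p)
        if o is not None:          # None stands for a variable object
            sig |= bit(p + "|" + o)
    return sig


def out_edges(subject):
    return [(p, o) for s, p, o in TRIPLES if s == subject]


def candidates(query_edges):
    """Filter step: keep vertices whose signature covers the query's."""
    q = signature(query_edges)
    subjects = sorted({s for s, _, _ in TRIPLES})
    return [s for s in subjects if (signature(out_edges(s)) & q) == q]


def verify(subject, query_edges):
    """Evaluate step: exact check that removes false positives."""
    edges = out_edges(subject)
    return all(any(p == qp and (qo is None or o == qo) for p, o in edges)
               for qp, qo in query_edges)


query = [("foundingYear", "1718"), ("locatedIn", None)]
cands = candidates(query)
answers = [s for s in cands if verify(s, query)]
```

Any vertex that truly matches has every query bit set, so it always survives the filter; hash collisions can let extra vertices through, which is why the exact `verify` step follows.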
0. Start with RDF Graph G
[Figure: the query graph of the example SPARQL query (?m, ?city, ?name, ?bd, the literal “1718” and the FILTER on ?bd) shown next to the example RDF graph G.]
Finding matches over a large graph is not a trivial task!
gAnswer: Natural Language Question Answering Over Knowledge Graph, A Graph Data-Driven Approach
- An easy-to-use interface to access knowledge graphs
- It is interesting to both academia and industry.
- Interdisciplinary research between database and NLP (natural language processing) communities.
Running Example
Question: Who was married to an actor that play in Philadelphia?

Subject                  Property     Object
Antonio Banderas         type         actor
Antonio Banderas         spouse       Melanie Griffith
Antonio Banderas         starring     Philadelphia (film)
Philadelphia (film)      type         film
Jonathan Demme           director     film
Philadelphia             type         city
Aaron McKie              bornIn       Philadelphia
James Anderson           playForTeam  Philadelphia 76ers
Constantin Stanislavski  create       An Actor Prepares
Philadelphia 76ers       type         Basketball team
An Actor Prepares        type         Book

Answer: Melanie Griffith
Search Needs a Shake-Up
“Academics and industry researchers must invest much more in bold strategies that can achieve natural-language searching and answering.”
(Oren Etzioni, Nature, Vol. 476, pp. 25-26, 2011)
Some Interesting Products
- EVI: acquired by Amazon in October 2012.
  William Tunstall-Pedoe: True Knowledge: Open-Domain Question Answering Using Structured Knowledge and Inference. AI Magazine 31(3): 80-92 (2010)
- WolframAlpha
Existing Solutions
Question: Who was married to an actor that play in Philadelphia?
1. Translate the NL question into a structured query:

    SELECT ?y
    WHERE {
      ?x <starring> <Philadelphia_(film)> .
      ?x <type> <actor> .
      ?x <spouse> ?y .
    }

2. Query processing returns the answer: Melanie Griffith

The difficulty is ambiguity in the translation step:
- The phrase “Philadelphia” may refer to Philadelphia, Philadelphia (film) or Philadelphia 76ers
- The phrase “play in” may map to playForTeam, starring or director
Our Method: Data Driven [Zou et al., SIGMOD 14]
[Figure: the running-example RDF graph, with class vertices (actor, film, city, Basketball team, Book) and entity vertices (Antonio Banderas, Philadelphia (film), Philadelphia, Melanie Griffith, Jonathan Demme, Aaron McKie, James Anderson, Philadelphia 76ers, Constantin Stanislavski, An Actor Prepares) connected by type, starring, spouse, director, bornIn, playForTeam and create edges.]

The question “Who was married to an actor that play in Philadelphia?” is represented as a semantic query graph with vertices “Who”, “actor”/“that” and “Philadelphia”, an edge “be married to” between “Who” and “actor”/“that”, and an edge “play in” between “actor”/“that” and “Philadelphia”.

Each vertex and edge keeps all of its candidate mappings, with confidence scores:

Phrase                   Candidates
“Who” (vertex)           ?Who
“actor”/“that” (vertex)  (actor, 1.0), (An Actor Prepares, 0.9)
“Philadelphia” (vertex)  (Philadelphia, 1.0), (Philadelphia (film), 0.9), (Philadelphia 76ers, 0.8)
“be married to” (edge)   (spouse, 1.0)
“play in” (edge)         (playForTeam, 1.0), (starring, 0.9), (director, 1.0)

Combine Disambiguation and Query Together!
Our Method: Framework
[Figure: framework overview. (a) Natural Language Question; (b) Semantic Relations Extraction and Building Semantic Query Graph, with the question phrases annotated by candidate mappings such as ?who, <spouse, 1.0>, <actor, 1.0> and An_Actor_Prepares.]
Our Method: Offline
- TASK: Building a paraphrase dictionary to map relation phrases to predicates in the RDF graph.
- METHOD: Data-Driven Approach
Our Method: Offline
- INPUT: Relation Phrases and Supporting Entity Pairs.
- OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase  Supporting Entity Pairs
“play in”        (〈Antonio Banderas〉, 〈Philadelphia (film)〉), (〈Julia Roberts〉, 〈Runaway Bride〉), ......
“uncle of”       (〈Ted Kennedy〉, 〈John F. Kennedy, Jr.〉), (〈Peter Corr〉, 〈Jim Corr〉), ......

[Figure: for “play in”, each supporting pair (Antonio Banderas and Philadelphia (film); Julia Roberts and Runaway Bride) is connected in the RDF graph by a starring edge.]
[Figure: for “uncle of”, the RDF graph around Ted Kennedy and John F. Kennedy, Jr. Joseph P. Kennedy, Sr. has hasChild edges to Ted Kennedy and to John F. Kennedy; John F. Kennedy has a hasChild edge to John F. Kennedy, Jr.; Ted Kennedy and John F. Kennedy, Jr. both have hasGender edges to Male. Two predicate paths therefore connect the pair: (hasChild, hasChild, hasChild) and (hasGender, hasGender).]
Which Path is Better?
Our Method: Offline
- Each relation phrase rel_i is treated as a virtual document whose terms are the predicate paths found between its supporting entity pairs:
  - “uncle of”: (hasChild, hasChild, hasChild), (hasGender, hasGender)
  - “co-author with”: (write, write), (hasGender, hasGender)
- Paths are then ranked by TF-IDF: a path such as (hasGender, hasGender) that occurs in the virtual documents of many relation phrases is penalized.
Our Method: Offline
- INPUT: Relation Phrases and Supporting Entity Pairs.
- OUTPUT: Possible Mapping Predicates / Predicate Paths

Definition. Given a predicate path $L$, the tf-value of $L$ in $PS(rel_i)$ is defined as follows:

$$tf(L, PS(rel_i)) = \left|\{\, Path(v_i^{j}, {v'}_i^{j}) \mid L \in Path(v_i^{j}, {v'}_i^{j}) \,\}\right|$$

The idf-value of $L$ over the whole relation phrase dictionary $T = \{rel_1, \ldots, rel_n\}$ is defined as follows:

$$idf(L, T) = \log \frac{|T|}{\left|\{\, rel_i \in T \mid L \in PS(rel_i) \,\}\right| + 1}$$

The tf-idf value of $L$ is defined as follows:

$$\text{tf-idf}(L, PS(rel_i), T) = tf(L, PS(rel_i)) \times idf(L, T)$$

We define the confidence probability of mapping relation phrase $rel_i$ to predicate or predicate path $L$ as follows:

$$\delta(rel_i, L) = \text{tf-idf}(L, PS(rel_i), T) \tag{1}$$
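The definitions above can be computed in a few lines. The sketch below uses a toy dictionary with three relation phrases; the path sets are illustrative assumptions (only the paths for the Ted Kennedy pair and the starring edges come from the slides, the remaining entries are made up to have something to count), and `PATH_SETS`, `tf`, `idf` and `tf_idf` are hypothetical names.

```python
import math

# Relation phrase -> one set of predicate paths per supporting entity pair.
# Toy data: the second "uncle of" pair and the "co-author with" pair are
# assumed to have these paths purely for illustration.
PATH_SETS = {
    "uncle of": [
        {("hasChild", "hasChild", "hasChild"), ("hasGender", "hasGender")},
        {("hasChild", "hasChild", "hasChild"), ("hasGender", "hasGender")},
    ],
    "co-author with": [
        {("write", "write"), ("hasGender", "hasGender")},
    ],
    "play in": [
        {("starring",)},
        {("starring",)},
    ],
}


def tf(path, rel):
    """Number of supporting pairs of `rel` whose path set contains `path`."""
    return sum(1 for path_set in PATH_SETS[rel] if path in path_set)


def idf(path):
    """log(|T| / (number of relation phrases containing `path` + 1))."""
    containing = sum(1 for rel in PATH_SETS if tf(path, rel) > 0)
    return math.log(len(PATH_SETS) / (containing + 1))


def tf_idf(path, rel):
    """Confidence delta(rel, path) as in Equation (1)."""
    return tf(path, rel) * idf(path)


child_path = ("hasChild", "hasChild", "hasChild")
gender_path = ("hasGender", "hasGender")
delta_child = tf_idf(child_path, "uncle of")    # 2 * log(3 / 2)
delta_gender = tf_idf(gender_path, "uncle of")  # 2 * log(3 / 3) = 0
```

The (hasGender, hasGender) path connects the supporting pairs of both “uncle of” and “co-author with”, so its idf drops to zero in this toy dictionary, while the (hasChild, hasChild, hasChild) path is specific to “uncle of” and keeps a positive score.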
Online: Framework
Question Understanding
  1.1 Building Dependency Tree
  1.2 Finding Relation Embeddings
  1.3 Finding Relation Embeddings
Query Evaluation
  2.1 Phrase Mapping
  2.2 Top-k Subgraph Match
Online: Step 1.1
Question: Who was married to an actor that play in Philadelphia?
Dependency tree of the question (root “married”):
  nsubjpass(married, who)
  auxpass(married, was)
  prep(married, to)
  pobj(to, actor)
  det(actor, an)
  rcmod(actor, play)
  nsubj(play, that)
  prep(play, in)
  pobj(in, Philadelphia)

Online: Step 1.2
The relation phrase “(be) married to” is found embedded in the dependency tree.

Online: Step 1.3
Each relation phrase is combined with its two arguments into a semantic relation:
  (“(be) married to”, “who”, “actor”)
  (“play in”, “that”, “Philadelphia”)
Online: Step 1.4
Question: Who was married to an actor that play in Philadelphia?
The semantic relations
  (“(be) married to”, “who”, “actor”)
  (“play in”, “that”, “Philadelphia”)
are assembled into a semantic query graph: each argument becomes a vertex (“Who”, “actor”, “that”, “Philadelphia”) and each relation phrase becomes an edge (“(be) married to”, “play in”).
Coreference resolution: “that” refers to “actor”, so the two vertices are merged into a single vertex “actor”/“that”.
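The construction in this step can be sketched as follows. The function `build_semantic_query_graph` is a hypothetical helper, and the coreference mapping is supplied by hand here; no actual coreference resolver is involved.

```python
def build_semantic_query_graph(relations, coref):
    """Turn (relation phrase, arg1, arg2) triples into vertices and edges.

    `coref` maps a referring word to its antecedent, so that both end up
    as the same vertex.
    """
    def canonical(arg):
        return coref.get(arg, arg)

    vertices, edges = [], []
    for rel, arg1, arg2 in relations:
        u, v = canonical(arg1), canonical(arg2)
        for vertex in (u, v):
            if vertex not in vertices:
                vertices.append(vertex)
        edges.append((u, rel, v))
    return vertices, edges


relations = [
    ("(be) married to", "who", "actor"),
    ("play in", "that", "Philadelphia"),
]
# Assumed output of coreference resolution: "that" refers to "actor".
vertices, edges = build_semantic_query_graph(relations, {"that": "actor"})
```

Without the coreference mapping the two semantic relations would form two disconnected edges; merging “that” into “actor” yields one connected query graph with three vertices.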
Online: Step 2.1 Mapping Edge
Each edge (relation phrase) of the semantic query graph is mapped to its candidate predicates, with confidence scores:

Edge             Candidate predicates
“be married to”  (spouse, 1.0)
“play in”        (playForTeam, 1.0), (starring, 0.9), (director, 1.0)
Online: Step 2.2 Mapping Vertices
Each vertex (argument) of the semantic query graph is mapped to its candidate entities or classes, with confidence scores:

Vertex          Candidate entities/classes
“Who”           ?Who
“actor”/“that”  (actor, 1.0), (An Actor Prepares, 0.9)
“Philadelphia”  (Philadelphia, 1.0), (Philadelphia (film), 0.9), (Philadelphia 76ers, 0.8)
Online: Step 2.3 Finding Top-k Matches
[Figure: the semantic query graph, annotated with its candidate mappings, matched against the running-example RDF graph; ?Who is matched to Melanie Griffith.]

The score of a match $M$ is defined as:

$$\begin{aligned}
Score(M) &= \log\Big( \prod_{v_i \in V(Q^S)} \delta(arg_i, u_i) \times \prod_{\overline{v_i v_j} \in E(Q^S)} \delta(rel_{\overline{v_i v_j}}, P_{ij}) \Big) \\
&= \sum_{v_i \in V(Q^S)} \log\big(\delta(arg_i, u_i)\big) + \sum_{\overline{v_i v_j} \in E(Q^S)} \log\big(\delta(rel_{\overline{v_i v_j}}, P_{ij})\big)
\end{aligned} \tag{2}$$

Finding Top-k Matches: Combine Disambiguation and Query Together!
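As a worked instance of Equation (2), the sketch below scores the match from the running example using the confidence values listed on the slides. Treating the variable vertex ?Who as contributing confidence 1.0 is an assumption made for this illustration, and `score` is a hypothetical helper.

```python
import math


def score(vertex_confidences, edge_confidences):
    """Score(M): sum of log-confidences over matched vertices and edges."""
    return (sum(math.log(c) for c in vertex_confidences)
            + sum(math.log(c) for c in edge_confidences))


# Match: "actor"/"that" -> actor (1.0), "Philadelphia" -> Philadelphia (film) (0.9),
# "be married to" -> spouse (1.0), "play in" -> starring (0.9).
# The variable vertex ?Who is assumed to contribute 1.0.
vertex_conf = [1.0, 1.0, 0.9]
edge_conf = [1.0, 0.9]

match_score = score(vertex_conf, edge_conf)

# The same value computed in product form, as in the first line of Equation (2).
product_form = math.log(math.prod(vertex_conf) * math.prod(edge_conf))
```

Both forms give log(0.81). The top-ranked candidates taken in isolation (Philadelphia the city, playForTeam) would score higher, but they do not form a match in the RDF graph, which is why disambiguation is deferred to query evaluation.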
Online: Step 2.3 Finding Top-k Matches
Theorem: Finding top-k subgraph matches of Q^S over RDF graph G is an NP-hard problem.
Algorithm
Input: a semantic query graph Q^S and an RDF graph G.
Output: top-k SPARQL queries, i.e., the top-k matches from Q^S to G.
1: for each candidate list L_{r_i}, i = 1, ..., |E(Q^S)| do
2:   sort all candidate relations in L_{r_i} in non-ascending order
3: for each candidate list L_{arg_j}, j = 1, ..., |V(Q^S)| do
4:   sort all candidate entities/classes (i.e., vertices in G′) in L_{arg_j} in non-ascending order
5: set cursor c_i to the head of L_{r_i} and cursor c_j to the head of L_{arg_j}, respectively
6: set the upper bound Upbound(Q) and the threshold θ = −∞
7: while true do
8:   for each cursor c_j in list L_{arg_j}, j = 1, ..., |V(Q^S)| do
9:     perform an exploration-based subgraph isomorphism algorithm (such as VF2) from cursor c_j to find the subgraph matches (of Q^S over G) that contain c_j
10:   update the threshold θ to be the top-k match score so far
11:   move all cursors c_i and c_j one step forward in each list
12:   update the upper bound Upbound(Q) according to Equation ??
13:   if θ ≥ Upbound(Q) then
14:     break // TA-style stopping strategy
TA-style Top-k Algorithm !
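The TA-style stopping rule can be sketched as follows. This is a simplification under stated assumptions, not the authors' implementation: the subgraph isomorphism step is replaced by brute-force enumeration filtered through a hypothetical `is_valid` callback, all candidate lists (vertices and edges) are treated uniformly and assumed non-empty, and the set of valid matches in the usage example is purely illustrative. The name `ta_top_k` is also made up for this example.

```python
import heapq
import math
from itertools import product

def ta_top_k(lists, is_valid, k):
    """Return the k best valid matches as (match, score) pairs.

    lists: one candidate list per query vertex/edge, each a list of
           (candidate, confidence) pairs.
    is_valid: hypothetical stand-in for the subgraph-matching step; it tells
           whether a tuple of candidates (one per list) occurs in the data.
    """
    # Sort every list by log-confidence in non-ascending order.
    lists = [sorted(((c, math.log(p)) for c, p in lst),
                    key=lambda item: item[1], reverse=True) for lst in lists]
    found = {}   # match -> score
    depth = 0    # common cursor position in all lists
    while True:
        # Explore every match that contains one of the current cursor items.
        for i, lst in enumerate(lists):
            pools = [[lst[depth]] if j == i else other
                     for j, other in enumerate(lists)]
            for combo in product(*pools):
                match = tuple(c for c, _ in combo)
                if match not in found and is_valid(match):
                    found[match] = sum(s for _, s in combo)
        depth += 1  # move all cursors one step forward
        if any(depth >= len(lst) for lst in lists):
            break  # a list is exhausted, so every match has been explored
        best = heapq.nlargest(k, found.values())
        theta = best[-1] if len(best) == k else -math.inf
        upper = sum(lst[depth][1] for lst in lists)  # bound on unseen matches
        if theta >= upper:
            break  # TA-style stopping strategy
    return heapq.nlargest(k, found.items(), key=lambda kv: kv[1])

# Candidate lists taken from the slide; the valid set is illustrative only.
lists = [
    [("actor", 1.0), ("An Actor Prepares", 0.9)],
    [("Philadelphia", 1.0), ("Philadelphia (film)", 0.9), ("Philadelphia 76ers", 0.8)],
    [("playForTeam", 1.0), ("starring", 0.9), ("director", 1.0)],
]
valid = {("actor", "Philadelphia (film)", "starring")}
top = ta_top_k(lists, lambda m: m in valid, k=1)
```

Any match not yet explored uses only candidates at or beyond the cursors, so the sum of the cursor scores bounds its score from above; once the k-th best score found reaches that bound, the search can stop.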
Experiments: Datasets
I RDF repository: DBPedia
Table: Statistics of RDF Graph (DBpedia)
Number of Entities | 5.2 million
Number of Triples | 60 million
Number of Predicates | 1643
Size of RDF Graph (in GB) | 6.1
I Relation Phrase Dictionary: Patty
Table: Statistics of Relation Phrase Dataset
 | wordnet-wikipedia | freebase-wikipedia
Number of Textual Patterns | 350,568 | 1,631,530
Number of Entity Pairs | 3,862,304 | 15,802,947
Average Entity Pair Number For Each Pattern | 11 | 9
Experiments: Online
Benchmark: QALD-3, 99 Natural Language Questions
Table: Evaluating QALD-3 Testing Questions (on DBpedia)
System | Processed | Right | Partially | Recall | Precision | F-1
Our Method | 76 | 32 | 11 | 0.40 | 0.40 | 0.40
squall2sparql | 96 | 77 | 13 | 0.85 | 0.89 | 0.87
CASIA | 52 | 29 | 8 | 0.36 | 0.35 | 0.36
Scalewelis | 70 | 1 | 38 | 0.33 | 0.33 | 0.33
RTV | 55 | 30 | 4 | 0.34 | 0.32 | 0.33
Intui2 | 99 | 28 | 4 | 0.32 | 0.32 | 0.32
SWIP | 21 | 14 | 2 | 0.15 | 0.16 | 0.16
DEANNA | 27 | 21 | 0 | 0.21 | 0.21 | 0.21
Experiments: Online
ID | Question | Response Time (in ms)
Q2 | Who was the successor of John F. Kennedy? | 1699
Q3 | Who is the mayor of Berlin? | 677
Q14 | Give me all members of Prodigy? | 811
Q17 | Give me all cars that are produced in Germany? | 297
Q19 | Give me all people that were born in Vienna and died in Berlin? | 2557
Q20 | How tall is Michael Jordan? | 942
Q21 | What is the capital of Canada? | 1342
Q22 | Who is the governor of Wyoming? | 796
Q24 | Who was the father of Queen Elizabeth II? | 538
Q27 | Sean Parnell is the governor of which U.S. state? | 1210
Q28 | Give me all movies directed by Francis Ford Coppola. | 577
Q30 | What is the birth name of Angela Merkel? | 250
Q35 | Who developed Minecraft? | 2565
Q39 | Give me all companies in Munich. | 1312
Q41 | Who founded Intel? | 1105
Q42 | Who is the husband of Amanda Palmer? | 1418
Q44 | Which cities does the Weser flow through? | 1139
Q45 | Which countries are connected by the Rhine? | 736
Q54 | What are the nicknames of San Francisco? | 321
Q58 | What is the time zone of Salt Lake City? | 316
Q63 | Give me all Argentine films. | 427
Q70 | Is Michelle Obama the wife of Barack Obama? | 316
Q74 | When did Michael Jackson die? | 258
Q76 | List the children of Margaret Thatcher. | 1139
Q77 | Who was called Scarface? | 719
Q81 | Which books by Kerouac were published by Viking Press? | 796
Q83 | How high is the Mount Everest? | 635
Q84 | Who created the comic Captain America? | 589
Q86 | What is the largest city in Australia? | 1419
Q89 | In which city was the former Dutch queen Juliana buried? | 1700
Q98 | Which country does the creator of Miffy come from? | 2121
Q100 | Who produces Orangina? | 367
Experiments: Online
Figure: Online Running Time Comparison. Running time (in ms, log scale from 10 to 100000) for questions Q2, Q20, Q21, Q22, Q28, Q35, Q41, Q42, Q44, Q45, Q54, Q74, Q76, Q83, Q84, Q86. Series: Question Understanding in DEANNA, Question Understanding in Our System, Overall time in DEANNA, Overall time in Our System.
Experiments: Online
Figure : QALD-4 Results
How to Build Templates for RDF Q/A: An Uncertain Graph Similarity Join Approach [Zheng et al., SIGMOD 15]
Question: Which physicist graduated from Princeton University?
Pattern: Which <___> graduated from <___>?
Figure: syntactic dependency trees of the question and of its pattern. Nodes: Which, physicist (<___> in the pattern), graduated, from, Princeton University (<___> in the pattern). Edge labels: det, nsubj, prep, pobj.
Template:
which <___> graduated from <___>?
SELECT ?person WHERE
{ ?person type ____
?person graduatedFrom ____
}
EVI (TrueKnowledge)
I Manually defined templates
I About 1200 templates in 2010 [Tunstall-Pedoe, AI Magazine 2010]
Can we generate templates AUTOMATICALLY?
What do we have at hand ?
I Natural language question workload, such as Yahoo WebQuestions.
I SPARQL workload, such as DBpedia SPARQL workload.
what is the name of justin bieber brother?
Who is the daughter of Ingrid Bergman married to?
Where did saki live?
Which river does the Tower Bridge cross?
who does joakim noah play for?
In which city did John F. Kennedy die?
where are the nfl redskins from?
In which country does the Yangtze River start?
how old is sacha baron cohen?
Give me the birthdays of all actors of the television show Charmed.
who did draco malloy end up marrying?
which countries border the us?
where is rome italy located on a map?
NLQ Workload
1. SELECT DISTINCT ?uri
WHERE {
res:Albert_Einstein dbo:deathPlace ?uri .
?uri rdf:type dbo:City .
}
2. SELECT DISTINCT ?uri
WHERE {
res:Bill_Clinton dbo:child ?child .
?child dbo:spouse ?uri .
}
3. PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?uri
WHERE {
res:Brooklyn_Bridge dbo:crosses ?uri .
}
4. SELECT DISTINCT ?date
WHERE {
res:The_Truman_Show dbo:starring ?actor .
?actor dbo:birthDate ?date .
}
5. PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?uri
WHERE {
res:Nile dbo:sourceCountry ?uri .
}
SPARQL Query Workload
Framework
Figure: framework with two parts.
Template Generation:
Step 1: Natural Language Question Workload (query log) -> Uncertain Graphs; SPARQL Query Workload (query log) -> Certain Graphs
Step 2: Similar Graph Pairs (natural language question, SPARQL query)
Step 3: Templates, each pairing a Natural Language Question Pattern with a SPARQL Pattern, e.g.
Who wrote the book <___>?
SELECT ?x WHERE
{ ___ author ?x}
Q/A with Templates:
Natural Language Question: Who wrote the book Scarlet and Black?
SPARQL Generation:
SELECT ?x WHERE
{ Scarlet_and_Black author ?x}
Query Evaluation (SPARQL Query Engine): Q/A answer ?x: Stendhal
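The "Q/A with Templates" path can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the regex-based slot matching, and the `link_entity` helper (which here just replaces spaces with underscores) are hypothetical and stand in for real pattern matching and entity linking.

```python
import re

def compile_pattern(question_pattern):
    """Turn 'Who wrote the book <___>?' into a regex with one group per slot."""
    parts = question_pattern.split("<___>")
    body = "(.+)".join(re.escape(p) for p in parts)
    return re.compile(body + r"$", re.IGNORECASE)

def answer_with_template(question, question_pattern, sparql_pattern, link_entity):
    """Fill the SPARQL pattern's ___ slots with the phrases matched in the question."""
    m = compile_pattern(question_pattern).match(question.strip())
    if m is None:
        return None  # the template does not apply to this question
    query = sparql_pattern
    for phrase in m.groups():
        # Fill slots left to right, one per matched phrase.
        query = query.replace("___", link_entity(phrase), 1)
    return query

# Hypothetical entity linker: a real system would look the phrase up in the RDF graph.
def link_entity(phrase):
    return phrase.replace(" ", "_")

query = answer_with_template(
    "Who wrote the book Scarlet and Black?",
    "Who wrote the book <___>?",
    "SELECT ?x WHERE { ___ author ?x }",
    link_entity,
)
# query == "SELECT ?x WHERE { Scarlet_and_Black author ?x }"
```

The generated query would then be handed to a SPARQL query engine for evaluation.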
Step1: Uncertain Graph Generation
Question: Which actor from USA is married to Michael Jordan born in a city of NY?
Figure: (a) Semantic Query Graph, (b) Uncertain Graph (vertices v1 to v10).
(a) Semantic Query Graph, with candidate mappings and confidences:
“Which actor”: ?x, 1.0 (type <Actor>)
“USA”: <United_States>, 1.0 (type <Country>)
“Michael Jordan”: <Michael_Jordan>, 0.6 (type <NBA_Player>); <Michael_Jordan>, 0.3 (type <Professor>); <Michael_Jordan>, 0.1 (type <Actor>)
“city”: ?x, 1.0 (type <City>)
“NY”: <New_York>, 0.7 (type <State>); <New_York_City>, 0.3 (type <City>)
Relation phrases: “from”, “be married to”, “born in”, “located in”; mapped predicates: <birthPlace>, <Spouse>, <birthPlace>, <locatedIn>
(b) Uncertain Graph:
Variables: ?x, ?a, ?b, ?c, ?d
Edges: <Spouse>, <birthPlace> (1.0), <locatedIn>, <type>
Type labels: <Actor>, 1.0; <Country>, 1.0; {<NBA_Player>, 0.6; <Professor>, 0.3; <Actor>, 0.1}; <City>, 1.0; {<State>, 0.7; <City>, 0.3}
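To make the uncertain graph concrete, the sketch below enumerates its possible worlds: every vertex with several candidate labels picks one, and the probability of a world is the product of the chosen confidences. Treating the vertices as independent, the function name `possible_worlds`, and the vertex ids `jordan_type` and `ny_type` are assumptions made for this example; the label distributions are the ones shown on the slide.

```python
from itertools import product

def possible_worlds(label_dists):
    """Yield (labels, probability) for every possible world.

    label_dists maps a vertex id to {label: probability}. Vertices are
    assumed independent, so a world's probability is the product of the
    probabilities of the labels it picks.
    """
    vertices = list(label_dists)
    for choice in product(*(label_dists[v].items() for v in vertices)):
        labels = {v: lab for v, (lab, _) in zip(vertices, choice)}
        prob = 1.0
        for _, p in choice:
            prob *= p
        yield labels, prob

# The two uncertain type vertices of the example question (ids are hypothetical).
uncertain = {
    "jordan_type": {"NBA_Player": 0.6, "Professor": 0.3, "Actor": 0.1},
    "ny_type": {"State": 0.7, "City": 0.3},
}
worlds = list(possible_worlds(uncertain))  # 3 * 2 = 6 possible worlds
```

The number of worlds grows multiplicatively with the number of uncertain vertices, which is why enumerating them for every graph pair quickly becomes expensive.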
Step2: Finding Similar Graph Pairs
Natural Language Question (NLQ) Workload:
1. Which actor from USA is married to Michael Jordan born in a city of NY?
2. Which politician graduated from CIT?
SPARQL Query Workload:
1. SELECT ?person WHERE
{?person type Artist
?person graduatedFrom Harvard_University}
2. SELECT ?person1 WHERE
{?person1 type Actor
?person1 birthPlace United_States
?person2 spouse ?person1
?person2 type NBA_star
?person2 birthPlace New_York_City}
Figure: Uncertain Graph Generation turns the NLQ workload into uncertain graphs g1, g2, ...; the SPARQL workload gives the graphs q1, q2, ....
g1 (vertices v1 to v10): variables ?x, ?a, ?b, ?c, ?d; edges type, birthPlace, spouse, locatedIn; type labels “Actor”, “Country”, “City”, {(“NBA star”, 0.6), (“Professor”, 0.3), (“Actor”, 0.1)}, {(“State”, 0.7), (“City”, 0.3)}
g2 (vertices v1 to v4): variables ?x, ?a; edges type, graduatedFrom; type labels “Politician”, {(“University”, 0.8), (“Company”, 0.2)}
q1 (vertices u1 to u4): variables ?x, ?a; edges type, graduatedFrom; type labels “Artist”, “University”
q2 (vertices u1 to u8): variables ?x, ?a, ?b, ?c; edges type, birthPlace, spouse; type labels “Actor”, “Country”, “NBA star”, “City”
Similar graph pairs shown: (g1, q2) and (g2, q1).
Step3: Template Generation
Question: which politician graduated from CIT?
Figure: (a) Semantic Query Graph, (b) Uncertain Graph g2 (vertices v1 to v4), (c) SPARQL q1 (vertices u1 to u4), (d) Template.
(a) Semantic Query Graph:
“Which politician”: ?x, 1.0 (type <Politician>)
“CIT”: <California Institute of Technology>, 0.8 (type <University>); <CIT Group>, 0.2 (type <Company>)
“graduated from”: <graduatedFrom>
(b) Uncertain Graph: ?x with <type> <Politician>, 1.0; ?x <graduatedFrom> CIT; CIT with <type> {<University>, 0.8; <Company>, 0.2}
(c) SPARQL: ?x with <type> <Artist>; ?x <graduatedFrom> Harvard_University; Harvard_University with <type> <University>
(d) Template:
which <___> graduated from <___>?
SELECT ?person WHERE
{?person type ____
?person graduatedFrom ____ }
Problem Formulation
Definition (Finding Similar Graph Pairs)
Given a set of deterministic graphs D (derived from the SPARQL query workload), a set of uncertain graphs U (derived from the natural language question workload), a graph edit distance threshold τ, and a similarity probability threshold α ∈ (0, 1], a SimJ query returns the graph pairs 〈q, g〉 (q ∈ D and g ∈ U) with SimP_τ(q, g) ≥ α.
Similarity Probability
\[
SimP_\tau(q, g) = \sum_{pw(g) \in PW(g)} \Pr\{pw(g) \mid ged(q, pw(g)) \le \tau\}
\]
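The definition can be evaluated directly by summing the probabilities of the possible worlds that fall within the edit distance threshold. The sketch below assumes the possible worlds are already listed with their probabilities, and it plugs in a toy distance, `label_mismatches`, that only counts differing vertex labels between graphs with the same structure and aligned vertex ids. That toy distance is a hypothetical stand-in for a real graph edit distance, and the vertex ids are made up.

```python
def sim_prob(q, worlds, tau, ged):
    """SimP_tau(q, g): total probability of the worlds pw with ged(q, pw) <= tau."""
    return sum(p for world, p in worlds if ged(q, world) <= tau)

def label_mismatches(a, b):
    """Toy distance (NOT a general GED): number of aligned vertices whose labels differ."""
    return sum(1 for v in a if a[v] != b.get(v))

# Possible worlds of the uncertain graph for "which politician graduated from CIT?"
g2_worlds = [
    ({"v2": "Politician", "v4": "University"}, 0.8),
    ({"v2": "Politician", "v4": "Company"}, 0.2),
]
# Type labels of the SPARQL graph q1, using the same (hypothetical) vertex ids.
q1 = {"v2": "Artist", "v4": "University"}

p = sim_prob(q1, g2_worlds, tau=1, ged=label_mismatches)  # only the first world qualifies
```

With a threshold α of, say, 0.5, the pair would be returned for τ = 1 because the qualifying world alone carries probability 0.8.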
Existing GED Lower Bounds
I Global Filters
  - vertex/edge counting [Zeng et al., 2009]: $ged(q, g) \ge \big||V(g)| - |V(q)|\big| + \big||E(g)| - |E(q)|\big|$
  - label-based lower bounds [Zhao et al., 2012]: $ged(q, g) \ge \Gamma(M_V(q), M_V(g)) + \Gamma(M_E(q), M_E(g))$
I n-gram based Lower Bounds
  - path-based filter [Zhao et al., 2012]: $dist_P(q, g) = \max(|M_G(q)| - \tau \cdot D_p,\ |M_G(g)| - \tau \cdot D_p(g))$
  - tree-based filter [Wang et al., 2012]: $dist_T(q, g) = \max(|V(q)| - \tau \cdot D_t,\ |V(g)| - \tau \cdot D_t(g))$
  - star-based filter [Zeng et al., 2009]: $dist_S(q, g) = \dfrac{\mu(q, g)}{\max\{4, [\max\{\delta(q), \delta(g)\}] + 1\}}$
I Partition-based Lower Bound
  - Pars [Zhao et al., 2013]
  - Disjoint-Partition [Zheng et al., 2015]
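The two global filters are cheap to compute, as the sketch below suggests. The slide does not define Γ; the code assumes the multiset form Γ(A, B) = max(|A|, |B|) − |A ∩ B|, and the function names are hypothetical. In the usage example, each distinct subject or object of the two workload queries is counted as a vertex and each triple pattern as an edge, which is also an assumption.

```python
from collections import Counter

def count_lower_bound(nv_q, ne_q, nv_g, ne_g):
    """Vertex/edge counting filter: ||V(g)| - |V(q)|| + ||E(g)| - |E(q)||."""
    return abs(nv_g - nv_q) + abs(ne_g - ne_q)

def gamma(a, b):
    """Assumed definition: max(|A|, |B|) - |A ∩ B| on label multisets."""
    a, b = Counter(a), Counter(b)
    common = sum((a & b).values())  # size of the multiset intersection
    return max(sum(a.values()), sum(b.values())) - common

def label_lower_bound(vlabels_q, elabels_q, vlabels_g, elabels_g):
    """Label-based filter: Gamma on vertex labels plus Gamma on edge labels."""
    return gamma(vlabels_q, vlabels_g) + gamma(elabels_q, elabels_g)

# Workload query 1 has 3 vertices and 2 edges; query 2 has 6 vertices and 5 edges.
lb_count = count_lower_bound(3, 2, 6, 5)  # 3 + 3 = 6

# Same sizes, one differing vertex label: the counting filter gives 0, labels give 1.
lb_label = label_lower_bound(
    ["?x", "Artist", "University"], ["type", "graduatedFrom"],
    ["?x", "Politician", "University"], ["type", "graduatedFrom"],
)
```

A pair can be pruned as soon as any such lower bound exceeds τ, without computing the exact edit distance.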
Challenges
Challenges
Existing lower bounds are designed for deterministic graphs
Two Possible Variants
I ignoring all the vertex/edge labels
I enumerating all possible worlds
Common Structural Subgraph (CSS)
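Figure: the possible world pw1(g1) of uncertain graph g1 (vertices v1 to v10; variables ?x, ?a, ?b, ?c, ?d) next to the graph q2 (vertices u1 to u8; variables ?x, ?a, ?b, ?c). Vertex and edge labels are abbreviated in the figure (A, C, Ci, S, NS; t, b, s, l).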
CSS-based Lower Bound
The relationship between GED and CSS [Brun et al., 2012]:
\[
\begin{aligned}
ged(q, g^c) &= \min_{\forall s} \{ (|V(q)| + |E(q)|) - (|V(s)| + |E(s)|) && \text{(part 1)} \\
&\qquad + (|V(g^c)| + |E(g^c)|) - (|V(s)| + |E(s)|) && \text{(part 2)} \\
&\qquad + (|V_s| + |E_s|) \} && \text{(part 3)} \\
&= |V(q)| + |E(q)| + |V(g^c)| + |E(g^c)| - F(q, g^c)
\end{aligned}
\tag{3}
\]
where V_s and E_s are the subsets of vertices and edges in the CSS whose labels need to be substituted, and we have
\[
F(q, g^c) = \max_{\forall s} \{ 2 \cdot (|V(s)| + |E(s)|) - (|V_s| + |E_s|) \}. \tag{4}
\]
\[
ged(q, g) \ge |V| + |E| - \lambda_E(q, g) + \frac{dif(q, g)}{2} - \lambda_V(q, g). \tag{5}
\]
CSS-based Similarity Probability Upper Bound
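\[
SimP_\tau(q, g) = \sum_{pw(g) \in PW(g)} \Pr\{pw(g) \mid ged(q, pw(g)) \le \tau\}
\]
Let C(q, g) be $|V| + |E| - \lambda_E(q, g) + \frac{dif(q, g)}{2}$ (a constant).
\[
\begin{aligned}
SimP_\tau(q, g) &\le \sum_{pw(g) \in PW(g)} \Pr\{pw(g) \mid \lambda_V(q, pw(g)) \ge C(q, g) - \tau\} \\
&= \Pr\{\lambda_V(q, g) \ge C(q, g) - \tau\}.
\end{aligned}
\]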
CSS-based Similarity Probability Upper Bound
Take m independent variables y_1, y_2, ..., y_m and denote their sum (y_1 + y_2 + ... + y_m) as a single random variable Y. Applying Markov's inequality:
\[
SimP_\tau(q, g) \le ub\_SimP_\tau(q, g) = \frac{E(Y)}{C(q, g) - \tau}
\]
where $E(Y) = \sum_{i=1}^{m} E(y_i)$ and $E(y_i) = \sum_{l_j \in l(v_i)} \Pr(l_j \mid l_j \cap \Sigma_V(q) \neq \emptyset)$.
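A small numeric sketch of this bound follows. It reads E(y_i) as the total probability of the labels of vertex v_i that also occur among the vertex labels of q, takes C(q, g) as a given number, and clamps the result to 1 since it bounds a probability. The function names and the value C = 3 in the usage example are hypothetical.

```python
def expected_matches(label_dists, q_labels):
    """E(Y) = sum_i E(y_i): probability mass of each vertex's labels that occur in q."""
    return sum(p for dist in label_dists
               for label, p in dist.items() if label in q_labels)

def simp_upper_bound(label_dists, q_labels, c, tau):
    """Markov-style upper bound E(Y) / (C(q, g) - tau), clamped to [0, 1]."""
    if c - tau <= 0:
        return 1.0  # Markov's inequality gives no information here
    return min(1.0, expected_matches(label_dists, q_labels) / (c - tau))

# Label distributions of the uncertain graph for "which politician graduated from CIT?"
g2_dists = [{"Politician": 1.0}, {"University": 0.8, "Company": 0.2}]
q1_labels = {"Artist", "University"}

ub = simp_upper_bound(g2_dists, q1_labels, c=3, tau=1)  # 0.8 / (3 - 1)
```

If the bound is already below α, the pair can be discarded without enumerating any possible world.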
Cost-based Query Optimization
Divide all possible worlds of an uncertain graph g into disjoint possible world groups.
Conclusions
I Graph Database is a Possible Way for RDF Knowledge Base Management.
I Subgraph Matching is a Strong Tool.
I Using RDF repository, how to Provide Knowledge Services for Applications and Common Users?
Thank you!
gStore gAnswer
Reference I
Brun, L., Gauzere, B., and Fourey, S. (2012).
Relationships between graph edit distance and maximal common structural subgraph.
In SSPR/SPR, pages 42–50.
Wang, G., Wang, B., Yang, X., and Yu, G. (2012).
Efficiently indexing large sparse graphs for similarity search.
TKDE, 24(3):440–451.
Zeng, Z., Tung, A. K. H., Wang, J., Feng, J., and Zhou, L. (2009).
Comparing stars: On approximating graph edit distance.
PVLDB, 2(1):25–36.
Zhao, X., Xiao, C., Lin, X., Liu, Q., and Zhang, W. (2013).
A partition-based approach to structure similarity search.
PVLDB, 7(3):169–180.
Zhao, X., Xiao, C., Lin, X., and Wang, W. (2012).
Efficient graph similarity joins with edit distance constraints.
In ICDE.
Reference II
Zheng, W., Zou, L., Lian, X., Wang, D., and Zhao, D. (2015).
Efficient graph similarity search over large graph databases.
TKDE, 24(3):440–451.