natural language question answering over knowledge graph a...

98
Natural Language Question Answering Over Knowledge Graph—A Data-Driven Approach Lei Zou Peking University Institute of Computer Science and Technology Joint work with Jeffery Xu Yu(CUHK), Haixun Wang (Google), Tamer Ozsu (UW), Lei Chen (HKUST), Ruizhe Huang, Shuo Han, Youhuan Li, Dongyan Zhao (PKU)

Upload: others

Post on 20-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Natural Language Question Answering OverKnowledge Graph—A Data-Driven Approach

Lei Zou

Peking UniversityInstitute of Computer Science and Technology

Joint work with Jeffery Xu Yu(CUHK), Haixun Wang (Google),Tamer Ozsu (UW), Lei Chen (HKUST), Ruizhe Huang, Shuo

Han, Youhuan Li, Dongyan Zhao (PKU)

Page 2: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF and Semantic Web

I RDF is a language for the conceptual modeling of informationabout web resources

I A building block of semantic webI Facilitates exchange of informationI Search engines can retrieve more relevant informationI Facilitates data integration (mashes)

I Machine understandableI Understand the information on the web and the

interrelationships among them

Page 3: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Outline

RDF Introduction

gAnswer: Natural Language Question Answering over RDFA Data Driven Approach [Zou et al., SIGMOD 2014; Zheng etal., SIGMOD 2015]

Page 4: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Outline

RDF Introduction

gAnswer: Natural Language Question Answering over RDFA Data Driven Approach [Zou et al., SIGMOD 2014; Zheng etal., SIGMOD 2015]

Page 5: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Introduction

I Everything is an uniquely namedresource

I Namespaces can be used to scopethe names

I Properties of resources can bedefined

I Relationships with other resourcescan be defined

I Resources can be contributed bydifferent people/groups and can belocated anywhere in the web

I Integrated web “database”

http://en.wikipedia.org/wiki/Abraham Lincoln

xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln

Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”

y:Washington DC

Abraham Lincoln:DiedIn

Page 6: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Introduction

I Everything is an uniquely namedresource

I Namespaces can be used to scopethe names

I Properties of resources can bedefined

I Relationships with other resourcescan be defined

I Resources can be contributed bydifferent people/groups and can belocated anywhere in the web

I Integrated web “database”

http://en.wikipedia.org/wiki/Abraham Lincoln

xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln

Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”

y:Washington DC

Abraham Lincoln:DiedIn

Page 7: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Introduction

I Everything is an uniquely namedresource

I Namespaces can be used to scopethe names

I Properties of resources can bedefined

I Relationships with other resourcescan be defined

I Resources can be contributed bydifferent people/groups and can belocated anywhere in the web

I Integrated web “database”

http://en.wikipedia.org/wiki/Abraham Lincoln

xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln

Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”

y:Washington DC

Abraham Lincoln:DiedIn

Page 8: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Introduction

I Everything is an uniquely namedresource

I Namespaces can be used to scopethe names

I Properties of resources can bedefined

I Relationships with other resourcescan be defined

I Resources can be contributed bydifferent people/groups and can belocated anywhere in the web

I Integrated web “database”

http://en.wikipedia.org/wiki/Abraham Lincoln

xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln

Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”

y:Washington DC

Abraham Lincoln:DiedIn

Page 9: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Introduction

I Everything is an uniquely namedresource

I Namespaces can be used to scopethe names

I Properties of resources can bedefined

I Relationships with other resourcescan be defined

I Resources can be contributed bydifferent people/groups and can belocated anywhere in the web

I Integrated web “database”

http://en.wikipedia.org/wiki/Abraham Lincoln

xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln

Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”

y:Washington DC

Abraham Lincoln:DiedIn

Page 10: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Data ModelI Triple: Subject, Predicate (Property),

Object (s, p, o)

Subject: the entity that is described(URI or blank node)

Predicate: a feature of the entity (URI)Object: value of the feature (URI,

blank node or literal)

I (s, p, o) ∈ (U ∪ B)× U × (U ∪ B ∪ L)

I Set of RDF triples is called an RDF graph

U

Subject Object

U B U B L

U: set of URIsB: set of blank nodesL: set of literals

Predicate

Subject Predicate ObjectAbraham Lincoln hasName “Abraham Lincoln”Abraham Lincoln BornOnDate “1809-02-12”Abraham Lincoln DiedOnDate “1865-04-15”

Page 11: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Example InstancePrefix: y=http://en.wikipedia.org/wiki

Subject Predicate Object

y: Abraham Lincoln hasName “Abraham Lincoln”y: Abraham Lincoln BornOnDate “1809-02-12”’y: Abraham Lincoln DiedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy: Abraham Lincoln DiedIn y: Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y: Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roosevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”

URI

Literal

URI

Page 12: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Graph

y:Abraham Lincoln

“Abraham Lincoln”

hasName

“1809-02-12”bornOnDate

“1865-04-15”

diedOnDate

“President”

title

“Male”

gender

y:Washington D.C.

“1790”

foundYear

“Washington D.C.”

hasName

y:Hodgenville KY “Hodgenville”hasName

y:United States

“United States”

hasName

“1776”

foundingYear

y:Reese Witherspoon

“1976-03-22”

bornOnDate

“Female”

gender

“Actress”

title

“Reese Witherspoon”

hasName

y:New Orleans LA

“1718”

foundingYear

y:Franklin Roosevelt

“Franklin D. Roosevelt”

hasName

“Male”

gender

“President”

title

y:Hyde Park NY

“1810”

foundingYear

y:Marilyn Monroe“1962-08-05”diedOnDate

“1926-07-01”

bornOnDate

“Female”

gender

“Marilyn Monroe”hasName

diedIn

bornIn

hasCapital

bornInlocatedIn locatedIn

bornIn

Page 13: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

RDF Query ModelI Query Model - SPARQL Protocol and RDF Query LanguageI Given U (set of URIs), L (set of literals), and V (set of

variables), a SPARQL expression is defined recursively:I an atomic triple pattern, which is an element of

(U ∪ V )× (U ∪ V )× (U ∪ V ∪ L)

I ?x hasName “Abraham Lincoln”

I P FILTER R, where P is a graph pattern expression and R is abuilt-in SPARQL condition (i.e., analogous to a SQL predicate)

I ?x price ?p FILTER(?p < 30)

I P1 AND/OPT/UNION P2, where P1 and P2 are graphpattern expressions

I Example:SELECT ?nameWHERE {

?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )

}

Page 14: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

SPARQL Queries

SELECT ?nameWHERE {

?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )

}

?m ?citybornIn

?name

hasName

?bd

bornOnDate

“1718”

foundingYear

FILTER(regex(str(?bd),“1976”))

Page 15: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Naıve Triple Store Design

SELECT ?nameWHERE {

?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )

}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-

sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”

Too many self-joins!

Page 16: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Naıve Triple Store Design

SELECT ?nameWHERE {

?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )

}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-

sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”

SELECT T2 . o b j e c tFROM T as T1 , T as T2 , T as T3 ,

T as T4WHERE T1 . p r o p e r t y=” b o r n I n ”AND T2 . p r o p e r t y=”hasName”AND T3 . p r o p e r t y=” bornOnDate ”AND T1 . s u b j e c t=T2 . s u b j e c tAND T2 . s u b j e c t=T3 . s u b j e c tAND T4 . p r o p e t y=” f o u n d i n g Y e a r ”AND T1 . o b j e c t=T4 . s u b j e c tAND T4 . o b j e c t=” 1718 ”AND T3 . o b j e c t LIKE ’%1976% ’

Too many self-joins!

Page 17: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Naıve Triple Store Design

SELECT ?nameWHERE {

?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )

}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-

sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”

SELECT T2 . o b j e c tFROM T as T1 , T as T2 , T as T3 ,

T as T4WHERE T1 . p r o p e r t y=” b o r n I n ”AND T2 . p r o p e r t y=”hasName”AND T3 . p r o p e r t y=” bornOnDate ”AND T1 . s u b j e c t=T2 . s u b j e c tAND T2 . s u b j e c t=T3 . s u b j e c tAND T4 . p r o p e t y=” f o u n d i n g Y e a r ”AND T1 . o b j e c t=T4 . s u b j e c tAND T4 . o b j e c t=” 1718 ”AND T3 . o b j e c t LIKE ’%1976% ’

Too many self-joins!

Page 18: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Existing Solutions

1. Property tableI Each class of objects go to a different table ⇒ similar to

normalized relationsI Eliminates some of the joins

2. Vertically partitioned tablesI For each property, build a two-column table, containing both

subject and object, ordered by subjectsI Can use merge join (faster)I Good for subject-subject joins but does not help with

subject-object joins

3. Exhaustive indexingI Create indexes for each permutation of the three columnsI Query components become range queries over individual

relations with merge-join to combineI Excessive space usage

Page 19: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

gStore [Zou et al., VLDB 11], [Zou et al., VLDB J 14] –General Idea

I We work directly on the RDF graph and the SPARQL querygraph

I Answering SPARQL query ≡ subgraph matchingI Subgraph matching is computationally expensive

I Use a signature-based encoding of each entity and class vertexto speed up matching

I Filter-and-evaluateI Use a false positive algorithm to prune nodes and obtain a set

of candidates; then do more detailed evaluation on those

I We develop an index (VS∗-tree) over the data signature graph(has light maintenance load) for efficient pruning

Page 20: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

0. Start with RDF Graph G

?m ?citybornIn

?name

hasName

?bd

bornOnDate

“1718”

foundingYear

FILTER(regex(str(?bd),“1976”))

y:Abraham Lincoln

“Abraham Lincoln”

hasName

“1809-02-12”bornOnDate

“1865-04-15”

diedOnDate

“President”

title

“Male”

gender

y:Washington D.C.

“1790”

foundYear

“Washington D.C.”

hasName

y:Hodgenville KY “Hodgenville”hasName

y:United States

“United States”

hasName

“1776”

foundingYear

y:Reese Witherspoon

“1976-03-22”

bornOnDate

“Female”

gender

“Actress”

title

“Reese Witherspoon”

hasName

y:New Orleans LA

“1718”

foundingYear

y:Franklin Roosevelt

“Franklin D. Roosevelt”

hasName

“Male”

gender

“President”

title

y:Hyde Park NY

“1810”

foundingYear

y:Marilyn Monroe“1962-08-05”diedOnDate

“1926-07-01”

bornOnDate

“Female”

gender

“Marilyn Monroe”hasName

diedIn

bornIn

hasCapital

bornInlocatedIn locatedIn

bornIn

Page 21: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

0. Start with RDF Graph G

?m ?citybornIn

?name

hasName

?bd

bornOnDate

“1718”

foundingYear

FILTER(regex(str(?bd),“1976”))

y:Abraham Lincoln

“Abraham Lincoln”

hasName

“1809-02-12”bornOnDate

“1865-04-15”

diedOnDate

“President”

title

“Male”

gender

y:Washington D.C.

“1790”

foundYear

“Washington D.C.”

hasName

y:Hodgenville KY “Hodgenville”hasName

y:United States

“United States”

hasName

“1776”

foundingYear

y:Reese Witherspoon

“1976-03-22”

bornOnDate

“Female”

gender

“Actress”

title

“Reese Witherspoon”

hasName

y:New Orleans LA

“1718”

foundingYear

y:Franklin Roosevelt

“Franklin D. Roosevelt”

hasName

“Male”

gender

“President”

title

y:Hyde Park NY

“1810”

foundingYear

y:Marilyn Monroe“1962-08-05”diedOnDate

“1926-07-01”

bornOnDate

“Female”

gender

“Marilyn Monroe”hasName

diedIn

bornIn

hasCapital

bornInlocatedIn locatedIn

bornIn

Finding matches over a largegraph is not a trivial task!

Page 22: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Outline

RDF Introduction

gAnswer: Natural Language Question Answering over RDFA Data Driven Approach [Zou et al., SIGMOD 2014; Zheng etal., SIGMOD 2015]

Page 23: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

gAnswer: Natural Language Question Answering OverKnowledge Graph–A Graph Data Driven Approach

I An Easy-to-Use Interface to Access Knowledge GraphI It is interesting to both academia and industry.I Interdisciplinary research between database and NLP (natural

language processing) communities.

Page 24: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

gAnswer: Natural Language Question Answering OverKnowledge Graph–A Graph Data Driven Approach

I An Easy-to-Use Interface to Access Knowledge GraphI It is interesting to both academia and industry.I Interdisciplinary research between database and NLP (natural

language processing) communities.

gAnswer

Page 25: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Running Example

Question: Who was married to an actor that play in Philadelphia ?

Subject Property ObjectAntonio Banderas type actorAntonio Banderas spouse Melanie GriffithAntonio Banderas starring Philadelphia (film)Philadelphia (film) type filmJonathan Demme director filmPhiladelphia type cityAaron McKie bornIn PhiladelphiaJames Anderson playForTeam Philadelphia 76ersConstantin Stanislavski create An Actor PreparesPhiladelphia 76ers type Basketball teamAn Actor Prepares type Book

Melanie Griffith

Page 26: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Running Example

Question: Who was married to an actor that play in Philadelphia ?

Subject Property ObjectAntonio Banderas type actor

Antonio Banderas spouse Melanie Griffith

Antonio Banderas starring Philadelphia (film)Philadelphia (film) type filmJonathan Demme director filmPhiladelphia type cityAaron McKie bornIn PhiladelphiaJames Anderson playForTeam Philadelphia 76ersConstantin Stanislavski create An Actor PreparesPhiladelphia 76ers type Basketball teamAn Actor Prepares type Book

Melanie Griffith

Page 27: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Search Needs a Shake-Up

“ Academics and industry researchers mustinvest much more in bold strategies thatcan achieve natural-language searching andanswering.”—Oren Etzioni, NATURE, Vol 476, p25-26,2011.

Page 28: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Some Interesting ProductsEVI— acquired by Amazon on October 2012.

William Tunstall-Pedoe: True Knowledge: Open-Domain Question Answering UsingStructured Knowledge and Inference. AI Magazine 31(3): 80-92 (2010)

Page 29: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Some Interesting ProductsWolframAlpha

Page 30: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Existing Solutions

Who was married to an actor that play in Philadelphia ?

Translate NL Question to structured queriesSELECT ? yWHERE {? x s t a r r i n g P h i l a d e l p h i a ( f i l m ) .? x t y p e a c t o r .? x s p o u s e ? y . }

Query Processing

Melanie Griffith

Ambiguity

Page 31: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Existing Solutions

Who was married to an actor that play in Philadelphia ?

Translate NL Question to structured queriesSELECT ? yWHERE {? x s t a r r i n g P h i l a d e l p h i a ( f i l m ) .? x t y p e a c t o r .? x s p o u s e ? y . }

Query Processing

Melanie Griffith

Philadelpha

Philadelpha (film)

Philadelpha 76ers

Ambiguity

Page 32: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Existing Solutions

Who was married to an actor that play in Philadelphia?

Translate NL Question to structured queriesSELECT ? yWHERE {? x s t a r r i n g P h i l a d e l p h i a ( f i l m ) .? x t y p e a c t o r .? x s p o u s e ? y . }

Query Processing

Melanie Griffith

playForTeam

starring

director

Ambiguity

Page 33: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Data Driven [Zou et al., SIGMOD 14]

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

Page 34: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Data Driven [Zou et al., SIGMOD 14]

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

Who was married to an actor

that play in Philadelphia ?

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

Page 35: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Data Driven [Zou et al., SIGMOD 14]

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

Who was married to an actor

that play in Philadelphia ?

“Who”

?Who

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Page 36: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Data Driven [Zou et al., SIGMOD 14]

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

Who was married to an actor

that play in Philadelphia ?

“Who”

?Who

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Combine Disambiguation

and Query Together !

Page 37: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Framework

Who actor

?who <spouse, 1.0>

<actor, 1.0>

1

2

3

2

1 23

5

3

1

4 6

(a) Natural Language Question

(b) Semantic Relations Extraction and

Building Semantic Query Graph

An_Actor_Prepares

7

8

9

10

5

Page 38: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I TASK: Building a paraphrase dictionary to map relation phrases topredicates in RDF graph.

I METHOD: Data Driven Approach

Page 39: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I INPUT: Relation Phrases and Supporting Entity Pairs.

I OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase Supporting Entity Pairs“play in” (〈 Antonio Banderas 〉, 〈 Philadelphia(film) 〉),

(〈 Julia Roberts 〉, 〈 Runaway Bride 〉),......

“uncle of” (〈 Ted Kennedy 〉, 〈 John F. Kennedy, Jr. 〉)(〈 Peter Corr 〉, 〈 Jim Corr 〉),......

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

hasChild hasChild

hasChild

hasGender hasGender

Which Path is Better ?

Page 40: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I INPUT: Relation Phrases and Supporting Entity Pairs.

I OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase Supporting Entity Pairs“play in” (〈 Antonio Banderas 〉, 〈 Philadelphia(film) 〉) ,

(〈 Julia Roberts 〉, 〈 Runaway Bride 〉),......

“uncle of” (〈 Ted Kennedy 〉, 〈 John F. Kennedy, Jr. 〉)(〈 Peter Corr 〉, 〈 Jim Corr 〉),......

Antono Banderas

Philadelphia (film)

Julia Roberts

Runaway Bride

starring starring

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

hasChild hasChild

hasChild

hasGender hasGender

Which Path is Better ?

Page 41: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I INPUT: Relation Phrases and Supporting Entity Pairs.

I OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase Supporting Entity Pairs“play in” (〈 Antonio Banderas 〉, 〈 Philadelphia(film) 〉) ,

(〈 Julia Roberts 〉, 〈 Runaway Bride 〉),......

“uncle of” (〈 Ted Kennedy 〉, 〈 John F. Kennedy, Jr. 〉)(〈 Peter Corr 〉, 〈 Jim Corr 〉),......

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

hasChild hasChild

hasChild

hasGender hasGender

Which Path is Better ?

Page 42: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I INPUT: Relation Phrases and Supporting Entity Pairs.

I OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase Supporting Entity Pairs“play in” (〈 Antonio Banderas 〉, 〈 Philadelphia(film) 〉) ,

(〈 Julia Roberts 〉, 〈 Runaway Bride 〉),......

“uncle of” (〈 Ted Kennedy 〉, 〈 John F. Kennedy, Jr. 〉)(〈 Peter Corr 〉, 〈 Jim Corr 〉),......

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

hasChild hasChild

hasChild

hasGender hasGender

Which Path is Better ?

Page 43: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

I INPUT: Relation Phrases and Supporting Entity Pairs.

I OUTPUT: Possible Mapping Predicates / Predicate Paths

Relation Phrase Supporting Entity Pairs“play in” (〈 Antonio Banderas 〉, 〈 Philadelphia(film) 〉) ,

(〈 Julia Roberts 〉, 〈 Runaway Bride 〉),......

“uncle of” (〈 Ted Kennedy 〉, 〈 John F. Kennedy, Jr. 〉)(〈 Peter Corr 〉, 〈 Jim Corr 〉),......

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

Joseph P. Kennedy,. Sr.

Ted Kennedy John F. Kennedy

Male

John F. Kennedy. Jr.

hasChild hasChild

hasGenderhasChild

hasGender

hasChild hasChild

hasChild

hasGender hasGender

Which Path is Better ?

Page 44: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

relation phrase: reli

“uncle of”

virtual document

hasChild hasChild

hasChild

hasGender hasGender

TF-IDF

Page 45: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

relation phrase: reli

“uncle of”

virtual document

hasChild hasChild

hasChild

hasGender hasGender

“co-author with” virtual documentwrite write

hasGender hasGender

TF-IDF

Page 46: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: Offline

relation phrase: reli

“uncle of”

virtual document

hasChild hasChild

hasChild

hasGender hasGender

“co-author with” virtual documentwrite write

hasGender hasGender

TF-IDF

Page 47: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Our Method: OfflineI INPUT: Relation Phrases and Supporting Entity Pairs.I OUTPUT: Possible Mapping Predicates / Predicate Paths

DefinitionGiven a predicate path L , the tf-value of L in PS(reli ) is defined as follows:

tf (L,PS(reli )) = |{Path(v ji , v′ji )|L ∈ Path(v j

i , v′ji )}|

The idf-value of L over the whole relation phrase dictionary T = {rel1, ..., reln}is defined as follows:

idf (L,T ) = log|T |

|{reli ∈ T |L ∈ PS(reli )}|+ 1

The tf-idf value of L is defined as follows:

tf−idf(L,PS(reli ),T ) = tf (L,PS(reli ))× idf (L,T )

We define the confidence probability of mapping relation phrase rel topredicate or predicate path L as follows.

δ(rel , L) = tf−idf(L,PS(reli ),T ) (1)

Page 48: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Framework

Question Understanding

1.1 Building

Dependency Tree

1.2 Finding Relation

Embeddings

1.3 Finding Relation

Embeddings

Query Evaluation

2.1 Phrase Mapping2.2 Top-k Sub-

graph Match

Page 49: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.1 Who was married to an actor

that play in Philadelphia ?

“married”

“who” “was” “to”

“actor”

“an” “play”

“that” “in”

“Philadelphia”

nsubjpass auxpass prep

pobj

det rcmod

nsubj prep

pobj

Page 50: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.2 Who was married to an actor

that play in Philadelphia ?

“married”

“who” “was” “to”

“actor”

“an” “play”

“that” “in”

“Philadelphia”

nsubjpass auxpass prep

pobj

det rcmod

nsubj prep

pobj

“(be) married to”

Page 51: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.3 Who was married to an actor

that play in Philadelphia ?

“married”

“who” “was” “to”

“actor”

“an” “play”

“that” “in”

“Philadelphia”

nsubjpass auxpass prep

pobj

det rcmod

nsubj prep

pobj

“(be) married to”

(“(be)marriedto′′, “who′′, “actor′′)

Page 52: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.3 Who was married to an actor

that play in Philadelphia ?

“married”

“who” “was” “to”

“actor”

“an” “play”

“that” “in”

“Philadelphia”

nsubjpass auxpass prep

pobj

det rcmod

nsubj prep

pobj

“(be) married to”

(“(be) married to”, “who”, “actor”)

(“play in”, “that”, “Philadelphia”)

Page 53: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.4

Who was married to an actor

that play in Philadelphia ?

(“(be) married to”, “who”, “actor”)

(“play in”, “that”, “Philadelphia”)

Page 54: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.4

Who was married to an actor

that play in Philadelphia ?

(“(be) married to”, “who”, “actor”)

(“play in”, “that”, “Philadelphia”)

“Who”

“actor” “that”

“Philadelphia”

“(be) married to” “play in”

Page 55: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.4

Who was married to an actor

that play in Philadelphia ?

coreference resolution

(“(be) married to”, “who”, “actor”)

(“play in”, “that”, “Philadelphia”)

“Who”

“actor” “that”

“Philadelphia”

“(be) married to” “play in”

Page 56: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 1.4

Who was married to an actor

that play in Philadelphia ?

(“(be) married to”, “who”, “actor”)

(“play in”, “that”, “Philadelphia”)

“Who”

“actor”/“that”

“Philadelphia”

“(be) married to” “play in”

Page 57: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.1 Mapping Edge

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who (actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 58: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.1 Mapping Edge

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who (actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 59: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.1 Mapping Edge

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who (actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 60: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.2 Mapping Vertices

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who

(actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 61: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.2 Mapping Vertices

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who (actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 62: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.2 Mapping Vertices

“Who”

“actor”/”that”

“Philadelphia”

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

?Who (actor, 1.0)

(An Actor Prepares, 0.9)

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

Page 63: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

“Who”

?Who

(Melanie Griffith)

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Score(M) =log(

∏vi∈V (QS ) δ(argi , ui )×

∏vi vj∈E(QS ) δ(relvi vj ,Pij ))

=∑

vi∈V (QS ) log(δ(argi , ui )) +∑

vi vj∈E(QS ) log(δ(relvi vj ,Pij ))

(2)Finding Top-k Matches

Combine Disambiguation

and Query Together !

Page 64: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

“Who”

?Who

(Melanie Griffith)

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Score(M) =log(

∏vi∈V (QS ) δ(argi , ui )×

∏vi vj∈E(QS ) δ(relvi vj ,Pij ))

=∑

vi∈V (QS ) log(δ(argi , ui )) +∑

vi vj∈E(QS ) log(δ(relvi vj ,Pij ))

(2)

Finding Top-k Matches

Combine Disambiguation

and Query Together !

Page 65: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

“Who”

?Who

(Melanie Griffith)

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Score(M) =log(

∏vi∈V (QS ) δ(argi , ui )×

∏vi vj∈E(QS ) δ(relvi vj ,Pij ))

=∑

vi∈V (QS ) log(δ(argi , ui )) +∑

vi vj∈E(QS ) log(δ(relvi vj ,Pij ))

(2)Finding Top-k Matches

Combine Disambiguation

and Query Together !

Page 66: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

actor film city

Antonio Banderas

Philadelphia (film)

Philadelphia

Melanie Griffith

Jonathan Demme

Aaron McKie

James Anderson

Philadelphia 76ers

Basketball team

Constantin Stanislavski

An Actor Prepares

Book

typetype

type

starringspouse

director

bornIn

PlayForTeam type

create type

“Who”

?Who

(Melanie Griffith)

“actor”/”that”

(actor, 1.0)

(An Actor Prepares, 0.9)

“Philadelphia”

(Philadelphia, 1.0)

(Philadelphia (film), 0.9)

(Philadelphia 76ers, 0.8)

“be married to” “play in”

(spouse, 1.0)

(playForTeam, 1.0)

(starring, 0.9)

(director, 1.0)

Score(M) =log(

∏vi∈V (QS ) δ(argi , ui )×

∏vi vj∈E(QS ) δ(relvi vj ,Pij ))

=∑

vi∈V (QS ) log(δ(argi , ui )) +∑

vi vj∈E(QS ) log(δ(relvi vj ,Pij ))

(2)Finding Top-k Matches

Combine Disambiguation

and Query Together !

Page 67: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

TheoremFinding Top-k subgraph matches of QS over RDF graph G is anNP-hard problem.

Require: Input: A semantic query graph QS and a RDF G . Output: Top-k SPARQL Queries, i.e., the top-k

matches from QS to G .1: for each candidate list Lri , i = 1, ......, |E(QS )| do2: Sorting all candidate relations in Lri in a non-ascending order

3: for each candidate list Largj , j = 1, ......, |V (QS )| do

4: Sorting all candidate entities/classes (i.e., vertices in G ′) in Largj in a non-ascending order.

5: Set cursor ci to the head of Lri and cursor cj to the head of Largj , respectively.

6: Set the upper bound Upbound(Q) and the threshold θ = −∞7: while true do8: for each cursor cj in list Largj , j = 1, ......, |V (QS )| do

9: Performance a exploration based subgraph isomorphism algorithm from cursor cj , such as VF2, to

find subgraph matches (of QS over G), which contains cj .

10: Update the threshold θ to be the top-k match sore so far.

11: Move all cursors ci and cj by one step forward in each list.

12: Update the upper bound Upbound(Q) according to Equation ??.

13: if θ ≥ Upbound(Q) then

14: Break // TA-style stopping strategy

Page 68: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

TheoremFinding Top-k subgraph matches of QS over RDF graph G is anNP-hard problem.Require: Input: A semantic query graph QS and a RDF G . Output: Top-k SPARQL Queries, i.e., the top-k

matches from QS to G .1: for each candidate list Lri , i = 1, ......, |E(QS )| do2: Sorting all candidate relations in Lri in a non-ascending order

3: for each candidate list Largj , j = 1, ......, |V (QS )| do

4: Sorting all candidate entities/classes (i.e., vertices in G ′) in Largj in a non-ascending order.

5: Set cursor ci to the head of Lri and cursor cj to the head of Largj , respectively.

6: Set the upper bound Upbound(Q) and the threshold θ = −∞7: while true do8: for each cursor cj in list Largj , j = 1, ......, |V (QS )| do

9: Performance a exploration based subgraph isomorphism algorithm from cursor cj , such as VF2, to

find subgraph matches (of QS over G), which contains cj .

10: Update the threshold θ to be the top-k match sore so far.

11: Move all cursors ci and cj by one step forward in each list.

12: Update the upper bound Upbound(Q) according to Equation ??.

13: if θ ≥ Upbound(Q) then

14: Break // TA-style stopping strategy

Page 69: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Online: Step 2.3 Finding Top-k Matches

TheoremFinding Top-k subgraph matches of QS over RDF graph G is anNP-hard problem.Require: Input: A semantic query graph QS and a RDF G . Output: Top-k SPARQL Queries, i.e., the top-k

matches from QS to G .1: for each candidate list Lri , i = 1, ......, |E(QS )| do2: Sorting all candidate relations in Lri in a non-ascending order

3: for each candidate list Largj , j = 1, ......, |V (QS )| do

4: Sorting all candidate entities/classes (i.e., vertices in G ′) in Largj in a non-ascending order.

5: Set cursor ci to the head of Lri and cursor cj to the head of Largj , respectively.

6: Set the upper bound Upbound(Q) and the threshold θ = −∞7: while true do8: for each cursor cj in list Largj , j = 1, ......, |V (QS )| do

9: Performance a exploration based subgraph isomorphism algorithm from cursor cj , such as VF2, to

find subgraph matches (of QS over G), which contains cj .

10: Update the threshold θ to be the top-k match sore so far.

11: Move all cursors ci and cj by one step forward in each list.

12: Update the upper bound Upbound(Q) according to Equation ??.

13: if θ ≥ Upbound(Q) then

14: Break // TA-style stopping strategy

TA-style Top-k Algorithm !

Page 70: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Experiments: Datasets

I RDF repository: DBPedia

Table : Statistics of RDF GraphDBpedia

Number of Entities 5.2 million

Number of Triples 60 million

Number of Predicates 1643

Size of RDF Graphs (in GB) 6.1

I Relation Phrase Dictionary: Patty

Table : Statistics of Relation Phrase Dataset

wordnet-wikipedia freebase-wikipediaNumber of Textual Patterns 350,568 1,631,530

Number of Entity Pairs 3,862,304 15,802,947

Average Entity Pair 11 9Number For Each Pattern

Page 71: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Experiments: Online

Benchmark: QALD-3, 99 Natural Language Questions

Table : Evaluating QALD-3 Testing Questions (on DBpedia)

Processed Right Partially Recall Precision F-1OurMethod

76 32 11 0.40 0.40 0.40

squall2sparql 96 77 13 0.85 0.89 0.87CASIA 52 29 8 0.36 0.35 0.36Scalewelis 70 1 38 0.33 0.33 0.33RTV 55 30 4 0.34 0.32 0.33Intui2 99 28 4 0.32 0.32 0.32SWIP 21 14 2 0.15 0.16 0.16DEANNA 27 21 0 0.21 0.21 0.21

Page 72: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Experiments: OnlineID Questions Response Time (in ms)

Q2 Who was the successor of John F. Kennedy? 1699Q3 Who is the mayor of Berlin? 677Q14 Give me all members of Prodigy? 811Q17 Give me all cars that are produced in Germany ? 297Q19 Give me all people that were born in Vienna and died in Berlin ? 2557Q20 How tall is Michael Jordan ? 942Q21 What is the capital of Canada ? 1342Q22 Who is the governor of Wyoming ? 796Q24 Who was the father of Queen Elizabeth II? 538Q27 Sean Parnell is the governor of which U.S. state ? 1210Q28 Give me all movies directed by Francis Ford Coppola. 577Q30 What is the birth name of Angela Merkel ? 250Q35 Who developed Minecraft ?. 2565Q39 Give me all companies in Munich. 1312Q41 Who founded Intel? 1105Q42 Who is the husband of Amanda Palmer ? 1418Q44 Which cities does the Weser flow through ? 1139Q45 Which countries are connected by the Rhine ? 736Q54 What are the nicknames of San Francisco ? 321Q58 What is the time zone of Salt Lake City ? 316Q63 Give me all Argentine films. 427Q70 Is Michelle Obama the wife of Barack Obama ? 316Q74 When did Michael Jackson die ? 258Q76 List the children of Margaret Thatcher. 1139Q77 Who was called Scarface? 719Q81 Which books by Kerouac were published by Viking Press ? 796Q83 How high is the Mount Everest ? 635Q84 Who created the comic Captain America ? 589Q86 What is the largest city in Australia ? 1419Q89 In which city was the former Dutch queen Juliana buried ? 1700Q98 Which country does the creator of Miffy come from ? 2121Q100 Who produces Orangina ? 367

Page 73: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Experiments: Online

10

100

1000

10000

100000

Q2 Q20 Q21 Q22 Q28 Q35 Q41 Q42 Q44 Q45 Q54 Q74 Q76 Q83 Q84 Q86

Question Understanding in DEANNA Question Understanding in Our System

Overall time in DEANNA Overall time in Our System

Runnin

gT

ime

(in m

s)

Figure : Online Running Time Comparison

Page 74: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Experiments: Online

Figure : QALD-4 Results

Page 75: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

How to Build Templates for RDF Q/A— An Uncertain Graph Similarity Join Approach[Zheng et al., SIGMOD 15]

nsubj

det

Which physicist graduated from Princeton University? Which <___> graduated from <___>?

Which

physicist

graduated

from

Princeton University

pobj

prep nsubj

det

Which

<___>

graduated

from

<___>

pobj

prep

syntactic dependency tree

which <___> graduated from <___>?SELECT ?person WHERE

{ ?person type ____

?person graduatedFrom ____

}

Template

EVI (TrueKnowledge)

I Manually defined templates

I About 1200 templates in 2010[Tunstall-Pedoe, AI Magazine 2010]

CAN we generate templatesAUTOMATICALLY ?

Page 76: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

How to Build Templates for RDF Q/A— An Uncertain Graph Similarity Join Approach[Zheng et al., SIGMOD 15]

nsubj

det

Which physicist graduated from Princeton University? Which <___> graduated from <___>?

Which

physicist

graduated

from

Princeton University

pobj

prep nsubj

det

Which

<___>

graduated

from

<___>

pobj

prep

syntactic dependency tree

which <___> graduated from <___>?SELECT ?person WHERE

{ ?person type ____

?person graduatedFrom ____

}

Template

EVI (TrueKnowledge)

I Manually defined templates

I About 1200 templates in 2010[Tunstall-Pedoe, AI Magazine 2010]

CAN we generate templatesAUTOMATICALLY ?

Page 77: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

How to Build Templates for RDF Q/A— An Uncertain Graph Similarity Join Approach[Zheng et al., SIGMOD 15]

nsubj

det

Which physicist graduated from Princeton University? Which <___> graduated from <___>?

Which

physicist

graduated

from

Princeton University

pobj

prep nsubj

det

Which

<___>

graduated

from

<___>

pobj

prep

syntactic dependency tree

which <___> graduated from <___>?SELECT ?person WHERE

{ ?person type ____

?person graduatedFrom ____

}

Template

EVI (TrueKnowledge)

I Manually defined templates

I About 1200 templates in 2010[Tunstall-Pedoe, AI Magazine 2010]

CAN we generate templatesAUTOMATICALLY ?

Page 78: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

What do we have at hand ?

I Natural language question workload, such as Yahoo WebQuestions.I SPARQL workload, such as DBpeida SPARQL workload.

what is the name of justin bieber brother?

Who is the daughter of Ingrid Bergman married to?

Where did saki live?

Which river does the Tower Bridge cross?

who does joakim noah play for?

In which city did John F. Kennedy die?

where are the nfl redskins from?

In which country does the Yangtze River start?

how old is sacha baron cohen?

Give me the birthdays of all actors of the television show Charmed.

who did draco malloy end up marrying?

which countries border the us?

where is rome italy located on a map?

NLQ Workload1. SELECT DISTINCT ?uri

WHERE {

res:Albert_Einstein dbo:deathPlace ?uri .

?uri rdf:type dbo:City .

}

2. SELECT DISTINCT ?uri

WHERE {

res:Bill_Clinton dbo:child ?child .

?child dbo:spouse ?uri .

}

3. PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX res: <http://dbpedia.org/resource/>

SELECT DISTINCT ?uri

WHERE {

res:Brooklyn_Bridge dbo:crosses ?uri .

}

4. SELECT DISTINCT ?date

WHERE {

res:The_Truman_Show dbo:starring ?actor .

?actor dbo:birthDate ?date .

}

5. PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX res: <http://dbpedia.org/resource/>

SELECT DISTINCT ?uri

WHERE {

res:Nile dbo:sourceCountry ?uri .

}

SPARQL Query Workload

Page 79: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

What do we have at hand ?

I Natural language question workload, such as Yahoo WebQuestions.I SPARQL workload, such as DBpeida SPARQL workload.

what is the name of justin bieber brother?

Who is the daughter of Ingrid Bergman married to?

Where did saki live?

Which river does the Tower Bridge cross?

who does joakim noah play for?

In which city did John F. Kennedy die?

where are the nfl redskins from?

In which country does the Yangtze River start?

how old is sacha baron cohen?

Give me the birthdays of all actors of the television show Charmed.

who did draco malloy end up marrying?

which countries border the us?

where is rome italy located on a map?

NLQ Workload1. SELECT DISTINCT ?uri

WHERE {

res:Albert_Einstein dbo:deathPlace ?uri .

?uri rdf:type dbo:City .

}

2. SELECT DISTINCT ?uri

WHERE {

res:Bill_Clinton dbo:child ?child .

?child dbo:spouse ?uri .

}

3. PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX res: <http://dbpedia.org/resource/>

SELECT DISTINCT ?uri

WHERE {

res:Brooklyn_Bridge dbo:crosses ?uri .

}

4. SELECT DISTINCT ?date

WHERE {

res:The_Truman_Show dbo:starring ?actor .

?actor dbo:birthDate ?date .

}

5. PREFIX dbo: <http://dbpedia.org/ontology/>

PREFIX res: <http://dbpedia.org/resource/>

SELECT DISTINCT ?uri

WHERE {

res:Nile dbo:sourceCountry ?uri .

}

SPARQL Query Workload

Page 80: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Framework

Who wrote the book Scarlet and Black?

SELECT ?x WHERE

{ Scarlet_and_Black author ?x}

SPARQL Generation

Query Evaluation

SPARQL query Engine ?x: Stendhal

____________________________

Template Generation

Templates

SELECT ?x WHERE

{ ___ author ?x}Who wrote the book <___>?

Natural Language

Question Pattern SPARQL Pattern

...

...

Natural Language

Question Query Log

SPARQL Query

Log

Similar Graph Pairs

Natural Language

Question Query

SPARQL

Query

Q/A Answers

Uncertain Graphs Certain Graphs

Q/A with Templates

Templates

Template Generation

Natural Language

Question Workload

SPARQL Query

Workload

Similar Graph Pairs

Natural Language

Question Query

SPARQL

Query

Q/A Answers

Uncertain Graphs Certain Graphs

Q/A with Templates

Templates

SPARQL

Query Engine

Step 1

Step 2

Step 3

Page 81: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Step1: Uncertain Graph Generation

“Which acor”

“Michael Jordan”

“city”

“from” “be married to” “born in” “located in”

<United_States>,1.0

<Country>

<type>

<birthPlace>

?x, 1.0

<Actor>

<type>

<Spouse><Michael_Jordan>,0.6

<NBA_Player>

<Michael_Jordan>

<Professor>,0.3

?x,1.0

<City>

<type>

<New_York>, 0.7

<State>

<New_York_City>,0.3

<City>

<type>

<Michael_Jordan>

<Actor>,0.1

?b

?x

?a

?c

?d

<Spouse> <birthPlace> <locatedIn>

<Country>,1.0

<type>

<Actor>,1.0

<type>

<NBA_Player>,0.6

<Professor>,0.3

<Actor>,0.1<City>,1.0

<type>

<type><State>,0.7

<City>,0.3

<type>

<birthPlace>,1.0

<locatedIn>

(a) Semantic Query Graph

(b) Uncertain Graph

<birthPlace>

<type>

<type>

<type>

<type>

“USA”

Which actor from USA is married to Michael Jordan born in a city of NY?

“NY”

v10

v9

v7

v8

v1

v2

v3

v6

v4

v5

Page 82: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Step2: Finding Similar Graph Pairs

SPARQL Query Workload

1. Which actor from USA is married to

Michael Jordan born in a city of NY?

2. Which politician graduated from CIT?

1. SELECT ?person WHERE

{?person type Artist

?person graduatedFrom

Harvard_University}

2. SELECT ?person1 WHERE

{?person1 type Actor

?person1 birthPlace United_States

?person2 spouse ?person1

?person2 type NBA_star

?person2 birthPlace New_York_City}

Natural Language Question

(NLQ) Workload

(“NBA star”, 0.6)

“Actor”

(“Professor”, 0.3) (“Actor”, 0.1)v1 v2

v3

v7v8v9

spouse?x

“Country”

birthPlacetype

type birthPlace (“State”,0.7)

(“City”,0.3)

?c

“Artist”

“University”typeu1

u2

u4

?x

graduatedFrom

...

“Politician”

typev1

v2

v4

?xgraduatedFrom

g1

g2

q1

q2

......g1

g2

q2

q1

Uncertain Graph

Generation

(“University”,0.8)

(“Company”,0.2)

?atype

type

v4

v10type

“Actor”

“Country”

spouse

type

“City”

u1u2

u3

u5

u4

u8

?xbirthPlace

birthPlace?a

“NBA star”

type

type

type?av3

?b

typeu3 ?a

?b

?c

u6

u7

type “City”

locatedIn

v6

v5

?d

...

Page 83: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Step3: Template Generation

“Which politician”

“graduated from”

?x, 1.0

<Politician><type>

<California Institute of

Technology>, 0.8

<University>

<CIT Group>,0.2

<Company>

<type>

?x

CIT

<graduated

From>

<Politician>,1.0

<type><University>,0.8

<Company>,0.2

<type>

<graduatedFrom>

(a) Semantic Query

Graph

(b) Uncertain Graph

<type>

which politician graduated from CIT?

“CIT”

v1

v2

v3

v4

?x

Harvad_Univeristy

<graduatedFrom>

<Artist>

<type>

<University>

<type>

(c) SPAQRL

u1

u2

u3

u4

which <___> graduated from <___>?SELECT ?person WHERE

{?person type ____

?person graduatedFrom ____ }

(d) Template

q1g2

Page 84: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Problem Formulation

Definition (Finding Similar Graph Pairs)

Given a set of deterministic graphs D (derived from the SPARQLquery workload), a set of uncertain graphs U (derived from naturallanguage question workload), a graph edit distance threshold τ ,and a similarity probability threshold α ∈ (0, 1], a SimJ queryreturns the graph pairs 〈q, g〉 (q ∈ D and g ∈ U) withSimPτ (q, g) ≥ α.

Similarity Probability

SimPτ (q, g) =∑

pw(g)∈PW (g)

Pr{pw(g)|ged(q, pw(g)) ≤ τ}

Page 85: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Problem Formulation

Definition (Finding Similar Graph Pairs)

Given a set of deterministic graphs D (derived from the SPARQLquery workload), a set of uncertain graphs U (derived from naturallanguage question workload), a graph edit distance threshold τ ,and a similarity probability threshold α ∈ (0, 1], a SimJ queryreturns the graph pairs 〈q, g〉 (q ∈ D and g ∈ U) withSimPτ (q, g) ≥ α.

Similarity Probability

SimPτ (q, g) =∑

pw(g)∈PW (g)

Pr{pw(g)|ged(q, pw(g)) ≤ τ}

Page 86: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Existing GED Lower Bounds

I Global Filters

� vertex/edge counting [Zeng et al., 2009]:ged(q, g) ≥ ||V (g)| − |V (q)||+ ||E(g)| − |E(q)||

� label-based lower bounds [Zhao et al., 2012]:ged(q, g) ≥ Γ(MV (q),MV (g)) + Γ(ME (q),ME (g))

I n-gram based Lower Bounds� path-based fliter [Zhao et al., 2012]

distP(q, g) = max(|MG(q)| − τ · Dp , |MG(g)| − τ · Dp(g))

� tree-based fliter [Wang et al., 2012]distT (q, g) = max(|V (q)| − τ · Dt , |V (g)| − τ · Dt(g))

� star-based fliter [Zeng et al., 2009]

distS (q, g) =µ(q, g)

max{4, [max{ δ(q), δ(g)}] + 1}

I Partition-based Lower Bound

� Pars [Zhao et al., 2013]� Disjoint-Partition [Zheng et al., 2015]

Page 87: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Challenges

Challenges

Existing lower bounds are designed for deterministic graphs

Two Possible Variants

I ignoring all the vertex/edge labels

I enumerating all possible worlds

Page 88: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Common Structural Subgraph (CSS)

pw1(g1)

S

bt

sNS v2

?xA Ctb

v1

v8v7 v9

v3

v4

?c t

v10t

?b

bt

s

NS u2 Ci

ACt b

u1

u5u4 u7

u3

q2

?x

?a

u6t

u8t

?c

?b

?a v5

v6

lt

S?d

Page 89: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

CSS-based Lower Bound

The relationship between GED and CSS[Brun et al., 2012]:

ged(q, g c)

= min∀s{(|V (q)|+ |E(q)|)− (|V (s)|+ |E(s)|) (part 1)

+(|V (g c)|+ |E(g c)|)− (|V (s)|+ |E(s)|) (part 2)

+(|Vs |+ |Es |)} (part 3)

= |V (q)|+ |E(q)|+ |V (g c)|+ |E(g c)| − F (q, g c) (3)

where Vs and Es are the subsets of vertices and edges in CSS, whose labelsneed to be substituted, and we have

F (q, g c) = max∀s{2 · (|V (s)|+ |E(s)|)− (|Vs |+ |Es |)}. (4)

ged(q, g) ≥ |V |+ |E | − λE (q, g) +dif (q, g)

2− λV (q, g), (5)

Page 90: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

CSS-based Similarity Probability Upper Bound

SimPτ (q, g) =∑

pw(g)∈PW (g)

Pr{pw(g)|ged(q, pw(g)) ≤ τ}

Let C(q, g) be |V |+ |E | − λE (q, g) +dif (q, g)

2(constant ).

SimPτ (q, g) ≤∑

pw(g)∈PW (g)

Pr{pw(g)|λV (q, pw(g)) ≥ C(q, g)− τ}

= Pr{λV (q, g) ≥ C(q, g)− τ}.

Page 91: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

CSS-based Similarity Probability Upper Bound

m independent variables y1, y2, . . . , and ymdenote (y1 + y2 + . . .+ ym) as a single random variable Yby applying the Markov’s inequality

SimPτ (q, g) ≤ ub SimPτ (q, g) =E(Y )

C(q, g)− τ

where E(Y ) =∑m

i=1 E(yi ) and E(yi ) = Σlj∈l(vi )Pr(lj |lj ∩ ΣV (q) 6= ∅)

Page 92: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Cost-based Query Optimization

divide all possible worlds of an uncertain graph g into disjointpossible world groups

Page 93: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Conclusions

I Graph Database is a Possible Way for RDF Knowledge BaseManagement.

I Subgraph Matching is a Strong Tool.

I Using RDF repository, how to Provide Knowledge Services forApplications and Common Users?

Page 94: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Conclusions

I Graph Database is a Possible Way for RDF Knowledge BaseManagement.

I Subgraph Matching is a Strong Tool.

I Using RDF repository, how to Provide Knowledge Services forApplications and Common Users?

Page 95: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Conclusions

I Graph Database is a Possible Way for RDF Knowledge BaseManagement.

I Subgraph Matching is a Strong Tool.

I Using RDF repository, how to Provide Knowledge Services forApplications and Common Users?

Page 96: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Thank you!

gStore gAnswer

Page 97: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Reference I

Brun, L., Gauzere, B., and Fourey, S. (2012).

Relationships between graph edit distance and maximal commonstructural subgraph.

In SSPR/SPR, pages 42–50.

Wang, G., Wang, B., Yang, X., and Yu, G. (2012).

Efficiently indexing large sparse graphs for similarity search.

TKDE, 24(3):440–451.

Zeng, Z., Tung, A. K. H., Wang, J., Feng, J., and Zhou, L. (2009).

Comparing stars: On approximating graph edit distance.

PVLDB, 2(1):25–36.

Zhao, X., Xiao, C., Lin, X., Liu, Q., and Zhang, W. (2013).

A partition-based approach to structure similarity search.

PVLDB, 7(3):169–180.

Zhao, X., Xiao, C., Lin, X., and Wang, W. (2012).

Efficient graph similarity joins with edit distance constraints.

In ICDE.

Page 98: Natural Language Question Answering Over Knowledge Graph A ...ws.nju.edu.cn/wiki/attach/863-讨论讲稿/20150716... · Knowledge Graph|A Data-Driven Approach Lei Zou Peking University

Reference II

Zheng, W., Zou, L., Lian, X., Wang, D., and Zhao, D. (2015).

efficient graph similairty search over large graph databases.

TKDE, 24(3):440–451.