Source: ink-ron.usc.edu/xiangren/CIKM17-KB_querying.pdf
Construction and Querying of Large-scale Knowledge Bases
Part II: Schema-agnostic Knowledge Base Querying
Transformation in Information Search
Desktop search → Mobile search
Surge of mobile Internet use in China
Lengthy documents? Direct answers!
“Which hotel has a roller coaster in Las Vegas?”
Answer: New York-New York hotel
Application: Facebook Entity Graph
• People, Places, and Things: Facebook’s knowledge graph (entity graph) stores as entities the users, places, pages and other objects within Facebook.
• Connecting: The connections between the entities indicate the type of relationship between them, such as friend, following, photo, check-in, etc.
QA Engine instead of Search Engine
• Behind the scenes: A knowledge graph with millions of entities and billions of facts
Structured Query: RDF + SPARQL
Triples in an RDF graph:

| Subject | Predicate | Object |
|---|---|---|
| Barack_Obama | parentOf | Malia_Obama |
| Barack_Obama | parentOf | Natasha_Obama |
| Barack_Obama | spouse | Michelle_Obama |
| Barack_Obama_Sr. | parentOf | Barack_Obama |

RDF graph nodes: Barack_Obama_Sr., Barack_Obama, Malia_Obama, Natasha_Obama, Michelle_Obama
SPARQL query: SELECT ?x WHERE { Barack_Obama_Sr. parentOf ?y . ?y parentOf ?x . }
Answer: <Malia_Obama> <Natasha_Obama>
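As a rough illustration of what the SPARQL query on this slide computes, the sketch below stores the four triples in a plain Python list and hand-codes the two-pattern join. The helper names `objects` and `grandchildren` are made up for this example; a real RDF store would parse and plan the query instead.

```python
# Toy triple store; the data mirrors the RDF triples on the slide.
TRIPLES = [
    ("Barack_Obama", "parentOf", "Malia_Obama"),
    ("Barack_Obama", "parentOf", "Natasha_Obama"),
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Barack_Obama_Sr.", "parentOf", "Barack_Obama"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every o such that (subject, predicate, o) is a stored triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def grandchildren(person, triples=TRIPLES):
    """Hand-coded join for: person parentOf ?y . ?y parentOf ?x ."""
    return [
        x
        for y in objects(person, "parentOf", triples)
        for x in objects(y, "parentOf", triples)
    ]


print(grandchildren("Barack_Obama_Sr."))  # ['Malia_Obama', 'Natasha_Obama']
```

Even for this tiny graph the user has to know the exact predicate name `parentOf`, which is the burden the rest of the tutorial tries to remove.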
Why Does Structured Query Fall Short?

| Knowledge Base | # Entities | # Triples | # Classes | # Relations |
|---|---|---|---|---|
| Freebase | 45M | 3B | 53K | 35K |
| DBpedia | 6.6M | 13B | 760 | 2.8K |
| Google Knowledge Graph* | 570M | 18B | 1.5K | 35K |
| YAGO | 10M | 120M | 350K | 100 |
| Knowledge Vault | 45M | 1.6B | 1.1K | 4.5K |

* as of 2014
• It’s more than large: High heterogeneity of KBs
• If it’s hard to write SQL on simple relational tables, it’s only harder to write SPARQL on large knowledge bases
  • Even harder on automatically constructed KBs with a massive, loosely-defined schema
Certainly, You Do Not Want to Write This!
“find all patients diagnosed with eye tumor”
“Semantic queries by example”, Lipyeow Lim et al., EDBT 2014
Schema-agnostic KB Querying
• Keyword query: query like a search engine
  “Barack Obama Sr. grandchildren”
• Graph query: add a little structure
  nodes “Barack Obama Sr.” and “grandchildren”
• Natural language query: like asking a friend
  “Who are Barack Obama Sr.’s grandchildren?”
• Query by example: just show me examples
  <Barack Obama Sr., Malia Obama>
Graph Query
Search intent: “Find a professor, ~70 yrs., who works in Toronto and joined Google recently.”
Graph query (nodes): “Prof., 70 yrs.”; “Toronto”
A match (result): Geoffrey Hinton (1947-); Univ. of Toronto; DNNResearch
Mismatch between Knowledge Base and Query

| Knowledge Base | Query |
|---|---|
| “University of Washington” | “UW” |
| “neoplasm” | “tumor” |
| “Doctor” | “Dr.” |
| “Barack Obama” | “Obama” |
| “Jeffrey Jacob Abrams” | “J. J. Abrams” |
| “teacher” | “educator” |
| “1980” | “~30” |
| “3 mi” | “4.8 km” |
| “Hinton” - “DNNresearch” - “Google” | “Hinton” - “Google” |
| … | … |
Schema-less Graph Querying (SLQ)
[Yang et al. VLDB’14]
Query → A Match
✓ Acronym transformation: ‘UT’ → ‘University of Toronto’
✓ Abbreviation transformation: ‘Prof.’ → ‘Professor’
✓ Numeric transformation: ‘~70’ → ‘1947’
✓ Structural transformation: an edge → a path
Query nodes: Toronto; Prof., 70 yrs.
Match nodes: Univ. of Toronto; DNNResearch; Geoffrey Hinton (1947-)
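The four transformation types listed on this slide can be sketched as small matching predicates. This is only an illustration under stated assumptions, not the SLQ implementation: the abbreviation lexicon, the stopword list, the reference year used to turn an age into a birth year, the tolerance, and all function names are hypothetical.

```python
import re

# Hypothetical resources; a real system would use much larger lexicons.
ABBREVIATIONS = {"prof.": "professor", "dr.": "doctor", "univ.": "university"}
STOPWORDS = {"of", "the", "and"}
REFERENCE_YEAR = 2017  # assumption: the year the tutorial was presented


def acronym_match(query_label, kb_label):
    """'UT' matches 'University of Toronto' via initials of non-stopwords."""
    initials = "".join(w[0] for w in kb_label.split() if w.lower() not in STOPWORDS)
    return query_label.upper() == initials.upper()


def abbreviation_match(query_label, kb_label):
    """'Prof.' matches 'Professor' through a lookup in the lexicon."""
    expanded = ABBREVIATIONS.get(query_label.lower())
    return expanded is not None and expanded == kb_label.lower()


def numeric_match(query_label, birth_year, tolerance=2):
    """'~70' (an approximate age) matches a birth year within a tolerance."""
    m = re.fullmatch(r"~?(\d+)", query_label.strip())
    if m is None:
        return False
    return abs((REFERENCE_YEAR - int(m.group(1))) - birth_year) <= tolerance


def structural_match(adjacency, source, target, max_hops=2):
    """A query edge matches a KB path of at most max_hops directed edges."""
    frontier = {source}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in adjacency.get(node, ())}
        if target in frontier:
            return True
    return False
```

For instance, with `adjacency = {"Hinton": ["DNNresearch"], "DNNresearch": ["Google"]}`, the query edge “Hinton” - “Google” is matched by the two-hop path through “DNNresearch”.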
Transformations
[Yang et al. VLDB’14]
Candidate Match Ranking
Query: $Q$; candidate match: $\varphi(Q)$
• Features
  • Node matching features: $F_V(v, \varphi(v)) = \sum_i \alpha_i f_i(v, \varphi(v))$
  • Edge matching features: $F_E(e, \varphi(e)) = \sum_j \beta_j g_j(e, \varphi(e))$
• Overall matching score (conditional random field):
$$P(\varphi(Q) \mid Q) \propto \exp\Big(\sum_{v \in V_Q} F_V(v, \varphi(v)) + \sum_{e \in E_Q} F_E(e, \varphi(e))\Big)$$
Example: query nodes “Toronto”, “Prof., 70 yrs.”, “Google”; match nodes “Univ. of Toronto”, “Geoffrey Hinton (1947-)”, “DNNResearch”, “Google”
[Yang et al. VLDB’14]
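To make the scoring formula on this slide concrete, here is a minimal sketch that sums weighted node and edge features for each candidate match and exponentiates the result. The feature values, the weights, and the choice to normalize over an explicit candidate set are assumptions for illustration; the slide only states proportionality.

```python
import math


def match_score(node_feats, edge_feats, alpha, beta):
    """Log-score: sum over nodes of F_V plus sum over edges of F_E (both linear)."""
    f_v = sum(sum(a * f for a, f in zip(alpha, feats)) for feats in node_feats)
    f_e = sum(sum(b * g for b, g in zip(beta, feats)) for feats in edge_feats)
    return f_v + f_e


def rank_candidates(candidates, alpha, beta):
    """Return (name, probability) pairs, best first, normalized over the candidates."""
    scores = {
        name: match_score(node_feats, edge_feats, alpha, beta)
        for name, (node_feats, edge_feats) in candidates.items()
    }
    z = sum(math.exp(s) for s in scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, math.exp(s) / z) for name, s in ranked]


# Hypothetical feature vectors: one inner list per query node / query edge.
CANDIDATES = {
    "match_a": ([[1.0, 0.9], [1.0, 1.0]], [[0.5]]),
    "match_b": ([[0.2, 0.1], [1.0, 1.0]], [[1.0]]),
}
ALPHA = [1.0, 0.5]  # node feature weights (alpha_i)
BETA = [0.8]        # edge feature weights (beta_j)

for name, prob in rank_candidates(CANDIDATES, ALPHA, BETA):
    print(name, round(prob, 3))
```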
Query-specific Ranking via Relevance Feedback
• Generic ranking: sub-optimal for specific queries
  • By “Washington”, user A means Washington D.C., while user B might mean University of Washington
• Query-specific ranking: tailored for each query
  • But need additional query-specific information for further disambiguation
Relevance feedback: users indicate the (ir)relevance of a handful of answers
[Su et al. KDD’15]
Problem Definition
• $Q$: A graph query
• $G$: A knowledge graph
• $\varphi(Q)$: A candidate match to $Q$
• $F(\varphi(Q) \mid Q, \boldsymbol{\theta})$: A generic ranking function
• $\mathcal{M}^+$: A set of positive/relevant matches of $Q$
• $\mathcal{M}^-$: A set of negative/non-relevant matches of $Q$
Graph Relevance Feedback (GRF): Generate a query-specific ranking function $\tilde{F}$ for $Q$ based on $\mathcal{M}^+$ and $\mathcal{M}^-$
[Su et al. KDD’15]
Query-specific Tuning
• $\boldsymbol{\theta}$ represents (query-independent) feature weights. However, each query carries its own view of feature importance
• Find query-specific $\boldsymbol{\theta}^*$ that is better aligned with the query, using user feedback
$$g(\boldsymbol{\theta}^*) = (1-\lambda)\left(\frac{\sum_{\varphi(Q)\in\mathcal{M}^+} F(\varphi(Q)\mid Q,\boldsymbol{\theta}^*)}{|\mathcal{M}^+|} - \frac{\sum_{\varphi(Q)\in\mathcal{M}^-} F(\varphi(Q)\mid Q,\boldsymbol{\theta}^*)}{|\mathcal{M}^-|}\right) + \lambda R(\boldsymbol{\theta}, \boldsymbol{\theta}^*)$$
First term: user feedback; second term: regularization
[Su et al. KDD’15]
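The tuning objective on this slide can be explored with a toy version. The sketch below is not the method of Su et al.; it assumes a linear ranking function F = θ*·x over a feature vector x per match, and a regularizer R(θ, θ*) = −‖θ* − θ‖², neither of which is specified on the slide. Under those assumptions, maximizing g has the closed form θ* = θ + (1 − λ)(μ⁺ − μ⁻)/(2λ), where μ⁺ and μ⁻ are the mean feature vectors of the positive and negative matches.

```python
def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def objective(theta_star, theta, pos_feats, neg_feats, lam):
    """g(theta*) with linear F and R = -||theta* - theta||^2 (both assumptions)."""
    mu_pos, mu_neg = mean_vector(pos_feats), mean_vector(neg_feats)
    feedback = sum(t * (p - n) for t, p, n in zip(theta_star, mu_pos, mu_neg))
    reg = -sum((ts - t) ** 2 for ts, t in zip(theta_star, theta))
    return (1 - lam) * feedback + lam * reg


def tune_theta(theta, pos_feats, neg_feats, lam=0.5):
    """Closed-form maximizer of the assumed objective; requires 0 < lam <= 1."""
    mu_pos, mu_neg = mean_vector(pos_feats), mean_vector(neg_feats)
    return [
        t + (1 - lam) * (p - n) / (2 * lam)
        for t, p, n in zip(theta, mu_pos, mu_neg)
    ]


# Hypothetical feedback: two positive matches and one negative match.
theta = [1.0, 1.0]
positives = [[1.0, 0.0], [1.0, 0.2]]
negatives = [[0.0, 1.0]]
print(tune_theta(theta, positives, negatives, lam=0.5))
```

With λ close to 1 the tuned weights stay near the generic θ; with λ close to 0 the feedback dominates, which is the trade-off the regularization term controls.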
Type Inference
• Infer the implicit type of each query node
• The types of the positive entities constitute a composite type for each query node
Figure panels: Query; Positive Feedback; Candidate Nodes
[Su et al. KDD’15]
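One simple way to picture a composite type is as a weighted bag of the types carried by the positive entities, which can then be used to score other candidates. The weighting scheme (fraction of positive entities with each type), the scoring function, and the entity and type names below are all assumptions made for this sketch; the slide does not say how the composite type is built or used.

```python
from collections import Counter


def composite_type(positive_entities, entity_types):
    """Weight each type by the fraction of positive entities that carry it."""
    counts = Counter(
        t for e in positive_entities for t in entity_types.get(e, ())
    )
    n = len(positive_entities)
    return {t: c / n for t, c in counts.items()}


def type_score(candidate, composite, entity_types):
    """Sum of composite-type weights over the candidate's own types."""
    return sum(composite.get(t, 0.0) for t in entity_types.get(candidate, ()))


# Hypothetical entities and type labels.
ENTITY_TYPES = {
    "pos_1": {"Battle", "Event"},
    "pos_2": {"Battle"},
    "cand_a": {"Battle", "Event"},
    "cand_b": {"Person"},
}

composite = composite_type(["pos_1", "pos_2"], ENTITY_TYPES)
print(type_score("cand_a", composite, ENTITY_TYPES))
print(type_score("cand_b", composite, ENTITY_TYPES))
```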
Context Inference
• Entity context: neighborhood of the entity
• The contexts of the positive entities constitute a composite context for each query node
Figure panels: Query; Positive Entities; Candidates
[Su et al. KDD’15]
Experiment Setup
• Knowledge graph: DBpedia (4.6M nodes, 100M edges)
• Graph query sets: WIKI and YAGO
Figure labels: YAGO class (“Naval Battles of World War II Involving the United States”) as the information need; structured graph query; instances (Battle of Midway, Battle of the Caribbean, …) as the answer, via links between YAGO and DBpedia
[Su et al. KDD’15]
Evaluation with Explicit Feedback
• Explicit feedback: User gives relevance feedback on top-10 results
• GRF improves on SLQ by over 100%
• Three GRF components complement each other
Metric: mean average precision (MAP)
[Figure: MAP@K vs. K (1, 5, 10, 20, 50, 100) on (a) WIKI and (b) YAGO; curves: GRF, Tuning+Context, Tuning+Type, Tuning, SLQ]
[Su et al. KDD’15]
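For readers unfamiliar with the metric, MAP@K can be computed as below. Normalization conventions for average precision at a cutoff differ between papers; this sketch divides by min(|relevant|, K), which is an assumption and may not match the exact definition used in the evaluation. The function names and the toy rankings are illustrative only.

```python
def average_precision_at_k(ranked, relevant, k):
    """Average of precision@rank over the ranks (<= k) that hold a relevant item."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def mean_average_precision_at_k(runs, k):
    """Mean of AP@k over (ranked_list, relevant_set) pairs, one pair per query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Two toy queries with hypothetical answers.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
print(round(mean_average_precision_at_k(runs, k=4), 4))  # 0.6667
```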
Evaluation with Pseudo Feedback
• Pseudo feedback: Blindly assume top-10 results are correct
• Erroneous feedback information but no additional user effort
[Su et al. KDD’15]
Natural Language Query (a.k.a. Knowledge Based Question Answering)
Figure credit to Scott Yih
Challenges
• Language mismatch
  • Lots of ways to ask the same question
    • Find terrorist organizations involved in September 11 attacks
    • Who did September 11 attacks?
    • The nine eleven were carried out with the involvement of what terrorist organizations?
  • All need to be mapped to the KB relation: terrorist_attack
Challenges
• Language mismatch
• Large search space
  • United_States has over 1 million neighbors in Freebase
• Scalability
  • How to scale up to more advanced inputs, and scale out to more domains?
  • KBQA data is highly domain-specific
• Compositionality
  • If a model understands relations A and B, can it answer A+B?
What will be covered
• Model
  • General pipeline
  • Semantic matching: CNN and Seq2Seq
• Data
  • Low-cost data collection via crowdsourcing
  • Cross-domain semantic parsing via neural transfer learning
General pipeline
Topic Entity Linking → Candidate Logical Form Generation → Semantic Matching → Execution
CNN: [Yih et al. ACL’15]
Seq2Seq: [Su and Yan, EMNLP’17]
Seq2Seq: [Jia and Liang, ACL’16] [Liang et al. ACL’17]
Query Graph
[Yih et al. ACL’15]
Slides adapted from Scott Yih
Topic Entity Linking
• An advanced entity linker for short text
  • Yang and Chang, “S-MART: Novel Tree-based Structured Learning Algorithms Applied on Tweet Entity Linking.” ACL’15
• Prepare surface form lexicon for KB entities
• Entity mention candidates: all consecutive word sequences in lexicon
• Score entity mention candidates with the statistical model, keep top-10 entities
[Yih et al. ACL’15]
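The candidate-generation step described above (all consecutive word sequences that appear in the lexicon) can be sketched in a few lines. The lexicon contents, entity identifiers, and function name are hypothetical, and the statistical scoring and top-10 selection are not reproduced here.

```python
def mention_candidates(question, lexicon):
    """Return (start, end, surface, entities) for every word span found in the lexicon."""
    words = question.lower().rstrip("?").split()
    found = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            surface = " ".join(words[i:j])
            if surface in lexicon:
                found.append((i, j, surface, lexicon[surface]))
    return found


# Hypothetical surface-form lexicon: lowercased surface form -> candidate entity ids.
LEXICON = {
    "michael jordan": ["entity_michael_jordan"],
    "jordan": ["entity_jordan_country", "entity_michael_jordan"],
}

for span in mention_candidates("What team did Michael Jordan play for?", LEXICON):
    print(span)
```

Overlapping spans such as “michael jordan” and “jordan” are both kept at this stage; deciding between them is left to the scoring model.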
Candidate Logical Form Generation
• (Roughly) enumerate all admissible logical forms up to a certain complexity (2-hop)
[Yih et al. ACL’15]
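A rough sketch of the enumeration step, restricted to plain relation chains: starting from the topic entity, collect every relation path of length one or two. The adjacency-list KB format, the node identifiers, and the function name are assumptions for this example, and only relation chains are covered, not the full space of logical forms.

```python
def enumerate_relation_paths(kb, topic_entity, max_hops=2):
    """Enumerate distinct relation paths of length <= max_hops from topic_entity."""
    paths = set()
    frontier = [((), topic_entity)]
    for _ in range(max_hops):
        next_frontier = []
        for path, node in frontier:
            for relation, neighbor in kb.get(node, ()):
                new_path = path + (relation,)
                paths.add(new_path)
                next_frontier.append((new_path, neighbor))
        frontier = next_frontier
    return sorted(paths)


# Hypothetical KB: node -> list of (relation, neighbor) pairs.
KB = {
    "show_1": [("cast", "cast_node_1"), ("genre", "genre_1")],
    "cast_node_1": [("actor", "person_1"), ("character", "character_1")],
}

print(enumerate_relation_paths(KB, "show_1"))
# [('cast',), ('cast', 'actor'), ('cast', 'character'), ('genre',)]
```

Capping the path length keeps the candidate set manageable even when the topic entity has a very large neighborhood.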
Semantic Matching (w/ CNN)
[Yih et al. ACL’15]
Discriminative model: $p(R \mid P) = \dfrac{\exp(\cos(y_R, y_P))}{\sum_{R'} \exp(\cos(y_{R'}, y_P))}$
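The discriminative model above is a softmax over cosine similarities between the question-pattern vector and each candidate relation vector. In the sketch below the vectors are small hand-written stand-ins for CNN outputs, and the relation names and function names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relation_posterior(pattern_vec, relation_vecs):
    """p(R|P) = exp(cos(y_R, y_P)) / sum over R' of exp(cos(y_R', y_P))."""
    scores = {r: math.exp(cosine(vec, pattern_vec)) for r, vec in relation_vecs.items()}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}


# Hypothetical embeddings standing in for the CNN outputs y_P and y_R.
PATTERN = [0.9, 0.1, 0.0]
RELATIONS = {
    "cast_actor": [1.0, 0.0, 0.0],
    "place_of_birth": [0.0, 1.0, 0.0],
}

posterior = relation_posterior(PATTERN, RELATIONS)
print(max(posterior, key=posterior.get))  # cast_actor
```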
Semantic Matching (w/ Seq2Seq)
Encoder input: “Who voiced Meg on <e>”
Decoder output: “Cast actor ...”
[Jia+ ACL’16, Liang+ ACL’17, Su+ EMNLP’17]
Generative model: $p(R \mid P) = \prod_t p(R_t \mid P, R_{<t})$
Scalability
• Vertical scalability
  • Scale up to more complex inputs and logical constructs
    • “What team did Michael Jordan play for?”
    • “In which season did Michael Jordan get the most points?”
    • “Who was the head coach when Michael Jordan started playing for the Chicago Bulls?”
• Horizontal scalability
  • Scale out to more domains
  • Weather, calendar, hotel, flight, restaurant, …
  • Knowledge base, relational database, API, robots, …
  • Graph, table, text, image, audio, …
• More data + better (more data-efficient) model
  • On Generating Characteristic-rich Question Sets for QA Evaluation (EMNLP’16)
  • Cross-domain Semantic Parsing via Paraphrasing (EMNLP’17)
  • Building Natural Language Interfaces to Web APIs (CIKM’17)
Low-cost Data Collection via Crowdsourcing
1: Logical form generation
count(λx.children(Eddard_Stark, x) ∧ place_of_birth(x, Winterfell))
2: Canonical utterance generation
“What is the number of person who is born in Winterfell, and who is child of Eddard Stark?”
3: Paraphrasing via crowdsourcing
“How many children of Eddard Stark were born in Winterfell?”
[Wang+ ACL’15, Su+ EMNLP’16, Su+ CIKM’17]
Existing KBQA datasets mainly contain simple questions
“Where was Obama born?”
“What party did Clay establish?”
“What kind of money to take to bahamas?”
… …
GraphQuestions: A New KBQA Dataset with Rich Characteristics
• Structural complexity
  • “people who are on a gluten-free diet can’t eat what cereal grain that is used to make challah?”
• Quantitative analysis (functions)
  • “In which month does the average rainfall of New York City exceed 86 mm?”
• Commonness
  • “Where was Obama born?” vs.
  • “What is the tilt of axis of Polestar?”
• Paraphrase
  • “What is the nutritional composition of coca-cola?”
  • “What is the supplement information for coca-cola?”
  • “What kind of nutrient does coke have?”
• …
https://github.com/ysu1989/GraphQuestions
Large Room to Improve on GraphQuestions

| Model | Average F1 (%) |
|---|---|
| Sempre (Berant+ EMNLP’13) | 10.8 |
| Jacana (Yao+ ACL’14) | 5.1 |
| ParaSempre (Berant+ ACL’14) | 12.8 |
| UDepLambda (Reddy+ EMNLP’17) | 17.6 |
| Para4QA (Li+ EMNLP’17) | 20.4 |
Crowdsourcing is great, but…
• There is an unlimited number of application domains; it is prohibitively costly to collect (sufficient) training data for every one.
• Transfer learning: Use existing data from some source domains to help the target domain
• Problem: KBQA data is highly domain-specific
[Su+ EMNLP’17]
What is transferrable in semantic parsing?
“In which season did Kobe Bryant play for the Lakers?”
𝐑[season].(player.KobeBryant ⊓ team.Lakers)
𝑝(team | “play for”)
“When did Alice start working for Mckinsey?”
𝐑[start].(employee.Alice ⊓ employer.Mckinsey)
[Su+ EMNLP’17]
Cross-domain Semantic Parsing via Paraphrasing [Su+ EMNLP’17]
• First convert logical forms to canonical utterances
• Train a neural paraphrase model on the source domains; adapt the model to the target domain
Why does it work?
• Source domain: “play for” ⇒ “whose team is”
• Word embedding: “play” ⇒ “work”, “team” ⇒ “employer”
• Target domain: “work for” ⇒ “whose employer is”
Neural Transfer Learning for Semantic Parsing
Pre-trained Word Embedding
Source Domain → Target Domain
Evaluation
• Overnight dataset: 8 domains (basketball, calendar, etc.), each with a knowledge base
• For each target domain, use the other 7 domains as source

Accuracy on Overnight Benchmark

| Method | Accuracy (%) |
|---|---|
| Wang et al. (2015) | 58.8 |
| Xiao et al. (2016) | 72.7 |
| Jia and Liang (2016) | 75.8 |
| Herzig and Berant (2017) | 79.6 |
| Ours | 80.6 |