HyKSS: Hybrid Keyword and Semantic Search
Andrew Zitzelberger
1
Keyword Search
2
Form Based Search
3
4
What about?
over 8,000 meters in elevation
less than 100K miles
faster than 100 mph
5
HyKSS
• Hybrid Keyword and Semantic Search
• Semantics – extracted annotations
  – Multiple ontologies
• Keywords – text
6
Thesis Statement
• HyKSS (hybrid search)
  – Outperforms keyword and semantic search
  – Dynamic query weighting outperforms various other hybrid search approaches
  – Allows queries over multiple ontologies
  – Allows pay-as-you-go improvement
7
Extraction Ontologies
8
Data Frames
9
Indexing Architecture
10
[Diagram: the Document Collection feeds a Keyword Indexer and a Semantic Indexer, which produce the Keyword Index and the Semantic Index.]
Indexing Architecture Implementation
11
[Diagram labels: Document Collection, Keyword Indexer, Keyword Index, Semantic Indexer, Semantic Index. Implementation components shown: Lucene, OntoES, Ontology Library, Sesame.]
Query Processing
12
[Diagram: a Free Form Query passes through two parallel pipelines, Keyword Processing and Semantic Processing. Each pipeline runs Pre-Process Query, Execute Query, and Post-Process Query; the two outputs then meet in Combine Results.]
Keyword Query Pre-Processing
13
• Remove Lucene special characters (except quotes)
• Remove (inequality) comparison constraints
• Remove non-phrase stopwords
Before: hondas in "excellent condition" in orem for under 12 grand
After: hondas "excellent condition" orem
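The three pre-processing steps above can be sketched in Python. This is only an illustration, not the HyKSS implementation: the `COMPARISON` pattern, the `STOPWORDS` set, and the function name are hypothetical stand-ins, and a real system would presumably recognize comparison constraints through its ontologies rather than through one regular expression.

```python
import re

# Lucene query-syntax characters, excluding the double quote (kept for phrases).
LUCENE_SPECIAL = re.compile(r'[+\-!(){}\[\]^~*?:\\/&|]')

# Hypothetical, deliberately small pattern for inequality comparisons
# such as "under 12 grand" or "less than 100K miles".
COMPARISON = re.compile(
    r'\b(?:under|over|below|above|less than|more than|faster than|at least|at most)'
    r'\s+\$?\d[\d,.]*k?(?:\s+(?:grand|dollars|miles|mph|meters))?',
    re.IGNORECASE,
)

# Hypothetical stopword list; the list used by HyKSS is not given in the slides.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to", "with"}


def preprocess_keyword_query(query: str) -> str:
    """Return the keyword-only form of a free-form query."""
    kept = []
    # Split into quoted phrases and the unquoted text between them.
    for part in re.split(r'("[^"]*")', query):
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            # Phrases keep their stopwords; only special characters are stripped.
            kept.append(LUCENE_SPECIAL.sub(' ', part))
            continue
        text = COMPARISON.sub(' ', part)       # drop comparison constraints
        text = LUCENE_SPECIAL.sub(' ', text)   # drop Lucene special characters
        kept.extend(w for w in text.split() if w.lower() not in STOPWORDS)
    return ' '.join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'))
```

On the running example this prints `hondas "excellent condition" orem`, matching the slide. Comparison constraints are removed before special characters so that a price such as `12,000` is still intact when the pattern looks for it.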
Keyword Query Execution and Post-Processing
• Executed by Lucene
• Empty Post-Processing step
14
Semantic Query Pre-Processing – Individual Ontology Scoring
hondas in "excellent condition" in orem for under 12 grand
15
Semantic Query Pre-Processing – Ontology Set Creation
• For each ontology sorted by score:
  – For each remaining ontology:
    • Add point for each new or subsuming match
    • If added points > 0 add ontology
• Completely subsumed ontologies are removed during query generation
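One possible reading of the set-creation loop is sketched below. The slide leaves several details open, so everything here is an assumption: matches are modeled as character spans in the query, a match counts as "new" when it overlaps nothing already covered and as "subsuming" when it strictly contains a covered span, and the highest-scoring ontology seeds the set. The names `build_ontology_set`, `is_new`, and `subsumes` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the query, end exclusive


def is_new(span: Span, covered: Set[Span]) -> bool:
    """True if the span overlaps no span that is already covered."""
    return all(span[1] <= c[0] or span[0] >= c[1] for c in covered)


def subsumes(span: Span, covered: Set[Span]) -> bool:
    """True if the span strictly contains some already covered span."""
    return any(span != c and span[0] <= c[0] and c[1] <= span[1] for c in covered)


def build_ontology_set(scores: Dict[str, float],
                       matches: Dict[str, Set[Span]]) -> List[str]:
    """Greedy selection: seed with the best ontology, add others that contribute."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0]]
    covered = set(matches[ranked[0]])
    for name in ranked[1:]:
        # One point for each match that is new or that subsumes a covered match.
        points = sum(1 for s in matches[name]
                     if is_new(s, covered) or subsumes(s, covered))
        if points > 0:
            selected.append(name)
            covered |= set(matches[name])
    return selected


# Hypothetical scores and spans for:
# hondas in "excellent condition" in orem for under 12 grand
scores = {"Vehicle": 3.0, "ContractualServices": 1.0, "Location": 1.0}
matches = {
    "Vehicle": {(0, 6), (44, 58)},        # "hondas", "under 12 grand"
    "ContractualServices": {(44, 58)},    # "under 12 grand" only
    "Location": {(35, 39)},               # "orem"
}
print(build_ontology_set(scores, matches))
```

With these made-up inputs, ContractualServices adds nothing beyond what Vehicle already covers and is skipped, while Location contributes a new match and is added.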
16
Semantic Query Pre-Processing – Ontology Set Creation
17
[Diagram: worked example with three ontologies (Vehicle, ContractualServices, Location) and two matched constraints (Price < 12000 and US_City="orem"). Score labels shown in the figure: Vehicle_Score + 1, ContractualServices_Score + 1, Vehicle_Score.]
Semantic Query Pre-Processing – Structured Query Generation
• Open world assumption
• SPARQL query
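A minimal sketch of how such a query might be generated, assuming the open world assumption is expressed by wrapping each constraint in an `OPTIONAL` block so that a document lacking an annotation is still returned. The slides do not show the generated query, so the `ex:` namespace, the `ex:Document` class, the predicate names, and the `build_sparql` function are all hypothetical.

```python
from typing import List, Tuple

NAMESPACE = "http://example.org/onto#"  # hypothetical namespace, not from the slides


def build_sparql(constraints: List[Tuple[str, str, str]]) -> str:
    """Build a SPARQL query from (concept, operator, literal) triples.

    Each literal must already be formatted for SPARQL (numbers bare,
    strings in double quotes).
    """
    variables = " ".join(f"?{concept}" for concept, _, _ in constraints)
    lines = [
        f"PREFIX ex: <{NAMESPACE}>",
        f"SELECT ?doc {variables} WHERE {{",
        "  ?doc a ex:Document .",
    ]
    for concept, op, literal in constraints:
        # OPTIONAL keeps documents that lack this annotation; the variable
        # is simply left unbound for them.
        lines.append(
            f"  OPTIONAL {{ ?doc ex:{concept} ?{concept} . "
            f"FILTER (?{concept} {op} {literal}) }}"
        )
    lines.append("}")
    return "\n".join(lines)


query = build_sparql([
    ("Make", "=", '"Honda"'),
    ("Price", "<", "12000"),
    ("US_City", "=", '"orem"'),
])
print(query)
```

Under this design a result row reports which of the requested concepts were actually found for a document, instead of dropping the document outright when one constraint cannot be verified.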
18
Semantic Query Execution and Post-Processing
• Sesame query execution
• Semantic ranking:
  – 1 point for each requested projection satisfied
  – Normalized by # of projections requested
• Example: hondas in "excellent condition" in orem for under 12 grand
  – Projections on Make, Price and US_City
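The ranking rule above amounts to a simple ratio. The sketch below assumes each result is represented as a mapping from projection name to its bound value, with `None` for a projection that was not satisfied; that representation and the name `semantic_score` are illustrative choices, not taken from the slides.

```python
from typing import Dict, List, Optional


def semantic_score(requested: List[str],
                   bindings: Dict[str, Optional[object]]) -> float:
    """One point per requested projection satisfied, normalized by the number requested."""
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic evidence
    satisfied = sum(1 for name in requested if bindings.get(name) is not None)
    return satisfied / len(requested)


# Hypothetical result for the running example: Make and Price found, US_City missing.
requested = ["Make", "Price", "US_City"]
result = {"Make": "Honda", "Price": 9500, "US_City": None}
print(semantic_score(requested, result))
```

Here two of the three requested projections are satisfied, so the document scores 2/3.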
19
Hybrid Query Processing
• Linear interpolation:
  – (kw_weight * kw_score) + (sm_weight * sm_score)
• Dynamic solution:
  – # keywords remaining (#kw)
  – concept match score (cms) = ½ * (selections + projections)
  – kw_weight = #kw/(#kw + cms)
  – sm_weight = cms/(#kw + cms)
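The weighting formulas above translate directly into code. In this sketch the function names are hypothetical, and the equal split used when a query has neither remaining keywords nor concept matches is an assumption, since the slides do not cover that case.

```python
from typing import Tuple


def dynamic_weights(num_keywords: int, selections: int,
                    projections: int) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from the dynamic weighting formulas."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: fall back to equal weights
    return num_keywords / total, cms / total


def hybrid_score(kw_score: float, sm_score: float,
                 kw_weight: float, sm_weight: float) -> float:
    """Linear interpolation of the keyword and semantic scores."""
    return kw_weight * kw_score + sm_weight * sm_score


# Hypothetical counts: 3 keywords remain, 1 selection and 3 projections matched.
kw_w, sm_w = dynamic_weights(num_keywords=3, selections=1, projections=3)
print(kw_w, sm_w)                         # cms = 2.0, so weights are 3/5 and 2/5
print(hybrid_score(0.5, 1.0, kw_w, sm_w))
```

Because the two weights always sum to 1, a query dominated by recognized concepts leans on the semantic score, while a query that is mostly leftover keywords leans on the keyword score.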
20
Basic Search
21
Results Display
22
23
Form Based Search
Results Display
Experimental Setup – Ontology Libraries
• 5 Ontology Levels
  – Number
  – Generic Units
  – Vehicle Units
  – Vehicle
  – Vehicle+
25
Experimental Setup – Query Sets
• 113 syntactically unique queries from database students
• 60 syntactically unique queries from linguistics students
26
Experimental Setup – Document Collection
• 250 vehicle advertisements (Craigslist)
  – 100 training, 50 validation, 100 test
• 318 mountain pages (Wikipedia)
• 66 roller coaster pages (Wikipedia)
• 88 video game advertisements (Craigslist)
27
Experiments
1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents + additional noise
4) Test queries over test vehicle documents + additional noise
5) 5 queries over noisy data (Generic Units only)
28
Experiments - Metric
• Mean Average Precision
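For reference, mean average precision can be computed as below, using the standard definition in which each query's average precision is normalized by its number of relevant documents. The slides do not say which variant or cutoff the experiments used, so this is a generic sketch with hypothetical function names and toy data.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: two queries with their rankings and relevant document sets.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```

The metric rewards rankings that place relevant documents early, which suits a comparison of keyword, semantic, and hybrid rankings over the same query set.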
29
Experimental Results
30
Experimental Results
31
Experimental Results
32
Conclusions
• Hybrid search outperforms keyword and semantic search
• HyKSS’s dynamic query weighting approach outperforms various other weighting techniques
• Using multiple ontologies does not outperform selecting and using a single ontology
33
External Image Citations
• Slide 2 Google search screenshot: http://www.google.com (07/30/11)
• Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
• Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
• Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
• Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
• Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
• Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)
34