EasyQuerier: A Keyword Interface in Web Database Integration System
Xian Li 1, Weiyi Meng 2, Xiaofeng Meng 1
1 WAMDM Lab, RUC & 2 SUNY Binghamton
Traditional Integrated Interface
[Figure: the user manually selects a domain from the domain list, then manually enters query Q on the integrated interface of that domain (here, Job).]
Survey of value convergence
[Figure: number of distinct values (y-axis, 0 to 350) against number of sources (x-axis, 0 to 150). Domains: Airfare, Auto, Job seek. Series: airflight title, cabin, departure city, car brand, car class, car body style, job preference, industry, job title.]
Attributes and value numbers in the integrated interface
[Figure: attribute and value numbers (y-axis, 0 to 800) by domain: book, airfare, auto sales, job hunting, real estate. Series: number of attributes, max number of provided values. Data labels in extraction order: 35, 49, 63, 42, 316, 202, 705, 52, 577, 342.]
What does EasyQuerier look like
[Figure: the user manually enters query Q into EasyQuerier; the domain (Book, Job, House, ……) is selected automatically and the integrated interface of that domain (here, Job) is filled in automatically.]
New Features of EasyQuerier
Automatic domain mapping
  Users do not need to select a domain from a long list
More flexible keyword queries
  Different kinds of data types: text, numeric, currency, date
  More logic relations covered: "and", "or", "between…and"
  Q1: New York or Washington, education, $2000-$3000
    u_1 = {New York, Washington}, logic: or
    u_2 = {education}
    u_3 = {$2000, $3000}, logic: range
Automatic query translation
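The way Q1 breaks into units can be illustrated with a small parser. This is only a sketch under assumptions that the slides do not state: units are taken to be comma-separated, and the names `Unit` and `parse_query` and the regular expressions are hypothetical, not EasyQuerier's actual parsing rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Unit:
    values: list  # the keyword values v of one query unit u
    logic: str    # "single", "or", "and", or "range"


def parse_query(query: str) -> list:
    """Split a keyword query into units (assumed: one unit per comma-separated part)."""
    units = []
    for part in (p.strip() for p in query.split(",")):
        if not part:
            continue
        between = re.fullmatch(r"between\s+(.+?)\s+and\s+(.+)", part, flags=re.IGNORECASE)
        if between:
            # "between X and Y" is read as a range.
            units.append(Unit([between.group(1), between.group(2)], "range"))
        elif re.fullmatch(r"\$?\d[\d.]*\s*-\s*\$?\d[\d.]*", part):
            # "X-Y" with numeric or currency endpoints is also read as a range.
            low, high = (s.strip() for s in part.split("-"))
            units.append(Unit([low, high], "range"))
        elif re.search(r"\s+or\s+", part, flags=re.IGNORECASE):
            units.append(Unit(re.split(r"\s+or\s+", part, flags=re.IGNORECASE), "or"))
        elif re.search(r"\s+and\s+", part, flags=re.IGNORECASE):
            units.append(Unit(re.split(r"\s+and\s+", part, flags=re.IGNORECASE), "and"))
        else:
            units.append(Unit([part], "single"))
    return units


q1_units = parse_query("New York or Washington, education, $2000-$3000")
for unit in q1_units:
    print(unit)
```

Because the sketch splits on commas first, a value such as "$2,000" would be broken apart; a real parser would need a more careful tokenizer.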
EasyQuerier: overview
Part 1: Domain mapping
  Collect domain knowledge from the candidate domains
  Similarity-based domain mapping strategy
Part 2: Query translation
  Partial keyword-attribute mapping
  Holistic keyword-attribute mapping
[Figure: system architecture. Components: Domain Knowledge Collector (over the domains), domain knowledge base, domain mapping, query translation. Flow labels: user inputs his/her query; knowledge; selected integrated interface; query.]
Challenge 1: Domain Mapping
Problem statement
  Map a user query to the correct domain automatically, without requiring domain information to be entered separately.
Our solution
  Domain representation model
  Term weight assignment
  Query-domain similarity
Domain mapping (1)
Domain representation model: D = <d_ID; CT; AT; VT>
  d_ID: unique domain identifier.
  CT = {ct_i | i = 1, 2, …} is a set of Conceptual Terms, which describe the whole domain concept.
  AT = ∪_{A_i ∈ D} DAL(d_ID, A_i) is a set of Attribute Label Terms, consisting of the attribute labels of the products in this domain.
    InteLabel, LocalLabel, OtherLabel
  VT = ∪_{A_i ∈ D} DAV(d_ID, A_i) is a set of Value Terms associated with the products' attributes in the domain.
    Text attribute: InteValue, LocalValue, OtherValue
    Non-text attribute: VT can be characterized by the pre-defined ranges available on the integrated interfaces.
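The tuple D = <d_ID; CT; AT; VT> maps naturally onto a small data structure. The sketch below is one possible encoding, not the system's actual schema: the class and field names are hypothetical, and the sample Job terms are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    label_terms: set  # DAL(d_ID, A_i): integrated, local and other labels
    value_terms: set  # DAV(d_ID, A_i): integrated, local and other values


@dataclass
class Domain:
    d_id: str                                        # unique domain identifier
    ct: set                                          # conceptual terms
    attributes: dict = field(default_factory=dict)   # attribute name -> Attribute

    @property
    def at(self) -> set:
        """AT: union of the label terms of all attributes in the domain."""
        return set().union(*(a.label_terms for a in self.attributes.values()))

    @property
    def vt(self) -> set:
        """VT: union of the value terms of all attributes in the domain."""
        return set().union(*(a.value_terms for a in self.attributes.values()))


# Invented sample data for a Job domain.
job = Domain(
    d_id="job",
    ct={"job", "career", "employment"},
    attributes={
        "location": Attribute({"location", "city"}, {"New York", "Washington"}),
        "industry": Attribute({"industry", "category"}, {"education", "finance"}),
    },
)
print(sorted(job.at))
print(sorted(job.vt))
```

Deriving AT and VT from the per-attribute sets keeps the unions consistent when an attribute is added or removed.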
Domain mapping (2)
Different terms differ in their ability to differentiate the domains.
  "price" is less powerful than "title" in differentiating the book domain from the others.
Term weight assignment
  Adopt the idea of CVV, which is used to measure the skew of the distribution of terms across all document databases.
  if_ij: the number of times t_j appears in either AT or VT of D_i
  CVV_j: the CVV for t_j
  Weight(D_i, t_j) = CVV_j * if_ij
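A short sketch may help make Weight(D_i, t_j) = CVV_j * if_ij concrete. The slides do not give the adapted CVV formula, so `skew_stand_in` below is a hypothetical substitute (the variance of a term's share of occurrences across domains), not CVV itself, and the counts are invented.

```python
from statistics import pvariance

# if_ij: occurrences of term t_j in AT or VT of domain D_i (invented counts).
IF_COUNTS = {
    "book": {"title": 4, "price": 2},
    "auto": {"price": 2},
    "job": {"price": 2},
}


def skew_stand_in(if_counts: dict, term: str) -> float:
    """Hypothetical stand-in for CVV_j: variance of the term's share per domain."""
    counts = [terms.get(term, 0) for terms in if_counts.values()]
    total = sum(counts)
    if total == 0:
        return 0.0
    return pvariance([c / total for c in counts])


def term_weight(if_counts: dict, domain: str, term: str) -> float:
    """Weight(D_i, t_j) = (skew of t_j) * if_ij, with the stand-in in place of CVV_j."""
    return skew_stand_in(if_counts, term) * if_counts[domain].get(term, 0)


w_title = term_weight(IF_COUNTS, "book", "title")
w_price = term_weight(IF_COUNTS, "book", "price")
print(w_title, w_price)
```

With these counts "price" is spread evenly over the three domains, so its skew and weight vanish, while "title" occurs only in the book domain and receives a positive weight, matching the "price" versus "title" remark above.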
Domain mapping (3)
Q = {u_1, u_2, …, u_n}, u_i = {v_i^1, v_i^2, …}
Q1 example
  u_1 = {New York, Washington}, v_1^1 = New York, v_1^2 = Washington
For each value, among the terms t_j in VT or AT, only the best-matching term t_j is recorded.
[Equation: query-domain similarity, not preserved in this transcript.]
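Since the similarity formula itself did not survive, the following is only a guess at its general shape: each query value keeps its single best-matching term, and the match is scaled by that term's Weight(D_i, t_j). The string similarity (`difflib.SequenceMatcher`), the summation, the function names and the weights are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def term_sim(a: str, b: str) -> float:
    """Assumed string similarity in [0, 1]; the slides do not specify Sim."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def query_domain_similarity(query_units: list, domain_weights: dict) -> float:
    """Sum, over all query values, of best-match similarity times the term weight."""
    score = 0.0
    for unit in query_units:
        for value in unit:
            # Record only the best-matching term for this value.
            best = max(domain_weights, key=lambda t: term_sim(value, t), default=None)
            if best is not None:
                score += term_sim(value, best) * domain_weights[best]
    return score


def map_domain(query_units: list, weights_by_domain: dict) -> str:
    """Pick the domain with the highest query-domain similarity."""
    return max(
        weights_by_domain,
        key=lambda d: query_domain_similarity(query_units, weights_by_domain[d]),
    )


# Invented term weights, standing in for Weight(D_i, t_j).
WEIGHTS = {
    "job": {"new york": 0.5, "washington": 0.5, "education": 0.8, "salary": 0.6},
    "book": {"title": 0.9, "author": 0.7, "publisher": 0.6},
}
q1 = [["New York", "Washington"], ["education"]]
chosen = map_domain(q1, WEIGHTS)
print(chosen)
```

Numeric and currency values such as the $2000-$3000 range are left out of the example because matching them against pre-defined ranges would need a different comparison than string similarity.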
Challenge 2: Query translation
Problem statement
  Translate the query to the integrated interface
  Just like filling in the integrated interface with a set of keywords
Computation model
  Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A) matches a query unit u to an attribute A.
  Def 4.2 (Degree of Matching (DM)). Each KAM has a degree of matching.
  Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy of filling in the query interface. A QTS is comprised of several KAMs.
  Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable. The larger the DM of a KAM, the more reasonable the KAM is. Such KAMs combined together generate the optimal QTS.
Query translation (1)
Computation of DM
  For Q = {u_1, u_2, …, u_n}, u_i = {v_i^1, v_i^2, …},
  Sim(v_i^x, A_j) is the maximum of Sim(v_i^x, t) over all terms t in the VT of A_j,
  where Sim(v_i^x, t) is computed in the same way as in domain mapping.
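The max-over-terms rule for Sim(v, A_j) can be sketched as follows. Two parts are assumptions rather than the slides' definitions: the string similarity (`difflib.SequenceMatcher`) and the way a unit's values are combined into a DM (a plain mean here). The value terms are invented.

```python
from difflib import SequenceMatcher


def term_sim(a: str, b: str) -> float:
    """Assumed string similarity in [0, 1]; the slides do not specify Sim."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_attr_sim(value: str, value_terms: set) -> float:
    """Sim(v, A_j): best similarity between v and any term in the VT of A_j."""
    return max((term_sim(value, t) for t in value_terms), default=0.0)


def degree_of_matching(unit_values: list, value_terms: set) -> float:
    """Hypothetical DM of KAM(u, A_j): mean of Sim(v, A_j) over the values of u."""
    if not unit_values:
        return 0.0
    return sum(value_attr_sim(v, value_terms) for v in unit_values) / len(unit_values)


# Invented value terms for two attributes of a Job interface.
LOCATION_VT = {"New York", "Washington", "Boston"}
INDUSTRY_VT = {"education", "finance", "healthcare"}

u1 = ["New York", "Washington"]
dm_location = degree_of_matching(u1, LOCATION_VT)
dm_industry = degree_of_matching(u1, INDUSTRY_VT)
print(dm_location, dm_industry)
```

Under these assumptions the unit {New York, Washington} matches the location attribute perfectly and the industry attribute only weakly, which is the signal a KAM's DM is meant to carry.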
Query translation (2)
Conviction
  The conviction value of a QTS is a weighted sum of the DMs of its KAMs.
  Why weight? If an attribute appears in more local interfaces of a domain, it is more important in the domain.
  Each attribute A_j within the domain D is assigned a weight w(A_j) based on its interface frequency if_j.
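The weighted sum can be written out as a small search over candidate QTSs. This sketch makes several assumptions the slides do not confirm: w(A_j) is taken as if_j divided by the number of local interfaces, every unit is assigned to a distinct attribute, and candidates are enumerated by brute force. All names and numbers are illustrative.

```python
from itertools import permutations


def attribute_weights(interface_freq: dict, n_interfaces: int) -> dict:
    """Hypothetical w(A_j) = if_j / number of local interfaces in the domain."""
    return {attr: freq / n_interfaces for attr, freq in interface_freq.items()}


def conviction(qts: list, dm: dict, w: dict) -> float:
    """Weighted sum of DMs over the KAMs (unit index, attribute) of a QTS."""
    return sum(w[attr] * dm.get((unit, attr), 0.0) for unit, attr in qts)


def best_qts(n_units: int, attributes: list, dm: dict, w: dict) -> list:
    """Brute-force search: assign each unit to a distinct attribute."""
    candidates = (
        list(zip(range(n_units), attrs))
        for attrs in permutations(attributes, n_units)
    )
    return max(candidates, key=lambda qts: conviction(qts, dm, w), default=[])


# Invented numbers: interface frequencies over 50 local interfaces, and DMs.
W = attribute_weights({"location": 45, "industry": 30, "salary": 40}, 50)
DM = {
    (0, "location"): 1.0, (0, "industry"): 0.1,
    (1, "location"): 0.2, (1, "industry"): 1.0,
    (2, "salary"): 1.0,
}
solution = best_qts(3, ["location", "industry", "salary"], DM, W)
print(solution, conviction(solution, DM, W))
```

Enumerating permutations grows factorially with the number of attributes, so this only works for small interfaces; it is meant to show the scoring, not an efficient search.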
Experiment
Settings
  9 domains, each covering 50 web databases
  10 students, 20 keyword queries for each domain
Measurement
  Correct / acceptable / wrong
  Overall / with domain / with attribute label / value only
Fig 1: domain mapping accuracy
Fig 2: query translation accuracy
Conclusion
In this paper, we proposed EasyQuerier, a novel keyword-based interface system that lets ordinary users query structured data in various Web databases.
We developed solutions to two technical challenges:
  mapping a keyword query to the appropriate domain
  translating the keyword query into a query for the integrated search interface of that domain
Thank you~