EasyQuerier: A Keyword Interface in Web Database Integration System

Xian Li^1, Weiyi Meng^2, Xiaofeng Meng^1
^1 WAMDM Lab, RUC & ^2 SUNY Binghamton

Upload: pearl

Post on 28-Jan-2016

TRANSCRIPT

Page 1: EasyQuerier: A Keyword Interface in Web Database Integration System

EasyQuerier: A Keyword Interface in Web Database Integration System

Xian Li^1, Weiyi Meng^2, Xiaofeng Meng^1

^1 WAMDM Lab, RUC & ^2 SUNY Binghamton

Page 2: EasyQuerier: A Keyword Interface in Web Database Integration System

Traditional Integrated Interface

[Diagram labels: domain list (Airfare, Auto, Job seek); integrated interface of Job; query Q; both steps marked "Manually"]

[Figure: "Survey of value converge". x-axis: number of sources (0 to 150); y-axis: number of distinct values (0 to 350); legend text: airflight title cabin departure city; car brand car class car body style; job preference industry job title]

[Figure: "Attributes and Value number in integrated interface". x-axis: Domains (book, airfare, auto sales, job hunting, real estate); y-axis: attribute and value numbers (0 to 800); series: attributes number, max number of provided values; data labels: 35, 49, 63, 42, 316, 202, 705, 52, 577, 342]

Page 3: EasyQuerier: A Keyword Interface in Web Database Integration System

What does EasyQuerier look like

[Diagram labels: EasyQuerier; domains Book, Job, House, ……; integrated interface of Job; query Q (three times); one step marked "Manually", two steps marked "Automatically"]

Page 4: EasyQuerier: A Keyword Interface in Web Database Integration System

New Features of EasyQuerier

Automatic domain mapping
  Users do not need to select a domain from a long list
More flexible keyword query
  Different kinds of data types
    Text, numeric, currency, date
  More logic relations covered
    "and", "or", "between…and"
  Q1: New York or Washington, education, $2000-$3000
    U1 = {New York, Washington}, logic: or
    U2 = {education}
    U3 = {$2000, $3000}, logic: range
Automatic query translation
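The decomposition of Q1 into units can be sketched in Python. This is a minimal illustration, not the authors' parser: the names `Unit` and `parse_keyword_query` and the splitting rules (commas separate units, " or " / " and " inside a unit set its logic, a hyphen between two numeric or currency values marks a range) are all assumptions made for this example. The "between…and" form is not handled here.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    values: List[str]
    logic: str  # "single", "or", "and", or "range" (labels assumed for this sketch)


# Hypothetical pattern for a numeric or currency range such as "$2000-$3000".
_RANGE = re.compile(r"^\s*(\$?\d+(?:\.\d+)?)\s*-\s*(\$?\d+(?:\.\d+)?)\s*$")


def parse_keyword_query(query: str) -> List[Unit]:
    """Split a comma-separated keyword query into units with a logic label."""
    units: List[Unit] = []
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue
        range_match = _RANGE.match(part)
        if range_match:
            units.append(Unit([range_match.group(1), range_match.group(2)], "range"))
            continue
        for op in ("or", "and"):
            # Require whitespace around the operator so "York" is not split on "or".
            pieces = re.split(rf"\s+{op}\s+", part)
            if len(pieces) > 1:
                units.append(Unit([p.strip() for p in pieces], op))
                break
        else:
            units.append(Unit([part], "single"))
    return units


q1 = parse_keyword_query("New York or Washington, education, $2000-$3000")
for unit in q1:
    print(unit)
```

Requiring whitespace around the operator keeps multi-word values such as "New York" intact.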

Page 5: EasyQuerier: A Keyword Interface in Web Database Integration System

EasyQuerier: overview

Part 1: Domain mapping
  Collect the domain knowledge from candidate domains
  Similarity-based domain mapping strategy
Part 2: Query translation
  Partial keyword-attribute mapping
  Holistic keyword-attribute mapping

[Architecture diagram labels: User inputs his/her query; Domain mapping; Domain knowledge base; Domain Knowledge Collector; Domain (four times); Knowledge; Query Translation; Selected Integrated Interface; Query]

Page 6: EasyQuerier: A Keyword Interface in Web Database Integration System

Challenge 1: Domain Mapping

Problem statement
  Map a user query to the correct domain automatically, without requiring domain information to be entered separately.
Our solution
  Domain representation model
  Term weight assignment
  Query-domain similarity

Page 7: EasyQuerier: A Keyword Interface in Web Database Integration System

Domain mapping (1)

Domain representation model: $D = \langle d\_ID;\ CT;\ AT;\ VT \rangle$
  d_ID: unique domain identifier.
  $CT = \{ct_i \mid i = 1, 2, \ldots\}$ is a set of Conceptual Terms, which describe the whole domain concept.
  $AT = \bigcup_{A_i \in D} DAL(d\_ID, A_i)$ is a set of Attribute Label Terms consisting of the attribute labels of the products in this domain.
    InteLabel, LocalLabel, OtherLabel
  $VT = \bigcup_{A_i \in D} DAV(d\_ID, A_i)$ is a set of the Value Terms associated with the products' attributes in the domain.
    Text attribute: InteValue, LocalValue, OtherValue
    Non-text attribute: VT can be characterized by the pre-defined ranges available on the integrated interfaces.
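The four-part domain model can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `Domain`, the field names, and the sample terms for a job domain are invented here, and the InteLabel/LocalLabel/OtherLabel distinction is collapsed into one set per attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Domain:
    d_id: str                                   # d_ID: unique domain identifier
    ct: Set[str] = field(default_factory=set)   # CT: conceptual terms
    # attribute name -> label terms, playing the role of DAL(d_ID, A_i)
    dal: Dict[str, Set[str]] = field(default_factory=dict)
    # attribute name -> value terms, playing the role of DAV(d_ID, A_i)
    dav: Dict[str, Set[str]] = field(default_factory=dict)

    @property
    def at(self) -> Set[str]:
        """AT: union of the label terms of all attributes."""
        return set().union(*self.dal.values())

    @property
    def vt(self) -> Set[str]:
        """VT: union of the value terms of all attributes."""
        return set().union(*self.dav.values())


# Illustrative content only; these terms are not taken from the slides.
job = Domain(
    d_id="job",
    ct={"job", "career"},
    dal={"location": {"location", "city"}, "industry": {"industry", "category"}},
    dav={"location": {"new york", "washington"}, "industry": {"education"}},
)
print(sorted(job.at))
print(sorted(job.vt))
```

Storing terms per attribute and deriving AT and VT as unions keeps the attribute-level sets available for later keyword-attribute matching.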

Page 8: EasyQuerier: A Keyword Interface in Web Database Integration System

Domain mapping (2)

Different terms have different abilities to differentiate the domains.
  "price" is less powerful than "title" in differentiating the book domain from others.
Term weight assignment
  Adopt the idea of CVV (cue validity variance), which is used to measure the skew of the distribution of terms across all document databases.
  $if_{ij}$: the number of times $t_j$ appears in either AT or VT of $D_i$
  $CVV_j$: the CVV for $t_j$
  $Weight(D_i, t_j) = CVV_j \cdot if_{ij}$
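The weight formula can be tried on toy data. The slides do not give the CVV formula, so this Python sketch substitutes a stand-in skew measure: the population variance of a term's relative frequency across domains. That substitution, the function names, and the toy term lists are all assumptions for illustration.

```python
from statistics import pvariance
from typing import Dict, List


def skew_score(term: str, domains: Dict[str, List[str]]) -> float:
    """Stand-in for CVV_j (assumed, not the real CVV formula):
    variance of the term's relative frequency across all domains."""
    relative = [
        terms.count(term) / len(terms) if terms else 0.0
        for terms in domains.values()
    ]
    return pvariance(relative)


def weight(domain_id: str, term: str, domains: Dict[str, List[str]]) -> float:
    """Weight(D_i, t_j) = CVV_j * if_ij, with if_ij counted from the term list."""
    if_ij = domains[domain_id].count(term)
    return skew_score(term, domains) * if_ij


# Toy AT/VT term lists; "price" occurs evenly everywhere, "title" only in book.
domains = {
    "book": ["title", "author", "price", "isbn"],
    "auto": ["brand", "class", "price", "body style"],
    "house": ["city", "area", "price", "rooms"],
}
print(weight("book", "title", domains))
print(weight("book", "price", domains))
```

With this toy data, the evenly distributed term "price" gets zero weight while "title" gets a positive one, which mirrors the slide's remark about their differentiating power.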

Page 9: EasyQuerier: A Keyword Interface in Web Database Integration System

Domain mapping (3)

$Q = \{u_1, u_2, \ldots, u_n\}$, $u_i = \{v_i^1, v_i^2, \ldots\}$
  Q1 example: $u_1 = \{\text{New York}, \text{Washington}\}$, $v_1^1 = \text{New York}$, $v_1^2 = \text{Washington}$
For each query value, among the terms $t_j$ in VT or AT, we record only the best-matching term $t_j$.

[Equation image not preserved in the transcript]
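Because the similarity formula itself is missing from the transcript, the following Python sketch shows only one plausible reading of the best-match rule: for each query value, keep the single best-matching domain term and add its weight times the match score. The string similarity (`difflib.SequenceMatcher`), the summation, the function names, and the toy weights are all assumptions, not the authors' definition.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def sim(value: str, term: str) -> float:
    """Assumed string similarity in [0, 1]; the slides do not specify one."""
    return SequenceMatcher(None, value.lower(), term.lower()).ratio()


def query_domain_similarity(values: List[str], term_weights: Dict[str, float]) -> float:
    """Sum, over query values, of weight * similarity of the best-matching term."""
    total = 0.0
    for value in values:
        best = max(term_weights, key=lambda t: sim(value, t), default=None)
        if best is not None:
            total += term_weights[best] * sim(value, best)
    return total


def map_domain(values: List[str], domains: Dict[str, Dict[str, float]]) -> str:
    """Pick the domain with the highest query-domain similarity."""
    return max(domains, key=lambda d: query_domain_similarity(values, domains[d]))


# Toy term weights per domain (illustrative numbers only).
toy_domains = {
    "job": {"new york": 0.5, "education": 0.8, "salary": 0.6},
    "book": {"title": 0.9, "author": 0.7, "price": 0.1},
}
print(map_domain(["New York", "education"], toy_domains))
```

Recording only the best match per value keeps a query value from being counted several times against near-duplicate terms in the same domain.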

Page 10: EasyQuerier: A Keyword Interface in Web Database Integration System

Challenge 2: Query translation

Problem statement
  Translate the query to the integrated interface
  Just like filling the integrated interface with a set of keywords
Computation model
  Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A).
  Def 4.2 (Degree of Matching (DM)). Each KAM has a degree of matching.
  Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy of filling in the query interface. A QTS is composed of several KAMs.
  Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable. The larger the DM of a KAM, the more reasonable the KAM is. Such KAMs combined together generate the optimal QTS.

Page 11: EasyQuerier: A Keyword Interface in Web Database Integration System

Query translation (1)

Computation of DM
  For $Q = \{u_1, u_2, \ldots, u_n\}$, $u_i = \{v_i^1, v_i^2, \ldots\}$:
  $Sim(v_i^x, A_j)$ is the maximum value of all $Sim(v_i^x, t_j)$, where $t_j$ is in the VT of $A_j$.
  $Sim(v_i^x, t_j)$ is computed in the same way as in domain mapping.
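The max rule for $Sim(v_i^x, A_j)$ is stated on the slide, but how the per-value scores combine into the DM of a unit is not preserved. The Python sketch below assumes a simple average over the values of the unit and uses `difflib.SequenceMatcher` as a stand-in term similarity. Both choices, and all names and sample terms, are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Set


def sim_term(value: str, term: str) -> float:
    """Assumed stand-in for Sim(v, t_j)."""
    return SequenceMatcher(None, value.lower(), term.lower()).ratio()


def sim_attr(value: str, value_terms: Set[str]) -> float:
    """Sim(v, A_j): maximum Sim(v, t_j) over the terms t_j in the VT of A_j."""
    return max((sim_term(value, t) for t in value_terms), default=0.0)


def degree_of_matching(unit: List[str], value_terms: Set[str]) -> float:
    """Assumed DM(u, A): mean of Sim(v, A) over the values v of unit u."""
    if not unit:
        return 0.0
    return sum(sim_attr(v, value_terms) for v in unit) / len(unit)


# Illustrative value terms for two attributes of a job interface.
location_vt = {"new york", "washington", "boston"}
industry_vt = {"education", "finance"}
u1 = ["New York", "Washington"]
print(degree_of_matching(u1, location_vt))
print(degree_of_matching(u1, industry_vt))
```

Under these assumptions the unit {New York, Washington} matches the location attribute perfectly and the industry attribute only weakly.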

Page 12: EasyQuerier: A Keyword Interface in Web Database Integration System

Query translation (2)

Conviction
  The conviction value of a QTS is a weighted sum of the DMs of the related KAMs.
  Why weight?
    If an attribute appears in more local interfaces of a domain, it is more important in the domain.
    A weight $w(A_j)$ is assigned to each attribute $A_j$ within the domain $D$ based on its interface frequency $if_j$.
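The weighted sum and the search for the best QTS can be sketched in Python. The exact weight formula is not preserved in the transcript, so this example assumes $w(A_j)$ is the interface frequency divided by the number of local interfaces. It also assumes each unit is assigned to a distinct attribute and searches all assignments by brute force. Function names and numbers are illustrative.

```python
from itertools import permutations
from typing import Dict, List, Tuple

KAM = Tuple[int, str]  # (unit index, attribute name)


def attribute_weight(interface_freq: int, num_interfaces: int) -> float:
    """Assumed w(A_j): share of local interfaces that contain the attribute."""
    return interface_freq / num_interfaces


def conviction(qts: List[KAM], dm: Dict[KAM, float], w: Dict[str, float]) -> float:
    """Weighted sum of the DMs of the KAMs that make up the QTS."""
    return sum(w[attr] * dm[(unit, attr)] for unit, attr in qts)


def best_qts(num_units: int, attributes: List[str],
             dm: Dict[KAM, float], w: Dict[str, float]) -> Tuple[List[KAM], float]:
    """Try every assignment of units to distinct attributes; keep the best."""
    best, best_score = [], float("-inf")
    for attrs in permutations(attributes, num_units):
        qts = list(enumerate(attrs))
        score = conviction(qts, dm, w)
        if score > best_score:
            best, best_score = qts, score
    return best, best_score


attributes = ["location", "industry", "salary"]
freq = {"location": 45, "industry": 30, "salary": 20}  # toy interface frequencies
w = {a: attribute_weight(f, 50) for a, f in freq.items()}
dm = {
    (0, "location"): 1.0, (0, "industry"): 0.2, (0, "salary"): 0.0,
    (1, "location"): 0.1, (1, "industry"): 1.0, (1, "salary"): 0.0,
    (2, "location"): 0.0, (2, "industry"): 0.0, (2, "salary"): 1.0,
}
qts, score = best_qts(3, attributes, dm, w)
print(qts, score)
```

Brute force is acceptable only for the handful of units in a keyword query; a larger interface would call for an assignment algorithm instead.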

Page 13: EasyQuerier: A Keyword Interface in Web Database Integration System

Experiment

Settings
  9 domains, each covering 50 web databases
  10 students, 20 keyword queries for each domain
Measurement
  Correct/acceptable/wrong
  Overall/with domain/with attribute label/value only

Fig1: domain mapping accuracy

Fig2: query translation accuracy

Page 14: EasyQuerier: A Keyword Interface in Web Database Integration System

Conclusion

In this paper, we proposed EasyQuerier, a novel keyword-based interface system that lets ordinary users query structured data in various Web databases.

We developed solutions to two technical challenges:
  Mapping a keyword query to the appropriate domain
  Translating the keyword query into a query for the integrated search interface of the domain

Page 15: EasyQuerier: A Keyword Interface in Web Database Integration System

Thank you~