Efficiently Incorporating User Feedback into Information Extraction and Integration Programs

Post on 10-Feb-2016



Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton
University of Wisconsin-Madison

The Need for Incorporating User Feedback

[Figure label: Panels Chair]

Current Approach

[Figure labels: Code, Data]

This Is Not Just For DBLife

A growing number of applications use information extraction (IE) and information integration (II):
– Avatar @ IBM Almaden
– AliBaba @ Humboldt Univ. of Berlin
– YAGO @ MPI
– Kylin @ Univ. of Washington
– …

A systematic user-feedback solution could significantly benefit them.

What User Feedback To Incorporate?

Types of User Feedback
– Flagging an error, fixing an error
– Editing data, editing code
– Input, intermediate results, output

Challenges

– How to expose program data for user feedback?
– How to incorporate user feedback?
– How to efficiently execute a program?

Exposing Program Data for User Feedback

[Figure: Extracting conference services. The operators crawl, extractConf, extractNames, and findRoles turn the dataSources table (url, date; e.g., http://.../cidr09/, 09/01/2008) into the roles table (name, role, page) and the services table (name, conf, role; e.g., Joe Hellerstein, CIDR 2009, PC Chair). Views over these tables (url; name, role, page; name, conf, role) are exposed through Form, Spreadsheet, and Wiki user interfaces.]

Writing User-Feedback Rules to Expose Program Data

Write extraction program, e.g., in xlog [Shen et al, 07]

R1: pages(page) :- dataSources(url, date), crawl(url, page)
R2: conferences(conf, page) :- pages(page), extractConf(page, conf)
R3: names(name, page) :- pages(page), extractNames(page, name)
R4: roles(name, role, page) :- names(name, page), findRoles(name, page, role)
R5: services(name, conf, role) :- conferences(conf, page), roles(name, role, page)

Write user-feedback rules to specify views and user interfaces

R6: dataSourcesForUserFeedback(url)#form-UI :- dataSources(url, date), date >= "01/01/2009"
R7: rolesForUserFeedback(name, role, page#no-edit)#spreadsheet-UI :- roles(name, role, page)
R8: servicesForUserFeedback(name, conf, role)#wiki-UI :- services(name, conf, role)
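To make the rule semantics concrete, here is a minimal Python sketch of how R5 (a join on page) and R6 (a selection plus projection that defines a feedback view) could be evaluated over in-memory tables. It is not xlog and not the authors' implementation. The MM/DD/YYYY date format, the second dataSources row, and the conferences and roles rows are assumptions made for illustration; only the first dataSources row and the Joe Hellerstein example come from the slides.

```python
from datetime import datetime

def parse_date(text):
    """Parse a date string; MM/DD/YYYY is an assumed format."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# dataSources(url, date): the first row appears on the slides,
# the second row is hypothetical.
data_sources = [
    ("http://.../cidr09/", "09/01/2008"),
    ("http://example.org/conf10/", "03/15/2009"),
]

def data_sources_for_user_feedback(rows, cutoff="01/01/2009"):
    """R6 as a view: keep rows with date >= cutoff, project onto url."""
    threshold = parse_date(cutoff)
    return [url for (url, d) in rows if parse_date(d) >= threshold]

# conferences(conf, page) and roles(name, role, page): hypothetical rows.
conferences = [("CIDR 2009", "page1")]
roles = [
    ("Joe Hellerstein", "PC Chair", "page1"),
    ("A. Jones", "Panels Chair", "page2"),
]

def services(conference_rows, role_rows):
    """R5: join conferences and roles on the shared page attribute."""
    return [
        (name, conf, role)
        for (conf, conf_page) in conference_rows
        for (name, role, role_page) in role_rows
        if conf_page == role_page
    ]

feedback_view = data_sources_for_user_feedback(data_sources)
service_rows = services(conferences, roles)
```

In this sketch the view for R6 exposes only the hypothetical 2009 URL, because the slide's example row is dated before the cutoff. The UI annotations (#form-UI, #no-edit, and so on) are not modeled here.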

Program Semantics

[Figure: the conference-services workflow (dataSources, crawl, extractConf, extractNames, findRoles, roles, services) with its views (url; name, role, page; name, conf, role) and the Form, Spreadsheet, and Wiki user interfaces.]

Incorporating Previous User Feedback

[Figure: operator p maps input I to output O; applying a feedback pair (t, t') to O yields O'.]

Interpretation: for operator p, if t is in the output, change t into t'.

Example (extractNames):
– Page p1 ("… D. Smith, A. Jones, ...") yields the names A. Smith and A. Jones
– Page p2 ("Dr. A. Smith is ...") yields the name A. Smith
– Feedback: change "A. Smith" to "D. Smith"

Interpreting User Feedback Based On Tuple Provenance

Provenance of output tuple t: the set of input tuples that operator p used to produce t

Feedback: change "A. Smith" to "D. Smith"

Interpretation: if the operator produces {"A. Smith", "A. Jones"} from {p1}, then replace {"A. Smith", "A. Jones"} with {"D. Smith", "A. Jones"}

[Figure: extractNames applied to page p1 produces (A. Smith, p1) and (A. Jones, p1); applied to pages p1 and p2 it produces (A. Smith, p1), (A. Jones, p1), and (A. Smith, p2).]
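The provenance-based interpretation can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the `ProvenanceFeedback` class, its method names, and the canned `extract_names` stand-in for the blackbox operator are all hypothetical. The idea shown is that a correction is keyed by both the input (provenance) and the original output, so it is replayed only when the operator again produces the same output from the same input.

```python
def extract_names(page_id):
    """Hypothetical stand-in for the blackbox extractNames operator."""
    canned = {"p1": ["A. Smith", "A. Jones"], "p2": ["A. Smith"]}
    return list(canned[page_id])

class ProvenanceFeedback:
    """Replays a correction only for the same provenance and the same output."""

    def __init__(self):
        # Maps (provenance, original output) to the corrected output.
        self._rules = {}

    def record(self, provenance, original, corrected):
        key = (frozenset(provenance), frozenset(original))
        self._rules[key] = list(corrected)

    def apply(self, provenance, output):
        key = (frozenset(provenance), frozenset(output))
        return list(self._rules.get(key, output))

feedback = ProvenanceFeedback()
# The user changed "A. Smith" to "D. Smith" in the output produced from p1.
feedback.record({"p1"}, ["A. Smith", "A. Jones"], ["D. Smith", "A. Jones"])

def run(pages):
    """Run the operator page by page and replay any matching feedback."""
    rows = []
    for page in pages:
        names = feedback.apply({page}, extract_names(page))
        rows.extend((name, page) for name in names)
    return rows

result = run(["p1", "p2"])
```

Under these assumptions the correction applies to the tuple derived from p1, while the "A. Smith" extracted from p2 is left unchanged, since its provenance differs.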

Challenges

– How to expose program data for user feedback?
– How to incorporate user feedback?
– How to efficiently execute a program?
  – Incremental execution
  – Improved concurrency control

Incrementally Executing the Program

[Figure: extractNames has been run on pages p1 and p2; when page p3 is added, how should the name output be updated?]

Similar problem in incremental view maintenance

Incremental-update properties
– Closed-form insertion
– Closed-form deletion
– Input partitionability
– Partition correlation
– Attribute independence

extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI)
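The equation extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI) can be illustrated with a small Python sketch. It assumes a hypothetical per-page extractor that matches a fixed list of names, which is not the DBLife operator; the page texts are made up. Because this toy operator treats each page independently, the output for the old pages can be cached and only the newly inserted page needs to be processed.

```python
calls = []  # Records which pages each operator invocation processed.

def extract_names(pages):
    """Hypothetical operator: pages maps page id to text.

    Returns a set of (name, page_id) tuples found by substring matching.
    """
    calls.append(sorted(pages))
    known_names = ["A. Smith", "A. Jones", "D. Smith"]
    return {
        (name, page_id)
        for page_id, text in pages.items()
        for name in known_names
        if name in text
    }

def run_incremental(cached_output, delta_pages):
    """Closed-form insertion: reuse the cached output, process only the delta."""
    return cached_output | extract_names(delta_pages)

old_pages = {"p1": "D. Smith, A. Jones", "p2": "Dr. A. Smith is ..."}
new_page = {"p3": "A. Jones chairs the panel"}

cached = extract_names(old_pages)                 # initial full run
incremental = run_incremental(cached, new_page)   # touches p3 only
from_scratch = extract_names({**old_pages, **new_page})
```

The incremental result equals the from-scratch result here only because the toy operator satisfies the property; an operator whose output for one page depends on other pages would not.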

Concurrently Executing Transactions

[Figure: the conference-services workflow (dataSources, crawl, extractConf, extractNames, findRoles, roles, services) with two concurrent user-feedback transactions, T1 and T2.]

– Table-locking: T1 locks only the input and output tables of the crawl operator
– Operator-skipping: T2 skips executing the join operator after updating the roles table
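A rough Python sketch of the table-locking idea follows, under several assumptions: one `threading.Lock` per table, an operator-to-table mapping read off rules R1 to R5 (the name "join" for R5 is made up), and a fixed lock-acquisition order to avoid deadlock. None of this is claimed to match the authors' concurrency-control implementation; it only shows that a transaction that locks just the crawl operator's input and output tables leaves operators on other tables free to run.

```python
import threading
from contextlib import contextmanager

TABLES = ["dataSources", "pages", "names", "conferences", "roles", "services"]
locks = {name: threading.Lock() for name in TABLES}

# Input and output tables per operator (assumed mapping, based on R1 to R5).
OPERATOR_TABLES = {
    "crawl": ["dataSources", "pages"],
    "extractNames": ["pages", "names"],
    "extractConf": ["pages", "conferences"],
    "findRoles": ["names", "roles"],
    "join": ["conferences", "roles", "services"],
}

@contextmanager
def table_locks(operator):
    """Lock only the operator's own tables, always in sorted order."""
    needed = sorted(OPERATOR_TABLES[operator])
    for name in needed:
        locks[name].acquire()
    try:
        yield needed
    finally:
        for name in reversed(needed):
            locks[name].release()

def can_start(operator):
    """Non-blocking probe: True if all of the operator's tables are free."""
    acquired = []
    try:
        for name in sorted(OPERATOR_TABLES[operator]):
            if not locks[name].acquire(blocking=False):
                return False
            acquired.append(name)
        return True
    finally:
        for name in acquired:
            locks[name].release()

with table_locks("crawl"):
    find_roles_free = can_start("findRoles")        # disjoint tables
    extract_names_free = can_start("extractNames")  # shares the pages table
after_release = can_start("extractNames")
```

While the crawl transaction holds its locks, findRoles can proceed but extractNames cannot, because it shares the pages table. A coarser scheme that locks the whole workflow would block both.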

Experiment Setup

Testbed
– A 5-stage DBLife workflow
– 13 blackbox operators: 6 IE operators and 3 II operators

Wrote xlog program and user-feedback rules in < 1 hr

Simulated user-feedback transactions
– On each stage of the workflow
– Each transaction randomly deletes, inserts, or modifies 1/10 of the tuples in a table

Incremental-Update Properties are Broadly Applicable

[Table: incremental-update properties by DBLife operator. Columns: ci (closed-form insertion), cd (closed-form deletion), ip (input partitionability), ai (attribute independence), pc (partition correlation).]

DBLife operators:
– Get Data Pages
– Get People Variations
– Get Publication Variations
– Get Organization Variations
– Find People Variations
– Find Publication Variations
– Find Organization Variations
– Find People Entities
– Find Publication Entities
– Find Organization Entities
– Find Related People
– Find Authorship
– Find Related Organizations

Incremental Update Reduces Execution Time

Table-Locking and Operator-Skipping Improve Concurrency Degree

– Increase transaction throughput by 50% and 500%, respectively
– Reduce transaction response time by 43% and 98%, respectively

Transaction response time:

                     Min    Max      Average
Graph-locking        ~0s    7,584s   3,203s
Table-locking        1s     5,485s   1,841s   (-43%)
Operator-skipping    ~0s    457s     43s      (-98%)

Related Work

User feedback in IE and II
– [Doan et al, 01], [Chiticariu et al, 08], [Jeffery et al, 08]
– Leveraging user feedback to improve results of individual operations

Provenance
– [Woodruff & Stonebraker, 97], [Cui & Widom, 01], [Buneman et al, 01], [Bohannon et al, 08], [Huang et al, 08]

Incremental execution
– View maintenance [Blakeley et al, 86], [Griffin & Libkin, 95], [Gupta & Mumick, 95]
– Schema matching [Bernstein et al, 06], IE [Chen et al, 07]

Conclusions and Future Work

Incorporating user feedback into IE and II programs is important

Identify key issues and provide initial solutions:
– Write user-feedback rules to expose program data to UIs
– Model and incorporate user feedback
– Efficiently execute program to process user feedback

Future work:
– Handle unreliable user feedback
– Propagate user feedback down in the workflow
– Conduct user study
