1 Building Mashups by Example Rattapoom Tuchinda Doctoral Defense July 22, 2008


Building Mashups by Example

Rattapoom Tuchinda

Doctoral Defense July 22, 2008


What’s a Mashup?

a) LA crime map b) zillow.com c) Ski bonk

A website or application that combines content from more than one source into an integrated experience [wikipedia]

Combined data gives new insight / provides new services

-Crime Report from different counties

-Map

-Real Estate Listing

-Property Tax

-Weather

-Snow Report

-Snow Resorts

Introduction Approach Evaluation Related Work Conclusion • • • •

Statistics and Trends

Survey of top 50 Mashups

• Divide into five categories based on programming structures
• Focus of this thesis is on the first four categories, which account for 47% of the most popular Mashups

Mashup Building Issues

[Diagram: stages of building a Mashup]
• Data Retrieval: Wrapper, Wrapper
• Calibration (source modeling, cleaning): Clean, Clean; Attribute, Attribute
• Integration: Combine
• Display: Customize Display

Type 1: One Simple Source

[Diagram: stages used by this Mashup type]
• Data Retrieval: Wrapper
• Calibration (source modeling, cleaning): Clean; Attribute
• Display: Customize Display

Type 2: Union

[Diagram: stages used by this Mashup type]
• Data Retrieval: Wrapper, Wrapper
• Calibration (source modeling, cleaning): Clean, Clean; Attribute, Attribute
• Integration: Union
• Display: Customize Display

Type 3: One Source with Form

[Diagram: stages used by this Mashup type]
• Data Retrieval: Wrapper
• Calibration (source modeling, cleaning): Clean; Attribute
• Display: Customize Display

Type 4: Database Join

[Diagram: stages used by this Mashup type]
• Data Retrieval: Wrapper, Wrapper
• Calibration (source modeling, cleaning): Clean, Clean; Attribute, Attribute
• Integration: Join
• Display: Customize Display
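
The difference between the Union (Type 2) and Database Join (Type 4) integration steps can be sketched in a few lines of Python. This is only an illustration, not how any Mashup tool implements integration: the function names `union` and `join`, the row layout (lists of dicts), and the placeholder review strings are assumptions. The restaurant names and health scores are taken from the example tables later in the talk.

```python
def union(rows_a, rows_b):
    """Union: stack rows from two sources that share the same attributes."""
    return rows_a + rows_b


def join(left, right, key):
    """Join: combine rows from two sources that agree on a key attribute."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical rows; the review strings are placeholders.
reviews = [
    {"restaurant name": "Japon Bistro", "review": "(review text)"},
    {"restaurant name": "Hokusai", "review": "(review text)"},
]
health = [
    {"restaurant name": "Japon Bistro", "score": 95},
    {"restaurant name": "Katana", "score": 99},
    {"restaurant name": "Hokusai", "score": 90},
]

stacked = union(health[:1], health[1:])          # same attributes, more rows
joined = join(reviews, health, "restaurant name")  # more attributes per row
```

Union grows the table downward, while join grows it sideways by linking rows on a shared attribute. That linking step is the one the evaluation later labels as difficult.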

Type 5: Customized Display

[Diagram: stages used by this Mashup type]
• Data Retrieval: Wrapper, Wrapper
• Calibration (source modeling, cleaning): Clean, Clean; Attribute, Attribute
• Integration: Combine
• Display: Customize Display

Goal: Create Mashups without Programming

• “Without programming” does not translate to not having to understand programming.

Existing Approaches

Yahoo’s Pipes

Widget Paradigm
- Each widget (e.g., 43 for Pipes, 300+ for MS) represents an operation on the data.
- Locating widgets and learning to customize them can be time consuming.
- Most tools focus on particular issues and ignore others.

Can we come up with a framework that addresses all of the issues while still making the Mashup building process easy?

Thesis Statement

Web users can build Mashups effectively using an integrated framework that lets them solve the problems of data extraction, source modeling, data cleaning, and data integration by specifying examples instead of programming operations.

Contributions

• A programming by demonstration approach that uses a single table for building a Mashup
• An integrated approach that links data extraction, source modeling, data cleaning, and data integration together.
• A query formulation technique that allows users to specify examples to build complicated queries.

Key Ideas

• Focus on data, not operations
– Users are more familiar with data.
• Leverage existing databases
– Helps with source modeling, cleaning, and data integration.
• Consolidate as opposed to Divide-and-Conquer
– Solving a problem in one issue can help solve another issue.
– Interacting within a single spreadsheet platform

Our system: Karma

[Screenshot: the Karma interface, showing the Embedded Browser, the Table, and the Interaction Modes]

[Diagram: running example]
• Source 1: Extract, Clean: {Restaurant name, address, phone, Review}
• Source 2: Extract, Clean: {Restaurant name, address, Date of Inspection, Score}
• Combined and shown on a Map: {Restaurant name, address, phone, review, Date of Inspection, Score}
• Database: contains past Mashups and data tables

Data Retrieval: Extraction

[Diagram: DOM tree of the restaurant listing page]
TBODY
  tr
    td: "1."
    td: a ("Japon Bistro"), br, "970 E Colora..", br, "Upscale yet affordabl.."
  tr
    td: "2."
    td: a ("Hokusai"), br, "8400 Wilshir.", br, "Chic elegance….."

Tbody/tr[1]/td[2]/a

Tbody/tr*/td*/a
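
The step from `Tbody/tr[1]/td[2]/a` to `Tbody/tr*/td*/a` can be read as dropping the positional indices so that one clicked example selects every element in the same role. The sketch below illustrates that reading with Python's standard `xml.etree.ElementTree`. It is an assumption about what the generalization does, not Karma's implementation: the helper `generalize` is hypothetical, the markup is a hand-written stand-in for the page in the diagram, and the slide's `*` notation is mapped to an index-free XPath step.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the DOM tree shown in the diagram.
PAGE = """<tbody>
<tr><td>1.</td><td><a>Japon Bistro</a><br/>970 E Colora..<br/>Upscale yet affordabl..</td></tr>
<tr><td>2.</td><td><a>Hokusai</a><br/>8400 Wilshir.<br/>Chic elegance</td></tr>
</tbody>"""


def generalize(path: str) -> str:
    """Drop positional predicates: 'Tbody/tr[1]/td[2]/a' -> 'Tbody/tr/td/a'."""
    return re.sub(r"\[\d+\]", "", path)


root = ET.fromstring(PAGE)  # root is the tbody element itself

# ElementTree paths are relative to the root, so the leading 'Tbody/' is omitted.
example_path = "tr[1]/td[2]/a"
from_example = [a.text for a in root.findall(example_path)]
from_generalized = [a.text for a in root.findall(generalize(example_path))]
```

With the single example the path selects one restaurant name. After generalization it selects the name in every row, which is what fills the rest of the column.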

Data Retrieval: Navigation

[Diagram: DOM tree of the restaurant listing page (TBODY, tr, td, a, br) with the entries "1. Japon Bistro" ("970 E Colora..", "Upscale yet affordab") and "2. Hokusai" ("8400 Wilshir.", "Chic elegance…")]

Source Modeling (Attribute selection)

Newly extracted data: Sushi Sasabune, Hokusai, Japon Bistro

Possible Attribute
- restaurant name (3)
- artist name (1)

{a | ∀a,s: a ∈ att(s) ∧ (val(a,s) ⊂ V)} …

Database

Zagat
restaurant name | zagat Rating | .. | ..
Katana | 23 | .. | ..
Sushi Roku | 25 | .. | ..
Sushi Sasabune | 27 | .. | ..

Artist Info
artist name | nationality | .. | ..
.. | .. | .. | ..
Renoir | French | .. | ..
Hokusai | Japanese | .. | ..

LA Health Rating
restaurant name | Address | .. | Health Rating
Japon Bistro | 927 E.. | .. | 95
Katana | 8439.. | .. | 99
Hokusai | 8400.. | .. | 90
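
One way to read the attribute counts on this slide ("restaurant name (3)", "artist name (1)") is as the number of newly extracted values that already appear under each attribute somewhere in the database. The sketch below ranks candidate attributes that way. It is an illustrative assumption rather than Karma's actual ranking: the function `candidate_attributes` and the dictionary layout are hypothetical, and only the name columns of the three example tables are included.

```python
from collections import defaultdict

# Name columns of the example tables: source -> attribute -> values.
REPOSITORY = {
    "Zagat": {"restaurant name": ["Katana", "Sushi Roku", "Sushi Sasabune"]},
    "Artist Info": {"artist name": ["Renoir", "Hokusai"]},
    "LA Health Rating": {"restaurant name": ["Japon Bistro", "Katana", "Hokusai"]},
}


def candidate_attributes(values, repository):
    """Rank attributes by how many extracted values they already contain."""
    matched = defaultdict(set)
    for columns in repository.values():
        for attribute, column_values in columns.items():
            matched[attribute] |= set(values) & set(column_values)
    ranked = [(attribute, len(hits)) for attribute, hits in matched.items() if hits]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


extracted = ["Sushi Sasabune", "Hokusai", "Japon Bistro"]
suggestions = candidate_attributes(extracted, REPOSITORY)
```

Under this reading, "Hokusai" is the only value that also matches an artist name, so "restaurant name" ranks first.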

Data Cleaning: using existing values

Newly extracted data (Restaurant name): Japon Bistro, Hokusai, Sushi Sasabune, Sushi Roka

Data repository

LA Health Rating
restaurant name | Address | .. | Health Rating
Japon Bistro | 927 E.. | .. | 95
Katana | 8439.. | .. | 99
Hokusai | 8400.. | .. | 90

Zagat
restaurant name | zagat Rating | .. | ..
Katana | 23 | .. | ..
Sushi Roku | 25 | .. | ..
Sushi Sasabune | 27 | .. | ..
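
In this example the extracted value "Sushi Roka" differs by one letter from "Sushi Roku" in the repository. The sketch below shows one way such a discrepancy could be flagged, using values already stored under the same attribute as a reference list. The matching technique is a stand-in, not the one Karma uses: it relies on `difflib.get_close_matches` from the Python standard library, and the helper `suggest_clean_value` and the 0.8 cutoff are assumptions.

```python
import difflib

# Restaurant names already present in the data repository tables.
KNOWN_NAMES = ["Japon Bistro", "Katana", "Hokusai", "Sushi Roku", "Sushi Sasabune"]


def suggest_clean_value(value, known_values, cutoff=0.8):
    """Return the closest known value, or the input itself if nothing is close."""
    if value in known_values:
        return value
    matches = difflib.get_close_matches(value, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else value


cleaned = [suggest_clean_value(v, KNOWN_NAMES)
           for v in ["Japon Bistro", "Hokusai", "Sushi Sasabune", "Sushi Roka"]]
```

A suggestion like this would still need to be confirmed by the user, since a close string match is not proof that two names refer to the same restaurant.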

Data Cleaning: using predefined rules

Predefined Rules

Example: "31 Reviews" becomes "31"

Subset Rule:
(s1 s2 … sk) ⊆ (d1 d2 … dt), (k ≤ t)
si ∈ {d1, d2, …, dt}
di ≠ dj
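
The "31 Reviews" to "31" example suggests a rule in which the cleaned value keeps a subset of the original tokens. The sketch below checks that condition and then reuses the kept token positions on another value. It is only one interpretation of the Subset Rule: whitespace tokenization, order preservation, the function names, and the value "45 Reviews" are all assumptions.

```python
def satisfies_subset_rule(original: str, cleaned: str) -> bool:
    """True if every cleaned token occurs among the original tokens and k <= t."""
    d = original.split()
    s = cleaned.split()
    return len(s) <= len(d) and all(token in d for token in s)


def learn_kept_positions(original: str, cleaned: str) -> list:
    """Record which token positions of the original survive in the cleaned value."""
    d = original.split()
    positions, start = [], 0
    for token in cleaned.split():
        index = d.index(token, start)  # raises ValueError if the token is missing
        positions.append(index)
        start = index + 1
    return positions


def apply_positions(value: str, positions: list) -> str:
    """Apply a learned selection of token positions to a new value."""
    tokens = value.split()
    return " ".join(tokens[i] for i in positions if i < len(tokens))


kept = learn_kept_positions("31 Reviews", "31")
```

Given one cleaned example, the learned positions can then be applied to the remaining rows in the column, for instance `apply_positions("45 Reviews", kept)`.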

Data Integration [tuchinda 2007]

Data Integration (cont.)

{a}R = possible new attribute selection for row i.
{x} = Set intersection({a}) over all the value rows.
{v} = val(a,s) where a ∈ {x}
s is any source where att(s) ∩ {x} ≠ {}

Data repository

LA Health Rating
restaurant name | Address | .. | Health Rating
Japon Bistro | 927 E.. | .. | 95
Katana | 8439.. | .. | 99
Hokusai | 8400.. | .. | 90

Zagat
restaurant name | zagat Rating | .. | ..
Katana | 23 | .. | ..
Sushi Roku | 25 | .. | ..
Sushi Sasabune | 27 | .. | ..

Single Column Example

{x} = Set intersection({a}) over all the value rows.
{v} = val(a,s) where a ∈ {x}; s is any source where att(s) ∩ {x} ≠ {}
(v ∈ all values, a ∈ all attributes)

Honolulu : {(city, tax_properties), (city, favorite_vacation_spot)}
Los Angeles : {(city, tax_properties), (song name, pop_music)}

Over all values v: intersection of attributeOf(v) = {city} ∩ {city, song name} = {city}

Can we determine the attribute now? Yes: City
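
The single-column example can be traced in code: collect the attributes under which each entered value appears, intersect them, and use the surviving attribute to propose further values. This is a sketch of the set formulas on the slide, not the system's implementation. The function names and the dictionary layout are assumptions, and the extra city "Pasadena" is a hypothetical value added so that there is something to suggest.

```python
# source -> attribute -> values; "Pasadena" is a hypothetical extra row.
REPOSITORY = {
    "tax_properties": {"city": ["Honolulu", "Los Angeles", "Pasadena"]},
    "favorite_vacation_spot": {"city": ["Honolulu"]},
    "pop_music": {"song name": ["Los Angeles"]},
}


def attributes_of(value, repository):
    """Attributes under which a value appears in any source."""
    return {attribute
            for columns in repository.values()
            for attribute, column in columns.items()
            if value in column}


def shared_attributes(values, repository):
    """{x}: intersection of the candidate attributes over all value rows."""
    attribute_sets = [attributes_of(v, repository) for v in values]
    return set.intersection(*attribute_sets) if attribute_sets else set()


def suggest_values(values, repository):
    """{v}: values of the shared attributes, minus what was already entered."""
    shared = shared_attributes(values, repository)
    candidates = {v
                  for columns in repository.values()
                  for attribute, column in columns.items()
                  if attribute in shared
                  for v in column}
    return sorted(candidates - set(values))


entered = ["Honolulu", "Los Angeles"]
column_attribute = shared_attributes(entered, REPOSITORY)
next_values = suggest_values(entered, REPOSITORY)
```

"Los Angeles" alone is ambiguous between a city and a song name, but intersecting with the attributes of "Honolulu" leaves only "city".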

Map Generation

Evaluation

• Baseline: A combination of Dapper/Pipes
• Claims:
1. Users with no programming experience can build all four Mashup types.
2. Karma takes less time to complete each subtask.
3. Overall, the user takes less time to build the same Mashup in Karma compared to Dapper/Pipes.
• Users:
– Programmers (20)
– Non-programmers (3)

If subjects (programmers) who are familiar with workflow and widgets spend more time on Dapper/Pipes in general, then the non-programmer subjects would spend more time on Dapper/Pipes as well if they were to learn how to use those systems.

Evaluation: Setup

Familiarization
- Programmers (2 assignments on Dapper/Pipes (DP))
- Review Package
- 30-minute tutorial

Practice
- 2-3 tasks using Karma

Test (3 tasks)
- Programmers: alternating between Karma and DP for each task
- Non-programmers: use only Karma
- Screens are recorded using video capture software

Using a 5-minute cut-off time

Evaluation: Tasks

Task No. | Mashup Type | Data Extraction | Source Modeling | Data Cleaning | Data Integration
1 | 1 (1 source) | Moderate | Simple | Difficult | N/A
2 | 2,3 (union+form) | Difficult | Simple | Simple | Union (simple)
3 | 4 (join 2 sources) | Simple | Simple | N/A | Join (difficult)

Claim 1: Users with no programming experience can build all four Mashup types.

Claim 2: When the Mashup subtask is difficult, Karma takes less time to complete that subtask.

Claim 3: Overall, the user takes less time to build the same Mashup in Karma compared to Dapper/Pipes.

Claim 1: Users with no programming experience can build all four Mashup types


Evaluation: Non-Programmers


Claim 2: Karma takes less time to complete each subtask

Evaluation: Extraction

[Chart legend: Dapper/Pipes, Karma, Karma (programmer)]

• As the extraction task gets more difficult, Dapper/Pipes
- takes longer
- has more subjects failing to complete the task (11% for moderate and 25% for difficult)

Evaluation: Source Modeling

[Chart: Dapper/Pipes vs. Karma]

• Karma performed worse in task 1 and task 2
- only a 30 sec difference
- subjects take time selecting attributes
- the savings will be realized in the data integration step.
• Karma performed better in task 3 because of union

Evaluation: Data Cleaning

[Chart: Dapper/Pipes vs. Karma]

• Karma performed better in both tasks
• When the cleaning task gets harder, more subjects fail in Dapper/Pipes (35% for simple and 83% for hard)

Evaluation: Data Integration

[Chart: Dapper/Pipes vs. Karma]

• Because of the table structure, subjects can specify union indirectly by dropping data into the right cell
• The time spent in the source modeling step allows Karma to suggest the linking source
• Dapper/Pipes: 30% fail in the union case and 95% fail in the join case

Claim 3: Overall, the user takes less time to build the same Mashup in Karma compared to Dapper/Pipes


Evaluation: Overall

[Chart: Dapper/Pipes vs. Karma]


Evaluation: Average

[Chart: Dapper/Pipes vs. Karma; labels: 2.18x, 1.13x, 4.16x, 6.49x, 3.54x]

Related Work: Mashup Systems

[Figure: comparison of Mashup systems, annotated as follows]
• Require an expert
• Mainly focus on extraction / linear
• Q/A approach / linear / scalability
• Widgets
– Fancier UI / more widgets
– Fewer Widgets / Confusion on workflow
• Early work. Focus on DOM, too basic
• Create points on Map
• RDF / Manually specify data int
• Tuple = card. Drawing links for relations

Related Work: Data Extraction

• Automatic extraction: tables and lists only
– RoadRunner (exploit HTML structure) [Crescenzi et al., 2001]
– Adel (grammar induction to detect rows) [Lerman+ 2001]
– VisualWeb (OCR technique to detect tables) [Gatterbauer+ 2007]
• Semi-Automatic: require more labeled examples
– WIEN (inductive – less expressive than Stalker) [Kushmerick 1997]
– Stalker (Cotesting) [Muslea+ 1999]
– SoftMealy (finite state transducer) [Hsu 1998]
– WHISK (rigid format, exact delimiter) [Soderland 1998]
• DOM: rely on well-formed HTML and less labeling
– Simile [Huynh+ 2005]
– Dapper
– Interactive Wrapper Generation (ML + prediction on DOM) [Irmak+ 2006]
– PLOW (add natural language) [Allen+ 2007]
– Cards [Dontcheva+ 2007]
– Karma [Tuchinda+ 2008]

Page 68: Building Mashups by ExampleSurvey of top 50 Mashups • Divide into five categories based on programming ... video capture software Using 5 minutes cut off time . 44 Evaluation: Tasks

68

• 1:1 mapping, N:M mapping

– Schema-level match

• TranScm [Milo+ 98]

• DIKE [Palopoli+ 99]

• Artemis [Castano+ 01]

• Delta [Clifton+ 97]

– +Instance-based matcher

• SemInt [Li 00]

• LSD [Doan 01]

• ILA [Etzioni 95]

• iMAP [Dhamankar+ 04]

• Clio (interactive) [Ling 01]

• Inducing Source Description [Carman 07]

• Karma leverages existing techniques to narrow candidate matches

– String Similarities [Cohen+ 2003]

Related Work: Source Modeling
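
The last bullet above says string similarity is used to narrow candidate matches. A minimal sketch of that idea, assuming a hypothetical repository of known attributes with sample values; `difflib.SequenceMatcher` is swapped in here as a convenient standard-library similarity measure and is not claimed to be one of the metrics Karma uses:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def narrow_candidates(value: str, known_attributes: dict[str, list[str]], top_k: int = 2) -> list[str]:
    """Rank attribute names by how closely any of their sample values matches `value`.

    Assumes every attribute has at least one sample value.
    """
    scored = []
    for attr, samples in known_attributes.items():
        best = max(similarity(value, sample) for sample in samples)
        scored.append((best, attr))
    scored.sort(reverse=True)
    return [attr for _, attr in scored[:top_k]]


# Hypothetical repository of previously modeled attributes and sample values.
known = {
    "city": ["Los Angeles", "San Diego"],
    "cuisine": ["Japanese", "Italian"],
    "zip": ["90001", "92101"],
}

print(narrow_candidates("los angeles", known, top_k=1))
```

The point of the sketch is the shape of the step: a new value is compared against values already seen, and only the best-scoring attributes are offered as candidates instead of the whole schema.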

• Commercial tools: focus on writing transformations

– ACR/Data, Migration Architect [Chaudhuri+ 1997]

• Discrepancy detection: used as a stepping stone for record linkage and cleaning systems

– Levenshtein distance [Needleman+ 70]

– Vector based [Baeza-Yates+ 99]

– EM [Ristad+ 98]

– SVM [Bilenko+ 03]

• Record linkage & cleaning systems: Focus on ranking [Winkler 06]

– Fuzzy Match [Chaudhuri+ 03]

– Apollo [Michalowski+ 05]

– Phoebus [Michelson+ 07]

– Potter's Wheel [Raman+ 01]

• Karma

– Gains reference sources through the source modeling process

– Provides predefined transformations

Related Work: Data Cleaning
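
The discrepancy detection bullet above lists Levenshtein distance as a building block. A minimal sketch of the standard dynamic-programming edit distance follows; the function name and the sample strings are illustrative assumptions, not taken from the cited systems:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # Keep the shorter string as `b` so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        current = [i]  # deleting the first i characters of `a`
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

A cleaning step can treat two values as a likely discrepancy of the same entity when their distance is small relative to their length; the threshold is a design choice that depends on the data.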

• Universal Relation: makes queries easier to formulate, but users must still formulate them [Ullman 1980, 1988]

• Query by example: users need to know which data sources to use, and the query may return no results

– QBE [Zloof 1975]

• Retrieval by formulation: users need to understand the domain model to formulate a partial description

– Helgon [Fischer 1989]

– RABBIT [Williams 1982]

• Graphical Query Language: Users still need to navigate through sources (graphs)

– Gql [Benzi 1998, Haw 1994, Papantonakis 1988]

• Question-answering technique: requires an understanding of database operations

– Agent Wizard [Tuchinda+ 2004]

• Interactive schema/data integration: requires an understanding of the source schema

– Clio [Ling 01]

• Karma is based on Programming by Demonstration [Cypher 2001; Lau 2001]

Related Work: Data Integration

Conclusion

• Mashups are a fast-growing area

– Casual web users need an efficient way to build them.

• Contributions

– A PBD approach that uses a single table for building a Mashup

– An integrated approach that links four Mashup building issues.

– A query formulation technique that allows users to specify examples to build complicated queries.

• Evaluated the validity of the Karma approach

– Subjects were able to complete Mashup building tasks in Karma

– The overall improvement is at least 3.5

Future Work

• Customizing display by examples

• Implementing feedback and quality

• Adding planning components to handle dynamic data.

Thank You!

Backup Slides

Document Object Model (DOM)

Source: http://www.w3.org

Vertical Expansion
