On Propagation of Deletions and Annotations through Views

Wang-Chiew Tan, University of Pennsylvania Database Group. Joint work with Peter Buneman and Sanjeev Khanna.



Page 1: On Propagation of Deletions and Annotations through Views

On Propagation of Deletions and Annotations through Views

Wang-Chiew Tan
University of Pennsylvania Database Group

Joint work with Peter Buneman and Sanjeev Khanna

Page 2: On Propagation of Deletions and Annotations through Views

Wang-Chiew Tan, Penn Database Group 2

Data Annotations (share annotations)

• Knowledge sharing through “annotations”
• Annotations on data at various levels of granularity, annotations on annotations
• Improve accuracy of data
  – data and annotations can be reviewed by independent parties
• Annotations:
  – loosely structured
• Source Data:
  – proprietary
  – fixed schema
• A system that overlays annotations on existing data
• “big business” in scientific databases

Page 3: On Propagation of Deletions and Annotations through Views

Data Annotations (share annotations)

NYRestaurants (Source Table)

Restaurant          Cost  Type      Zip
Peacock Alley       $$$   French    10022
Bull & Bear         $$$   Seafood   10022
Pacifica            $     Chinese   10013
Soho Kitchen & Bar  $     American  10022

All Restaurants (View 1)

Restaurant          Cost  Type
Peacock Alley       $$$   French
Bull & Bear         $$$   Seafood
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American

Cheap Restaurants (View 2)

Restaurant          Cost  Type
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American

Annotations shown on the slide:
• “Serves fine French Cuisine in elegant setting. Jackets required.”
• “Extensive wine list!”
• “Yummy chicken curry!!”

Page 4: On Propagation of Deletions and Annotations through Views

Data Annotations

• Communicate “meta data” through annotations
  – “bounce” or “spread” annotations around by piggybacking annotations on data items in the source-query-view model.
• An annotation is placed in the view
  – where do we place the annotation on source?
• Annotation placement problem presented in relational setting
  – results carry over to fragments of XML (hierarchical model)

Model:
  Source: Relational Database
  Query
  View: result of query applied on source

Not an easy problem!

Page 5: On Propagation of Deletions and Annotations through Views

Location and Propagation Rules

• A location is a triple: (R, t, A)
  – R is a relation name
  – t is a tuple in R
  – A is an attribute in the schema of R

• Propagation Rules:
  – Select:
  – Project:
  – Join:
  – Union:

[Figure: one diagram per rule, mapping locations under attributes A1, A2, A3 of the input relations (R, or R1 and R2) to locations in the output.]
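The rule diagrams are not recoverable from the transcript, so the following is only a minimal Python sketch of one natural reading: an annotation on a location (R, t, A) follows the value it is attached to into the output. The exact rules, the dict-based tuple representation, and every function name here are assumptions made for illustration, not the authors' definitions.

```python
# Hypothetical sketch: a tuple is a dict {attribute: value}; a location is
# (relation_name, hashable_tuple, attribute). Each function returns a map
# from source locations to the set of output locations they propagate to.

def key(t):
    """Hashable form of a tuple given as a dict."""
    return tuple(sorted(t.items()))

def select(name, rel, pred, out):
    """Select: (name, t, A) -> (out, t, A) whenever t satisfies pred."""
    return {(name, key(t), a): {(out, key(t), a)}
            for t in rel if pred(t) for a in t}

def project(name, rel, attrs, out):
    """Project: (name, t, A) -> (out, t[attrs], A) if A is kept."""
    prop = {}
    for t in rel:
        u = {a: t[a] for a in attrs}
        for a in attrs:
            prop.setdefault((name, key(t), a), set()).add((out, key(u), a))
    return prop

def join(n1, r1, n2, r2, out):
    """Natural join: a location propagates to every joined tuple built from
    its source tuple; shared attributes receive from both sides."""
    prop = {}
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                u = {**t1, **t2}
                for n, t in ((n1, t1), (n2, t2)):
                    for a in t:
                        prop.setdefault((n, key(t), a), set()).add((out, key(u), a))
    return prop

def union(n1, r1, n2, r2, out):
    """Union: (Ri, t, A) -> (out, t, A) for each input relation Ri."""
    prop = {}
    for n, r in ((n1, r1), (n2, r2)):
        for t in r:
            for a in t:
                prop.setdefault((n, key(t), a), set()).add((out, key(t), a))
    return prop

# Usage: two source tuples that project onto the same view tuple.
R = [{"A1": 1, "A2": "x", "A3": "p"}, {"A1": 2, "A2": "x", "A3": "q"}]
proj = project("R", R, ["A2"], "V")
```

In this example both tuples of R collapse to the single view tuple with A2 = "x", so two distinct source locations propagate to the same view location, while locations under A1 and A3 propagate nowhere.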

Page 6: On Propagation of Deletions and Annotations through Views

Annotation Placement Problem

• Annotation Placement Problem:
  – Given a view V = Q(S) and an annotation A placed in the view V, decide if there is an annotation in the source that, when propagated to the view, produces no other annotation except A.
    • Q = query
    • S = data source
  – “side-effect-free annotation”: an annotation on the source that produces no other annotation except A in the view

[Figure: S, Q, V = Q(S)]
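As an illustration of the definition, here is a small Python sketch that enumerates source locations for one fixed PJ query and keeps those whose propagated image is exactly the target view location. The query PROJ A,C(R1 JOIN R2), the assumption that an annotation follows its value through the join and the projection, the sample data, and all names are hypothetical choices for this example.

```python
from collections import defaultdict

def pj_propagation(r1, r2):
    """Map each source location to the view locations it reaches under
    V = PROJ A,C(R1 JOIN R2), with R1(A, B) and R2(B, C).
    Locations under B are projected out and reach nothing."""
    prop = defaultdict(set)
    for (a, b) in r1:
        for (b2, c) in r2:
            if b == b2:
                prop[("R1", (a, b), "A")].add(((a, c), "A"))
                prop[("R2", (b2, c), "C")].add(((a, c), "C"))
    return prop

def side_effect_free(prop, target):
    """Source locations whose annotation shows up at `target` and nowhere else."""
    return sorted(s for s, images in prop.items() if images == {target})

# Invented sample data.
R1 = {("a1", "b1"), ("a1", "b2")}
R2 = {("b1", "c1"), ("b2", "c1"), ("b2", "c2")}
prop = pj_propagation(R1, R2)

# (a1, c2) is produced only through (a1, b2), which also feeds (a1, c1),
# so annotating its A value has no side-effect-free placement.
no_placement = side_effect_free(prop, (("a1", "c2"), "A"))
one_placement = side_effect_free(prop, (("a1", "c1"), "A"))
```

For a fixed small query like this one the enumeration is cheap; the cost lies in evaluating the query, which grows with the number of relations being joined.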

Page 7: On Propagation of Deletions and Annotations through Views

A Dichotomy Theorem

Theorem:

(a) It is NP-hard to decide if there is a side-effect-free annotation for a PJ query.

(b) There is a polynomial time algorithm for queries which do not simultaneously contain a Project and a Join operation.

[Figure: S, Q, V = Q(S)]

Page 8: On Propagation of Deletions and Annotations through Views

Project and Join Query

• Intuition: PJ can encode 3SAT

(x1 + x2 + x3) . . . (x3 + x5 + x2)

One relation per clause (T - true, F - false).

Relation for C1. Assignment tuples: all possible satisfying assignments for C1.

x1  x2  x3  C1
F   F   F   C1
T   F   F   C1
F   T   F   C1
T   T   F   C1
F   F   T   C1
F   T   T   C1
T   T   T   C1
d   d   d   C1   (dummy tuple)

. . .

Relation for Cm. Assignment tuples: all possible satisfying assignments for Cm.

x3  x5  x2  Cm
T   F   F   Cm
F   T   F   Cm
T   T   F   Cm
F   F   T   Cm
T   F   T   Cm
F   T   T   Cm
T   T   T   Cm
d   d   d   Cm   (dummy tuple)

Query: Join, then Project on C1 … Cm

Query Output:

C1  ...  Cm
C1  ...  Cm

Page 9: On Propagation of Deletions and Annotations through Views

Project and Join Query

• Intuition: PJ can encode 3SAT

(x1 + x2 + x3) … (x3 + x5 + x2)

One relation per clause (T - true, F - false).

Relation for C1. Assignment tuples: all possible satisfying assignments for C1.

x1  x2  x3  C1
F   F   F   C1
T   F   F   C1
F   T   F   C1
T   T   F   C1
F   F   T   C1
F   T   T   C1
T   T   T   C1
d   d   d   C1   (dummy tuple)

...

Relation for Cm. Assignment tuples: all possible satisfying assignments for Cm.

x3  x5  x2  Cm
T   F   F   Cm
F   T   F   Cm
T   T   F   Cm
F   F   T   Cm
T   F   T   Cm
F   T   T   Cm
T   T   T   Cm
d   d   d   Cm    (dummy tuple)
d   d   d   C’m   (dummy tuple)

Query: Join, then Project on C1 … Cm

Output:

C1  ...  Cm
C1  ...  Cm
C1  ...  C’m

Page 10: On Propagation of Deletions and Annotations through Views

Related Work on Annotations

• Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
  – data “placed over” existing information, e.g. bookmark files, schema of a database
• Annotation Systems
  – Annotea (W3C)
    • annotate web pages
    • location is defined with XPointer
  – Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
    • annotate on PDF files, HTML, etc.
    • robust locations
  – BioDAS (Distributed Annotation Server) (L. Stein et al.)
    • annotate on genome sequences
    • notion of location is genome specific
• No one has formally studied the annotation placement problem

Page 11: On Propagation of Deletions and Annotations through Views

The classical view deletion problem

• A view tuple is to be deleted
  – What changes should be made to the source?
• Many kinds of view-to-source deletion translations
  – e.g. deletion-to-insertion, deletion-to-modification, etc.
    • Update Semantics of Relational Views (F. Bancilhon, N. Spyratos, [TODS’81])
    • On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein, [TODS’82])
    • Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller, [PODS’85])
  – deletion-to-deletion
    • Run-Time Translation of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom, [2001])
      – exploits lineage information to find “side-effect free” deletions whenever possible

Page 12: On Propagation of Deletions and Annotations through Views

View Deletion Problem (Deletion-to-deletion translation)

• View Deletion Problem (minimize view side-effect):
  – Given a view V=Q(S) and a tuple t in V, decide if there is a side-effect free deletion for t
  – “side-effect-free deletion”: a set of source tuples whose removal from the database will only remove t from the view

Model:
  Source: Relational Database
  Query
  View: result of query applied on source
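To make the definition concrete, the sketch below searches exhaustively for a side-effect-free deletion under the fixed query PROJ A,C(R1 JOIN R2). It is a brute-force illustration only (exponential in the number of source tuples), not an algorithm from the talk; the function names and sample relations are invented for this example.

```python
from itertools import combinations

def view(r1, r2):
    """V = PROJ A,C(R1 JOIN R2) with R1(A, B) and R2(B, C)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def side_effect_free_deletion(r1, r2, t):
    """Return a smallest set of source tuples whose removal deletes exactly t
    from the view, or None if no such set exists."""
    if t not in view(r1, r2):
        raise ValueError("t is not in the view")
    source = [("R1", x) for x in sorted(r1)] + [("R2", x) for x in sorted(r2)]
    want = view(r1, r2) - {t}
    for k in range(1, len(source) + 1):
        for chosen in combinations(source, k):
            s1 = r1 - {x for name, x in chosen if name == "R1"}
            s2 = r2 - {x for name, x in chosen if name == "R2"}
            if view(s1, s2) == want:
                return set(chosen)
    return None

# Invented sample data: every A value joins with every C value through b,
# so removing any single view tuple always takes another one with it.
R1 = {("a1", "b"), ("a2", "b")}
R2 = {("b", "c1"), ("b", "c2")}
result = side_effect_free_deletion(R1, R2, ("a1", "c1"))
```

Here `result` is `None`: deleting ("a1", "b") also removes (a1, c2) from the view, and deleting ("b", "c1") also removes (a2, c1).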

Page 13: On Propagation of Deletions and Annotations through Views

A Dichotomy Theorem

Theorem:

(a) It is NP-hard to decide if there is a side-effect free deletion for a PJ or JU query in normal form.

(b) There is a polynomial time algorithm to find the set of source deletions with minimum side-effects for all other queries, i.e., queries that involve only S,P,U or S,J operators.

• Part (a) of the theorem is true even for a constant size PJ query involving only two relations:

PROJ A,C(R1 JOIN R2)

Page 14: On Propagation of Deletions and Annotations through Views

View Deletion: PJ Query

Theorem: It is NP-hard to decide if there is a side-effect free deletion for a PJ query in normal form.

(x1+x2+x3)(x2+x4+x5)(x4+x1+x3)

R1

A   B
a   x1
a   x2
a   x3
a   x4
a   x5
c2  x2
c2  x4
c2  x5

R2

B   C
x1  c
x2  c
x3  c
x4  c
x5  c
x1  c1
x2  c1
x3  c1
x4  c3
x1  c3
x3  c3

PROJ A,C(R1 JOIN R2)

A   C
a   c
a   c1
a   c3
c2  c
c2  c1
c2  c3

For each xi, decide whether to delete (a,xi) or (xi,c).
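The per-variable choice can be tried out directly. The Python sketch below assumes one reading of the slide's instance, with the (a, xi) and (c2, xi) tuples in R1 and the (xi, c), (xi, c1) and (xi, c3) tuples in R2; that reading, and all names in the code, are assumptions. It enumerates, for each xi, whether to delete (a, xi) or (xi, c), and keeps the choices that remove only (a, c) from the view.

```python
from itertools import product

VARS = ["x1", "x2", "x3", "x4", "x5"]

# Assumed reading of the slide's instance (a reconstruction, not verified).
R1 = {("a", x) for x in VARS} | {("c2", x) for x in ("x2", "x4", "x5")}
R2 = ({(x, "c") for x in VARS}
      | {(x, "c1") for x in ("x1", "x2", "x3")}
      | {(x, "c3") for x in ("x4", "x1", "x3")})

def view(r1, r2):
    """V = PROJ A,C(R1 JOIN R2) with R1(A, B) and R2(B, C)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def deletions_for(choice):
    """choice[i] is 'R1' to delete (a, xi) or 'R2' to delete (xi, c)."""
    d1 = {("a", x) for x, side in zip(VARS, choice) if side == "R1"}
    d2 = {(x, "c") for x, side in zip(VARS, choice) if side == "R2"}
    return d1, d2

def side_effect_free_choices():
    """All per-variable choices that remove (a, c) and nothing else."""
    want = view(R1, R2) - {("a", "c")}
    good = []
    for choice in product(("R1", "R2"), repeat=len(VARS)):
        d1, d2 = deletions_for(choice)
        if view(R1 - d1, R2 - d2) == want:
            good.append(choice)
    return good

good = side_effect_free_choices()
```

Under this reading, deleting every (a, xi) also removes (a, c1) and (a, c3), and deleting every (xi, c) also removes (c2, c), so only mixed choices can be side-effect free; picking one per variable plays the role of a truth assignment.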

Page 15: On Propagation of Deletions and Annotations through Views

Ongoing and Future Work

• Implementation of annotation system
  – on RDBMS
    • special cases of PJ queries with polynomial time algorithm
      – PJ queries that do not project out key information
  – on XML
  – effects on query languages?

Page 16: On Propagation of Deletions and Annotations through Views

Do we need an “annotation-conscious” QL?

• The same query in different languages, but different annotation behavior

Emp(Name, Sal, Dept)
  [Name:"Joe", Sal:50K, Dept:"Marketing"]

Department(Dept, Manager)
  [Dept:"Marketing", Manager:"Jane"]

Relational Algebra:
  Emp JOIN Department
  Result: [Name:"Joe", Sal:50K, Dept:"Marketing", Manager:"Jane"]

SQL:
  SELECT e.Name, e.Sal, e.Dept, d.Manager
  FROM Emp e, Department d
  WHERE e.Dept = d.Dept
  Result: [Name:"Joe", Sal:50K, Dept:"Marketing", Manager:"Jane"]

• Equivalent queries in the same language, but different annotation behavior

Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = "50K"
  Result: [Name:"Joe", Sal:50K]

Q2 = SELECT e.Name, "50K" AS Sal FROM Emp e WHERE e.Sal = "50K"
  Result: [Name:"Joe", Sal:50K]
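A minimal Python sketch of the two contrasts follows. It assumes that a relational-algebra natural join merges the annotations of the shared Dept attribute from both inputs, while SQL propagates along variable bindings (e.Dept comes from Emp only, and a constant carries nothing). These propagation choices, the `Cell` type, and the annotation strings are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """A value plus the set of annotations attached to it."""
    value: str
    notes: frozenset = frozenset()

# Hypothetical annotations, invented for this example.
EMP = [{"Name": Cell("Joe"),
        "Sal": Cell("50K", frozenset({"salary verified"})),
        "Dept": Cell("Marketing", frozenset({"note on Emp.Dept"}))}]
DEPARTMENT = [{"Dept": Cell("Marketing", frozenset({"note on Department.Dept"})),
               "Manager": Cell("Jane")}]

def ra_join(emp, dept):
    """Emp JOIN Department: the shared Dept cell is assumed to collect
    annotations from both inputs."""
    out = []
    for e in emp:
        for d in dept:
            if e["Dept"].value == d["Dept"].value:
                merged = Cell(e["Dept"].value, e["Dept"].notes | d["Dept"].notes)
                out.append({"Name": e["Name"], "Sal": e["Sal"],
                            "Dept": merged, "Manager": d["Manager"]})
    return out

def sql_join(emp, dept):
    """SELECT e.Name, e.Sal, e.Dept, d.Manager ...: Dept is bound to e only."""
    return [{"Name": e["Name"], "Sal": e["Sal"],
             "Dept": e["Dept"], "Manager": d["Manager"]}
            for e in emp for d in dept if e["Dept"].value == d["Dept"].value]

def q1(emp):
    """SELECT e.Name, e.Sal ...: Sal is copied from e, annotations included."""
    return [{"Name": e["Name"], "Sal": e["Sal"]}
            for e in emp if e["Sal"].value == "50K"]

def q2(emp):
    """SELECT e.Name, "50K" AS Sal ...: Sal is a fresh constant, no annotations."""
    return [{"Name": e["Name"], "Sal": Cell("50K")}
            for e in emp if e["Sal"].value == "50K"]

def values(rows):
    """Strip annotations so results can be compared as plain data."""
    return [{k: c.value for k, c in r.items()} for r in rows]
```

Under these assumptions each pair of queries returns the same plain values, yet the Dept cell of the join and the Sal cell of Q1 versus Q2 carry different annotation sets.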

Page 17: On Propagation of Deletions and Annotations through Views

Do we need an “annotation-conscious” QL?

• Relational algebra seems to suggest a natural set of propagation rules
• SQL seems to suggest another natural propagation rule
  – one that is based on variable bindings
• Not clear how we extend the semantics of query languages so that annotation propagation is “well-behaved”.
• Should a query language be “annotation-conscious”?
  OR
• Should the user be allowed to control which annotation gets propagated to where?

Page 18: On Propagation of Deletions and Annotations through Views


End of Talk