A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies
2012 ACM SIGMOD/PODS Conference, Scottsdale, Arizona, USA
PODS 2012
Benny Kimelfeld
IBM Research – Almaden
Deletion Propagation
• Translate a tuple deletion on the view back to the source relations … properly
• Classic database problem
  – Specializing the more general view-update problem
  – [Dayal & Bernstein 1982; Cosmadakis & Papadimitriou 1984; Keller 1986; Cui & Widom 2001; Buneman & Khanna & Tan 2002; Cong & Fan & Geerts 2006; …]
• Renewed motivation: debug/causality for false positives [K, Vondrák, Williams, 2011]
• Various definitions of “properly” were studied
  – Minimize the view side effect (this work!)
    • # view tuples lost except the intentional one
  – Minimize the source side effect
    • # source tuples to delete
    • = maximal “responsibility” for an answer [Meliou et al., 2010]
Example: File Access
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Access = UserGroup ⋈ GroupFile
Delete source rows, s.t. Emma won’t access a.txt. But, maintain maximum access permissions!
[Cui & Widom 2001; Buneman et al. 2002]
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
GroupFile
group file
ai a.txt
ai b.txt
db a.txt
db b.txt
os a.txt
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
side-effect free (& minimal side effect)
[Cui & Widom 2001; Buneman et al. 2002]
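The claim that a side-effect-free solution exists for this instance can be checked mechanically. The sketch below is only an illustration: the two deleted rows are a candidate chosen for this example (the rows highlighted on the original slide are not recoverable from the text), and the function name access is hypothetical.

```python
# Source relations, copied from the example tables.
USER_GROUP = {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
              ("Olivia", "db"), ("Jacob", "ai")}
GROUP_FILE = {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
              ("db", "b.txt"), ("os", "a.txt")}

def access(user_group, group_file):
    """Access(u,f) :- UserGroup(u,g), GroupFile(g,f), with g projected out."""
    return {(u, f) for (u, g) in user_group
                   for (g2, f) in group_file if g == g2}

# Candidate deletion: an assumption of this sketch, not read off the slide.
deleted_ug = {("Emma", "ai")}
deleted_gf = {("db", "a.txt")}

before = access(USER_GROUP, GROUP_FILE)
after = access(USER_GROUP - deleted_ug, GROUP_FILE - deleted_gf)

# Side-effect free means the only lost view tuple is the intended one.
lost = before - after
side_effect_free = lost == {("Emma", "a.txt")}
print(sorted(lost), side_effect_free)
```

Emma keeps b.txt through db, and Olivia and Jacob keep a.txt through os and ai, so only the targeted answer disappears.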
Formal Definitions
Schema S: rel. symbols R1,….,Rm + functional dependencies (fd) Ri: attribute-set → attribute
Conjunctive Query (CQ) Q (no self joins!):
Q(y1, y2, y3) :– R1(x1, y1), R2(x1, 'ibm'), R3(x2, y1, y2, x3), R4(x4, y3)
• head variables: y1, y2, y3
• existential variables: x1, x2, x3, x4
• each Ri(…) is an atom
Input:
• DB D over S
• Answer a ∈ Q(D) to delete
Solution: E ⊆ D s.t. a ∉ Q(E)
• Side-effect free: Q(E) = Q(D) – {a}
• Optimal: |Q(E)| is maximal
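To make the definitions of solution, side-effect free, and optimal concrete, here is a brute-force sketch over the File Access instance. It enumerates every E ⊆ D, so it is exponential and only meant to illustrate the definitions, not to suggest an algorithm from the talk. The fact encoding and the names Q and optimal_solution are assumptions of this sketch.

```python
from itertools import combinations

# The database D as a set of facts (relation name, value, value).
D = frozenset(
    [("UserGroup", u, g) for u, g in [("Emma", "ai"), ("Emma", "db"),
                                      ("Olivia", "os"), ("Olivia", "db"),
                                      ("Jacob", "ai")]] +
    [("GroupFile", g, f) for g, f in [("ai", "a.txt"), ("ai", "b.txt"),
                                      ("db", "a.txt"), ("db", "b.txt"),
                                      ("os", "a.txt")]]
)

def Q(db):
    """Access(u,f) :- UserGroup(u,g), GroupFile(g,f)."""
    return {(u, f)
            for (r1, u, g) in db if r1 == "UserGroup"
            for (r2, g2, f) in db if r2 == "GroupFile" and g2 == g}

def optimal_solution(db, a):
    """Try every E ⊆ D with a ∉ Q(E); keep one maximizing |Q(E)|."""
    facts = list(db)
    best = None
    for k in range(len(facts) + 1):
        for removed in combinations(facts, k):
            E = db - set(removed)
            answers = Q(E)
            if a in answers:
                continue  # E is not a solution
            if best is None or len(answers) > len(best[1]):
                best = (E, answers)
    return best

a = ("Emma", "a.txt")
E, answers = optimal_solution(D, a)
is_side_effect_free = answers == Q(D) - {a}
print(len(answers), is_side_effect_free)
```

On this instance the optimal solution keeps all five other answers, so it is also side-effect free.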
Complexity Questions
What is the complexity of
• Deciding if a side-effect-free solution exists?
• Finding an optimal solution?
  – Or one w/ approximately minimal side effect?
  – Or one w/ approximately maximal # surviving answers?
    • Not the same [K, Vondrák, Williams, 2011]
Data complexity:
Fixed: Schema S, CQ Q
Input: DB D over S, answer a ∈ Q(D) to delete
Unirelation Algorithm (1Rel): Example
Delete a = (Emma, a.txt)
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
Access = UserGroup ⋈ GroupFile
[Buneman et al., 2002]
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
GroupFile
group file
ai a.txt
ai b.txt
db a.txt
db b.txt
os a.txt
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
• better than previous ⇒ selected solution
• Recall: there is an even better solution (side-effect free)
1Rel: General Case
Q has k atoms, over the relations R1, R2, …, Rk of D
undesired a ∈ Q(D)
solution i (i=1,…,k): delete from Ri each tuple consistent w/ a
solution 1, solution 2, …, solution k ⇒ select best
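The general scheme above can be sketched as follows. This is a minimal illustration, assuming a CQ without self joins whose atoms contain only variables (no constants), a database given as a dict from relation name to a set of rows, and "best" meaning the most surviving answers. The names evaluate and one_rel are hypothetical.

```python
def evaluate(head, atoms, db):
    """Evaluate a CQ by naive backtracking; returns the set of head tuples."""
    results = set()

    def extend(i, binding):
        if i == len(atoms):
            results.add(tuple(binding[v] for v in head))
            return
        rel, terms = atoms[i]
        for row in db[rel]:
            new = dict(binding)
            # A row matches if it agrees with all variables bound so far.
            if all(new.setdefault(t, val) == val for t, val in zip(terms, row)):
                extend(i + 1, new)

    extend(0, {})
    return results

def one_rel(head, atoms, db, a):
    """Solution i deletes from Ri every row consistent with a; keep the best i."""
    target = dict(zip(head, a))
    best = None
    for rel, terms in atoms:
        # A row survives only if it disagrees with a on some head variable.
        kept = {row for row in db[rel]
                if any(t in target and target[t] != v
                       for t, v in zip(terms, row))}
        answers = evaluate(head, atoms, {**db, rel: kept})
        if best is None or len(answers) > len(best[1]):
            best = (rel, answers)
    return best

db = {
    "UserGroup": {("Emma", "ai"), ("Emma", "db"), ("Olivia", "os"),
                  ("Olivia", "db"), ("Jacob", "ai")},
    "GroupFile": {("ai", "a.txt"), ("ai", "b.txt"), ("db", "a.txt"),
                  ("db", "b.txt"), ("os", "a.txt")},
}
head = ("u", "f")
atoms = [("UserGroup", ("u", "g")), ("GroupFile", ("g", "f"))]
best_rel, best_answers = one_rel(head, atoms, db, ("Emma", "a.txt"))
print(best_rel, len(best_answers))
```

On the File Access instance, deleting from UserGroup leaves four answers and deleting from GroupFile leaves three, so 1Rel selects the UserGroup solution. It still loses (Emma, b.txt), which is why the side-effect-free solution is strictly better here.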
Head Domination [K, Vondrák, Williams, 2011]
Q: A CQ over a schema S
G∃[Q]: nodes = atoms(Q); edges = “sharing ≥1 existential var.”
head domination: ∀ C ∈ CC(G∃[Q]) ∃ φ ∈ atoms(Q) s.t. headVars(C) ⊆ vars(φ)
(CC = Connected Components)
Q(y1, y2, y3) :– R1(x1, y1), R2(x1, y2), R3(y1, y2), R4(x2, y2, y3)
Q(y1, y2) :– R1(x, y1), R2(x, y2)   [the shape of Access(u,f)]
Q(y1, y2) :– R1(x1, y1), R2(x1, y2), R3(x1, y1, y2)
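The head-domination test is a small graph computation. The sketch below assumes a CQ without self joins, so each atom is represented just by its tuple of variables. The function names components and has_head_domination are hypothetical.

```python
def components(atoms, head):
    """Connected components of G∃[Q]: atoms linked by a shared existential variable."""
    comps = []
    unvisited = set(range(len(atoms)))
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            existential = set(atoms[i]) - set(head)
            linked = {j for j in unvisited if existential & set(atoms[j])}
            unvisited -= linked
            comp |= linked
            stack.extend(linked)
        comps.append(comp)
    return comps

def has_head_domination(head, atoms):
    """Every component's head variables must fit inside a single atom."""
    head_set = set(head)
    for comp in components(atoms, head):
        comp_head = set().union(*(set(atoms[i]) for i in comp)) & head_set
        if not any(comp_head <= set(atom) for atom in atoms):
            return False
    return True

# The three example queries from the slide, atoms given as variable tuples.
q1 = (("y1", "y2", "y3"),
      [("x1", "y1"), ("x1", "y2"), ("y1", "y2"), ("x2", "y2", "y3")])
q2 = (("y1", "y2"), [("x", "y1"), ("x", "y2")])
q3 = (("y1", "y2"), [("x1", "y1"), ("x1", "y2"), ("x1", "y1", "y2")])

for head, atoms in (q1, q2, q3):
    print(has_head_domination(head, atoms))
```

In the second query the single component has head variables y1 and y2, and no atom contains both, so the test fails. In the other two, R3 covers them.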
Previous Dichotomy Theorem [KVW 2011]
Let Q be a CQ over a schema S (no self joins)
[K, Vondrák, Williams, 2011], no FDs:
Q has head domination ⇒ 1Rel returns an optimal solution (in PTime)
otherwise ⇒ ∃side-effect-free is NP-complete; NP-hard to find an (αQ-approx.) optimal solution
PTime (1Rel): Q(y1, y2, y3) :– R1(x1, y1), R2(x1, y2), R3(y1, y2), R4(x2, y2, y3)
NP-hard: Q(y1, y2) :– R1(x, y1), R2(x, y2)   [the shape of Access(u,f)]
PTime (1Rel): Q(y1, y2) :– R1(x1, y1), R2(x1, y2), R3(x1, y1, y2)
Access Example Revisited
Delete (Emma, a.txt)
Access = UserGroup ⋈ GroupFile
UserGroup
user group
Emma ai
Emma db
Olivia os
Olivia db
Jacob ai
GroupFile
group file
ai a.txt
ai b.txt
db a.txt
db b.txt
os a.txt
Access
user file
Emma a.txt
Emma b.txt
Olivia a.txt
Olivia b.txt
Jacob a.txt
Jacob b.txt
• No FDs: NP-hard
• user → group: PTime
• user ← group: PTime
• group → file: PTime
• group ← file: PTime
Every nontrivial set of FDs brings the problem to PTime
Additional Examples
Q(y, y1, y2) :– R1(y1, x1), R(x1, y, x2), R2(y2, x2)
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
PTime
NP-hard
NP-hard
Dichotomy with FDs
Let Q be a CQ over a schema S (no self joins)
[K, Vondrák, Williams, 2011], no FDs:
Q has head domination ⇒ 1Rel returns an optimal solution (in PTime)
otherwise ⇒ ∃side-effect-free is NP-complete; NP-hard to find an (αQ-approx.) optimal solution
This paper (FDs):
Q+ has functional head dom. ⇒ 1Rel* returns an optimal solution (in PTime)
otherwise ⇒ ∃side-effect-free is NP-complete; NP-hard to find an (αQ-approx.) optimal solution
Depending on the CQ and FDs, the problem is either straightforward or hard!
Remove tuple only if it is used for the undesired answer
FDs Among Variables
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
FD: group → file gives g → f
FD: user → group gives u → g
Hence also: u → f, {u,g} → f
Definition:
CQ Q over schema S, U, V ⊆ variables(Q)
U → V: ∀ D ∈ db(S), ∀ μ1, μ2 ∈ hom(Q→D): μ1 = μ2 on U ⇒ μ1 = μ2 on V
The CQ Q+
Definition:
CQ Q over schema S, U, V ⊆ variables(Q)
U → V: ∀ D ∈ db(S), ∀ μ1, μ2 ∈ hom(Q→D): μ1 = μ2 on U ⇒ μ1 = μ2 on V
Q+ : add to Q’s head every x s.t. headVars → x
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
group ← file gives g ← {u,f} ⇒
Access+(u,g,f) :– UserGroup(u,g), GroupFile(g,f)
Tractability Condition: Q+ has functional head domination
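The head of Q+ can be computed with the usual FD-closure loop. The sketch below assumes that the FDs among variables are obtained by reading each schema FD off the positions of its atom (for example, group → file on GroupFile(g,f) becomes g → f) and closing under standard FD inference. The names closure and q_plus_head are hypothetical.

```python
def closure(variables, fds):
    """Standard FD closure: fire every FD whose left side is already covered."""
    closed = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and rhs not in closed:
                closed.add(rhs)
                changed = True
    return closed

def q_plus_head(head, body_vars, fds):
    """Head of Q+: the original head, then every other variable it determines."""
    determined = closure(head, fds)
    return list(head) + [v for v in body_vars
                         if v in determined and v not in head]

# Access(u,f) :- UserGroup(u,g), GroupFile(g,f)
head = ["u", "f"]
body_vars = ["u", "g", "f"]

file_to_group = [(("f",), "g")]           # group <- file, i.e. f -> g
group_to_file = [(("g",), "f")]           # group -> file, i.e. g -> f
both = [(("u",), "g"), (("g",), "f")]     # user -> group and group -> file

print(q_plus_head(head, body_vars, file_to_group))  # g joins the head
print(q_plus_head(head, body_vars, group_to_file))  # head unchanged
print(sorted(closure({"u"}, both)))                 # u determines g and f
```

With group ← file, the head variables determine g, so Access+ has all three variables in its head (the slide writes them in the order u, g, f). With group → file alone, nothing new is determined and Q+ equals Q.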
Functional Head Domination
Q: A CQ over a schema S
G∃[Q]: nodes = atoms(Q); edges = “sharing ≥1 existential var.”
head domination: ∀ C ∈ CC(G∃[Q]) ∃ φ ∈ atoms(Q) s.t. vars(φ) ⊇ headVars(C)
functional head domination: ∀ C ∈ CC(G∃[Q]) ∃ φ ∈ atoms(Q) s.t. vars(φ) → headVars(C)
Access(u,f) :– UserGroup(u,g), GroupFile(g,f)
group → file ⇒ {u,g} → {u,f}
Tractability Condition: Q+ has functional head domination
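Putting the pieces together, the tractability condition can be tested as sketched below. The same assumptions apply as for the earlier definitions: no self joins, atoms given as tuples of variables, and variable FDs read off the schema FDs and closed under standard inference. All function names are hypothetical.

```python
def closure(variables, fds):
    """Standard FD closure over variables."""
    closed = set(variables)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closed and rhs not in closed:
                closed.add(rhs)
                changed = True
    return closed

def components(atoms, head):
    """Connected components of G∃: atoms linked by a shared non-head variable."""
    comps = []
    unvisited = set(range(len(atoms)))
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            existential = set(atoms[i]) - set(head)
            linked = {j for j in unvisited if existential & set(atoms[j])}
            unvisited -= linked
            comp |= linked
            stack.extend(linked)
        comps.append(comp)
    return comps

def q_plus_has_functional_head_domination(head, atoms, fds):
    all_vars = {v for atom in atoms for v in atom}
    # Head of Q+: every variable determined by the head variables.
    plus_head = closure(head, fds) & all_vars
    for comp in components(atoms, plus_head):
        comp_head = set().union(*(set(atoms[i]) for i in comp)) & plus_head
        # Some atom's variables must functionally determine comp_head.
        if not any(comp_head <= closure(atom, fds) for atom in atoms):
            return False
    return True

# Access(u,f) :- UserGroup(u,g), GroupFile(g,f)
head, atoms = ("u", "f"), [("u", "g"), ("g", "f")]
single_fds = {
    "user -> group": [(("u",), "g")],
    "user <- group": [(("g",), "u")],
    "group -> file": [(("g",), "f")],
    "group <- file": [(("f",), "g")],
}
print(q_plus_has_functional_head_domination(head, atoms, []))
for name, fds in single_fds.items():
    print(name, q_plus_has_functional_head_domination(head, atoms, fds))
```

Without FDs the condition fails for Access, while each single FD makes it hold, in line with the PTime labels of the Access example.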
Examples
Tractability Condition: Q+ has functional head domination
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
{y, y1, y2} → x2
Q+(y, y1, y2, x2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
PTime (1Rel*)
Q(y, y1, y2) :– R1(x1, y1), R(x1, y, x2), R2(x2, y2)
NP-hard
Example: Key-Preserving Views
Tractability Condition: Q+ has functional head domination
Theorem [Cong, Fan, Geerts, 2006]:
Q preserves keys* ⇒ deletion propagation in PTime
* Each relation has a key; none of the key attributes are projected out
For CQs w/o self joins, follows directly from our positive side:
Q preserves keys
⇒ Q+ has no existential vars ⇒ G∃[Q+] has no edges
⇒ Q+ trivially has functional head domination (every connected component is a node, dominated by itself…)
⇒ 1Rel* returns an optimal solution
About the Proof
• The positive side is fairly simple
  – … once the tractability condition is found
• The negative side is intricate
  – Reduction from the special case of the Access CQ
  – Challenge: simulating Access(u,f) by an instance that satisfies all the FDs
  – Central concept: graph separation on the variable graph of the CQ
Q'(y1, y2) :– R1(y1, x1, x), R2(x, x2, y2)
Q(y1 , y2) :– R1(y1 , x) , R2(x , y2)
R3(x1 , x2)→→
Conclusions & Ongoing Work
• Studied deletion propagation in the presence of functional dependencies
• Established a dichotomy in complexity:
  – PTime by a straightforward algorithm vs.
  – Hardness (of approximation)
• Generalizes previously established special cases: no FDs, key-preserving views
• Ongoing work: deletion of multiple answers
  – Preview: trichotomy
    • Straightforward
    • Hard but approximable (by a constant factor)
    • Hard to approximate
Questions?