analysis indexingishikawa/papers/1994... · 2013. 6. 14. · ssf (sequen tial signature le) bssf...


Analysis of Indexing Schemes to Support Set Retrieval of Nested Objects

Yoshiharu Ishikawa

NAIST

Hiroyuki Kitagawa

Univ. of Tsukuba

Oct. 26, 1994


Contents

• Background

• Set Retrieval

  - concept of set retrieval

  - signature files as set access facilities

• Set Retrieval of Nested Objects

• Set Access Facilities for Nested Objects

  - query/update algorithms

  - retrieval/storage/update costs

• Cost Analysis

• Summary and Conclusion


Background

• Advanced Data Models

    Nested relational data model
    Object-oriented data models

  ⇑ Efficient indexing methods are required

• Complex Object Handling

  - Nested structures

      Nested index
      Path index
      Multi-index

  - Set-valued objects

  ⇓ The need to support set-specific comparison operators (e.g., ∋, ⊇)


Preliminary Research (SIGMOD'93)

Approach:

"Signature files as set access facilities"

⇓ Set retrieval of non-nested objects

Comparison:

• Two signature file organizations:

  - SSF (sequential signature file)

  - BSSF (bit-sliced signature file)

• NIX (nested index)

⇓ BSSF is a promising set access facility


ADTI'94 Presentation

Target:

"Set retrieval of multi-level nested objects"

Four candidates of index configuration:

• IBSSF: BSSF × 1

• INIX: NIX × 1

• IBSSF-NIX: BSSF × 1, NIX × 1

• INIX-NIX: NIX × 2

Comparison:

• Retrieval cost

• Storage cost

• Update cost


Query Example (T ⊇ Q)

DEPT

TID   DNO   SALES_ITEM
            INAME
T1    314   pen          Target Set (T)
            pencil
            ink
T2    125   notebook     Target Set (T)
            eraser
            clip
            cutter
...   ...   ...

Query Q1 (T ⊇ Q):

select DNO
from DEPT
where SALES_ITEM ⊇ {pen, pencil}
                   ↑ Query Set (Q)


Query Example (T ⊆ Q)

DEPT

TID   DNO   SALES_ITEM
            INAME
T1    314   pen          Target Set (T)
            pencil
            ink
T2    125   notebook     Target Set (T)
            eraser
            clip
            cutter
...   ...   ...

Query Q2 (T ⊆ Q):

select DNO
from DEPT
where SALES_ITEM ⊆ {pen, pencil, ink, cutter}
                   ↑ Query Set (Q)


Set Retrieval Conditions

T: Target Set
Q: Query Set

• T ⊇ Q: has-subset

    T ∋ q: has-element

• T ⊆ Q: is-subset

• T ⊓ Q: has-intersection

• T ≡ Q: is-equal
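The conditions listed above map directly onto Python's built-in set operators. The sketch below is only an illustration on the two DEPT tuples from the query examples; the function names and the `dept` dictionary are hypothetical and not part of the presented system.

```python
# Illustrative sketch: the set retrieval conditions as Python predicates.
# T is a target set stored in the database, Q is the query set.

def has_subset(T, Q):        # T ⊇ Q
    return T >= Q

def has_element(T, q):       # T ∋ q (has-subset with a one-element Q)
    return q in T

def is_subset(T, Q):         # T ⊆ Q
    return T <= Q

def has_intersection(T, Q):  # T and Q share at least one element
    return not T.isdisjoint(Q)

def is_equal(T, Q):          # T ≡ Q
    return T == Q

# DEPT tuples from the query examples: TID -> (DNO, SALES_ITEM)
dept = {
    "T1": (314, {"pen", "pencil", "ink"}),
    "T2": (125, {"notebook", "eraser", "clip", "cutter"}),
}

def select_dno(condition, Q):
    """Return the DNO of every tuple whose SALES_ITEM satisfies the condition."""
    return [dno for dno, items in dept.values() if condition(items, Q)]

q1 = select_dno(has_subset, {"pen", "pencil"})                  # Query Q1
q2 = select_dno(is_subset, {"pen", "pencil", "ink", "cutter"})  # Query Q2
print(q1, q2)  # [314] [314]
```

Both example queries select only the tuple T1, which is why the two query slides highlight the same target set.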


Organization of a Signature File

• Creation of a Set Signature

  Set                   Element Signature
  {pen, pencil, ink}    (F = 16, m = 3)

    pen     →  0001000000000101
    pencil  →  1100100000000000
    ink     →  0100001010000000
                  ⇓ bit-OR
  Set Signature  1101101010000101

• Logical Structure of a Signature File

  Target Set                            Target Signature    TID
  {pen, pencil, ink}                →   1101101010000101    T1
  {notebook, eraser, clip, cutter}  →   1110110111000010    T2
  ...                                   ...
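The signature creation step can be sketched as follows. The three element signatures are copied from the slide; `element_signature` is a hypothetical generator (the slides do not say how element signatures are derived) that simply picks m distinct bit positions out of F, seeded by the element.

```python
import random

F = 16  # signature size in bits (value from the slide)
M = 3   # weight: number of 1-bits per element signature (value from the slide)

# Element signatures exactly as printed on the slide.
SLIDE_SIGNATURES = {
    "pen":    "0001000000000101",
    "pencil": "1100100000000000",
    "ink":    "0100001010000000",
}

def element_signature(element, f=F, m=M):
    """Hypothetical generator: choose m distinct bit positions out of f,
    seeded by the element so the same element always gets the same bits."""
    rng = random.Random(element)
    sig = 0
    for pos in rng.sample(range(f), m):
        sig |= 1 << pos
    return sig

def set_signature(element_sigs):
    """Superimpose (bit-OR) element signatures into one set signature."""
    sig = 0
    for s in element_sigs:
        sig |= s
    return sig

target = set_signature(int(bits, 2) for bits in SLIDE_SIGNATURES.values())
print(format(target, "016b"))  # 1101101010000101
```

The printed value reproduces the set signature shown for {pen, pencil, ink}.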


BSSF Organization

[Figure: BSSF organization. F bit-slice files, one per signature bit position, each holding N bits (one per target signature); an OID file holding oid1, oid2, oid3, ..., oidN; and a query signature shown against the bit-slice files.]
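A minimal in-memory sketch of the bit-sliced layout: the target signatures are stored column-wise, so slice j holds bit j of every signature. For a T ⊇ Q query only the slices at the 1-positions of the query signature have to be read and AND-ed together. The class and method names are hypothetical, and a real BSSF keeps each slice in its own file on disk rather than in a Python list.

```python
class BitSlicedSignatureFile:
    """Toy BSSF: slices[j][i] is bit j of the signature of the i-th object."""

    def __init__(self, f):
        self.f = f
        self.slices = [[] for _ in range(f)]  # the F bit-slice "files"
        self.oids = []                        # the OID file

    def insert(self, signature, oid):
        """Append one bit to every slice and the OID to the OID file."""
        for j, bit in enumerate(signature):   # signature is a string of '0'/'1'
            self.slices[j].append(bit == "1")
        self.oids.append(oid)

    def search_has_subset(self, query_signature):
        """T ⊇ Q: AND together the slices where the query signature has a 1."""
        candidates = [True] * len(self.oids)
        slices_read = 0
        for j, bit in enumerate(query_signature):
            if bit == "1":
                slices_read += 1
                candidates = [c and s for c, s in zip(candidates, self.slices[j])]
        drops = [oid for oid, c in zip(self.oids, candidates) if c]
        return drops, slices_read

bssf = BitSlicedSignatureFile(16)
bssf.insert("1101101010000101", "T1")  # {pen, pencil, ink}
bssf.insert("1110110111000010", "T2")  # {notebook, eraser, clip, cutter}

drops, slices_read = bssf.search_has_subset("1101100000000101")  # {pen, pencil}
print(drops, slices_read)  # ['T1'] 6
```

Only 6 of the 16 slices are touched here, which illustrates why the retrieval cost of a BSSF depends on the weight of the query signature rather than on F alone.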


Query Processing (T ⊇ Q)

Query Set         Element Signature

{pen, pencil}

    pen     →  0001000000000101
    pencil  →  1100100000000000
                  ⇓ bit-OR
Query Signature  1101100000000101

{pen, pencil, ink}                →  1101101010000101  T1  → Drop
{notebook, eraser, clip, cutter}  →  1110110111000010  T2
...                                  ...


Matching Condition

• T ⊇ Q:

    query signature ∧ target signature = query signature

• T ⊆ Q:

    query signature ∧ target signature = target signature

∧: bit-wise AND operation
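The two matching conditions can be checked on the signatures from the earlier slides. This is a sketch with a hypothetical `matches` helper; note that a signature match only yields a candidate (a "drop"), because different sets can produce overlapping bit patterns, so the actual sets still have to be compared afterwards.

```python
def matches(condition, query_sig, target_sig):
    """Signature-level test; a match is only a candidate (a "drop")."""
    both = query_sig & target_sig  # bit-wise AND
    if condition == "has-subset":   # T ⊇ Q
        return both == query_sig
    if condition == "is-subset":    # T ⊆ Q
        return both == target_sig
    raise ValueError(f"unsupported condition: {condition}")

query = int("1101100000000101", 2)  # {pen, pencil}
t1 = int("1101101010000101", 2)     # {pen, pencil, ink}
t2 = int("1110110111000010", 2)     # {notebook, eraser, clip, cutter}

print(matches("has-subset", query, t1))  # True
print(matches("has-subset", query, t2))  # False
print(matches("is-subset", query, t1))   # False
```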


Sample Database

[Figure: sample database with three classes linked by forward references.
  Emp  (objects e1 to e4): attributes ename, ..., proj; enames Tanaka, Yamada, Katoh, Suzuki
  Proj (objects p1 to p3): attributes pname, ..., dept; pnames P1, P2, P3
  Dept (objects d1, d2):   attributes dname, ..., items; D1 with items {a,b}, D2 with items {a,c,d}]

Path instance

P = e1.p1.d1.{a, b}


Set Access Facilities (1)

• IBSSF

  1. make set signature S from {a, b}

  2. ⟨S, e1⟩ → BSSF

  [Figure: path e1 (ename Suzuki, proj) → p1 (pname P1, dept) → d1 (dname D1, items {a, b}); a single BSSF, labeled IBSSF, spans the whole path from items to e1]

• INIX

  ⟨a, e1⟩ → NIX, ⟨b, e1⟩ → NIX

  [Figure: the same path; a single NIX, labeled INIX, spans the whole path from items to e1]


Set Access Facilities (2)

• IBSSF-NIX

  1. make set signature S from {a, b}

  2. ⟨S, d1⟩ → BSSF

  3. ⟨d1, e1⟩ → NIX

  [Figure: path e1 (ename Suzuki, proj) → p1 (pname P1, dept) → d1 (dname D1, items {a, b}); a BSSF covers items to d1 and a NIX covers d1 to e1]

• INIX-NIX

  1. ⟨a, d1⟩ → NIX1, ⟨b, d1⟩ → NIX1

  2. ⟨d1, e1⟩ → NIX2

  [Figure: the same path; NIX1 covers items to d1 and NIX2 covers d1 to e1]
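The entries that each of the four configurations stores for one path instance can be sketched as below. The container layout is hypothetical: a BSSF is modeled as a list of (signature, OID) pairs and a NIX as a mapping from key to OID set. `set_signature` is a stand-in for signature creation, with illustrative parameters that are not taken from the slides.

```python
import random
from collections import defaultdict

def set_signature(items, f=16, m=2):
    """Hypothetical signature creation: OR together m pseudo-random bits per element."""
    sig = 0
    for item in sorted(items):
        for pos in random.Random(item).sample(range(f), m):
            sig |= 1 << pos
    return sig

def build_indexes(path_instances):
    """Build all four configurations from (emp, proj, dept, items) path instances."""
    ibssf = []                                     # BSSF: (S, Emp OID)
    inix = defaultdict(set)                        # NIX: element -> Emp OIDs
    ibssf_nix = {"BSSF": [], "NIX": defaultdict(set)}
    inix_nix = {"NIX1": defaultdict(set), "NIX2": defaultdict(set)}
    seen_depts = set()
    for emp, _proj, dept, items in path_instances:
        sig = set_signature(items)
        ibssf.append((sig, emp))                   # <S, e1> -> BSSF
        for x in items:
            inix[x].add(emp)                       # <a, e1> -> NIX
            inix_nix["NIX1"][x].add(dept)          # <a, d1> -> NIX1
        if dept not in seen_depts:                 # one BSSF entry per Dept object
            seen_depts.add(dept)
            ibssf_nix["BSSF"].append((sig, dept))  # <S, d1> -> BSSF
        ibssf_nix["NIX"][dept].add(emp)            # <d1, e1> -> NIX
        inix_nix["NIX2"][dept].add(emp)            # <d1, e1> -> NIX2
    return ibssf, inix, ibssf_nix, inix_nix

# The path instance P = e1.p1.d1.{a, b} from the sample database.
ibssf, inix, ibssf_nix, inix_nix = build_indexes([("e1", "p1", "d1", {"a", "b"})])
print(sorted(inix.items()))  # [('a', {'e1'}), ('b', {'e1'})]
```

The single-index configurations key their entries on the root object e1, while the two-index configurations split the path at the Dept object d1.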


Query/Deletion Algorithms

Assumption:

Backward references do not exist.

Example (IBSSF, T ⊇ Q, T ⊆ Q):

1. The BSSF is searched → OID set of Emp objects

2. For each OID, a forward traversal is performed → Dept objects are retrieved.

3. Each Dept object is examined. If it satisfies the condition, the corresponding Emp object is returned as a query result.
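The three steps can be sketched on a reduced version of the sample database. The object mappings below (e2 → p2 → d2 in particular) are assumptions for illustration, as are the `signature` helper and its parameters; only the e1 → p1 → d1 path is taken from the slides.

```python
import random

F, M = 16, 2  # illustrative signature parameters, not those of the cost analysis

def signature(items):
    """Hypothetical signature creation: m pseudo-random bits per element, OR-ed."""
    sig = 0
    for item in items:
        for pos in random.Random(item).sample(range(F), M):
            sig |= 1 << pos
    return sig

# Forward references only (no backward references): Emp -> Proj -> Dept.
emp_proj = {"e1": "p1", "e2": "p2"}
proj_dept = {"p1": "d1", "p2": "d2"}
dept_items = {"d1": {"a", "b"}, "d2": {"a", "c", "d"}}

# IBSSF logical content: (signature of the items reached from the Emp, Emp OID).
ibssf = [(signature(dept_items[proj_dept[p]]), e) for e, p in emp_proj.items()]

def query_ibssf(Q, condition):
    q_sig = signature(Q)
    # Step 1: search the BSSF to get the candidate Emp OIDs (the drops).
    if condition == "has-subset":      # T ⊇ Q
        drops = [e for s, e in ibssf if s & q_sig == q_sig]
    else:                              # "is-subset", T ⊆ Q
        drops = [e for s, e in ibssf if s & q_sig == s]
    result = []
    for e in drops:
        # Step 2: forward traversal from the Emp object to its Dept object.
        T = dept_items[proj_dept[emp_proj[e]]]
        # Step 3: examine the Dept object; false drops are discarded here.
        ok = T >= Q if condition == "has-subset" else T <= Q
        if ok:
            result.append(e)
    return result

print(query_ibssf({"a", "b"}, "has-subset"))           # ['e1']
print(query_ibssf({"a", "b", "c", "d"}, "is-subset"))  # ['e1', 'e2']
```

Step 3 is what makes the result exact: the signature search may return extra candidates, but it never misses a qualifying object.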


Cost Formulas for BSSF and NIX (Subsection 4.1)

BSSF retrieval cost:

    RC_BSSF{c}(N)     (1)

BSSF storage cost:

    SC_BSSF(N)        (5)

BSSF update cost:

    IC_BSSF           (6)

    DC_BSSF(N)        (7)

NIX search cost for a key value:

    rc(x, y)

NIX update cost:

    IC_NIX(x, y)      (8)

    DC_NIX(x, y)


Retrieval/Storage Costs (Subsections 4.2, 4.3)

RC{IBSSF, c}              (13)

RC{INIX, T ⊇ Q}           (14)

RC{INIX, T ⊆ Q}           (15)

RC{IBSSF-NIX, c}          (16)

RC{INIX-NIX, T ⊇ Q}       (17)

RC{INIX-NIX, T ⊆ Q}       (18)

SC{IBSSF}                 (19)

SC{INIX}                  (20)

SC{IBSSF-NIX}             (21)

SC{INIX-NIX}              (22)


Insertion/Deletion Costs (Subsection 4.4)

IC{IBSSF}                 (23)

IC{INIX}                  (24)

IC{IBSSF-NIX}             (25)

IC{INIX-NIX}              (26)

DC{IBSSF}                 (27)

DC{INIX}                  (28)

DC{IBSSF-NIX}             (29)

DC{INIX-NIX}              (30)


Cost Analysis

Parameter settings:

• Ni (no. of objects in class Ci):

    N1 = N2 = ... = Nn = 30,000

• Dt (cardinality of target sets):

    Dt = 10 or Dt = 100

• n (path length):

    n = 2, 3, 4

• F (signature size in bits):

    F = 500 (Dt = 10)
    F = 5000 (Dt = 100)

• m (weight of element signature):

    m = 2
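One way to read these settings is that F grows in proportion to Dt, so target signatures stay about equally sparse in both cases. The sketch below computes the expected fraction of 1-bits in a target signature under an assumption that is not stated on the slide: each element sets m distinct, uniformly chosen bits, independently of the other elements. The function name is hypothetical.

```python
def expected_density(F, Dt, m):
    """Expected fraction of 1-bits in a target signature, assuming each of the
    Dt elements sets m distinct, uniformly chosen bits independently."""
    # A given bit stays 0 for one element with probability 1 - m/F.
    return 1.0 - (1.0 - m / F) ** Dt

settings = [(500, 10), (5000, 100)]  # (F, Dt) pairs from the slide, with m = 2
densities = [expected_density(F, Dt, m=2) for F, Dt in settings]
for (F, Dt), d in zip(settings, densities):
    print(f"F={F}, Dt={Dt}: density ~ {d:.4f}")
```

Under this assumption both settings give a density of roughly 4%, so the two Dt cases are compared at a similar level of signature saturation.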


Retrieval Cost (Fig. 4) (T ⊇ Q, Dt = 10, n = 3)

[Figure: retrieval cost in pages (log scale, 1 to 1000) versus Dq (1 to 5) for IBSSF, INIX, IBSSF-NIX, and INIX-NIX; has-subset, Dt = 10, n = 3]

• When Dq = 1

    INIX < IBSSF-NIX, INIX-NIX < IBSSF

• When Dq ≥ 2

    IBSSF, IBSSF-NIX < INIX, INIX-NIX


Retrieval Cost (Fig. 5) (T ⊇ Q, Dt = 100, n = 3)

[Figure: retrieval cost in pages (log scale, 1 to 10000) versus Dq (1 to 5) for IBSSF, INIX, IBSSF-NIX, and INIX-NIX; has-subset, Dt = 100, n = 3]


Retrieval Cost (Fig. 6) (T ⊆ Q, Dt = 10, n = 3)

[Figure: retrieval cost in pages (log scale, 100 to 10000) versus Dq (10 to 100) for IBSSF, INIX, IBSSF-NIX, and INIX-NIX; is-subset, Dt = 10, n = 3]

• For most Dq values

    IBSSF, IBSSF-NIX < INIX, INIX-NIX

• We can further improve the cost of IBSSF and IBSSF-NIX with our smart retrieval strategy (reference [8]).


Retrieval Cost for Large Dq Values (T ⊆ Q, Dt = 10, n = 3)

[Figure: retrieval cost in pages (log scale, 1 to 100000) versus Dq (100 to 1000) for IBSSF, INIX, IBSSF-NIX, and INIX-NIX; is-subset, Dt = 10, n = 3]

• The cost of IBSSF drastically increases for large Dq values.

• The cost of IBSSF-NIX constantly decreases.


Retrieval Cost (Fig. 7) (T ⊆ Q, Dt = 100, n = 3)

[Figure: retrieval cost in pages (log scale, 1000 to 100000) versus Dq (100 to 1000) for IBSSF, INIX, IBSSF-NIX, and INIX-NIX; is-subset, Dt = 100, n = 3]


Storage Costs

Dt = 10

Index        Storage Cost (pages)
IBSSF        559
INIX         629
IBSSF-NIX    693
INIX-NIX     763

Dt = 100

Index        Storage Cost (pages)
IBSSF        5059
INIX         10047
IBSSF-NIX    5193
INIX-NIX     10181


Insertion Costs

Dt = 10

Index        Insertion Cost (pages)
IBSSF        41
INIX         40
IBSSF-NIX    44
INIX-NIX     43

Dt = 100

Index        Insertion Cost (pages)
IBSSF        394
INIX         400
IBSSF-NIX    398
INIX-NIX     403


Deletion Costs

Dt = 10

Index        Deletion Cost (pages)
IBSSF        544
INIX         37
IBSSF-NIX    50
INIX-NIX     43

Dt = 100

Index        Deletion Cost (pages)
IBSSF        5395
INIX         307
IBSSF-NIX    423
INIX-NIX     403


Summary

T ⊇ Q query:

• The four access facilities have similar performance except for Dq = 1.

• When Dq = 1, IBSSF is the worst and INIX is the best.

T ⊆ Q query:

• IBSSF and IBSSF-NIX show relatively stable performance and are better than INIX and INIX-NIX.

• For very large Dq values, IBSSF-NIX is superior to IBSSF.

Storage costs:

    IBSSF, IBSSF-NIX ≤ INIX, INIX-NIX

Update costs:

• The four set access facilities have update costs of almost the same order, except for the deletion cost of IBSSF, which is more than ten times higher (544 versus 37 to 50 pages at Dt = 10; 5395 versus 307 to 423 pages at Dt = 100).


Conclusion and Research Issues

Conclusion:

• Under our parameter settings, IBSSF-NIX is a better indexing method because of its stable performance and smaller storage cost.

• If the case of T ⊇ Q with Dq = 1 is important, INIX becomes another candidate.

Research Issues:

1. Analyses of other types of queries (e.g., T ⊓ Q, T ≡ Q)

2. Cost analyses for other configurations of nested objects
