Analysis of Indexing Schemes to Support Set Retrieval of Nested Objects
Yoshiharu Ishikawa
NAIST
Hiroyuki Kitagawa
Univ. of Tsukuba
Oct. 26, 1994
Contents
• Background
• Set Retrieval
  - concept of set retrieval
  - signature files as set access facilities
• Set Retrieval of Nested Objects
• Set Access Facilities for Nested Objects
  - query/update algorithms
  - retrieval/storage/update costs
• Cost Analysis
• Summary and Conclusion
Background
• Advanced data models
  - Nested relational data model
  - Object-oriented data models
  ⇒ Efficient indexing methods are required
• Complex object handling
  - Nested structures: nested index, path index, multi-index
  - Set-valued objects
    ⇒ Set-specific comparison operators (e.g., ∋, ⊇) must be supported
Preliminary Research (SIGMOD'93)
Approach:
• "Signature files as set access facilities"
  ⇒ Set retrieval of non-nested objects
Comparison:
• Two signature file organizations:
  - SSF (sequential signature file)
  - BSSF (bit-sliced signature file)
• NIX (nested index)
⇒ BSSF is a promising set access facility
ADTI'94 Presentation
Target:
• "Set retrieval of multi-level nested objects"
Four candidate index configurations:
• IBSSF: BSSF × 1
• INIX: NIX × 1
• IBSSF-NIX: BSSF × 1, NIX × 1
• INIX-NIX: NIX × 2
Comparison:
• Retrieval cost
• Storage cost
• Update cost
Query Example (T ⊇ Q)
DEPT
TID   DNO   SALES_ITEM (INAME)
T1    314   {pen, pencil, ink}                 ← target set (T)
T2    125   {notebook, eraser, clip, cutter}   ← target set (T)
...   ...   ...
Query Q1 (T ⊇ Q):
  select DNO
  from DEPT
  where SALES_ITEM ⊇ {pen, pencil}
  ({pen, pencil} is the query set Q)
Query Example (T ⊆ Q)
Query Q2 (T ⊆ Q):
  select DNO
  from DEPT
  where SALES_ITEM ⊆ {pen, pencil, ink, cutter}
  ({pen, pencil, ink, cutter} is the query set Q)
Set Retrieval Conditions
T: target set, Q: query set
• T ⊇ Q: has-subset
  - T ∋ q: has-element
• T ⊆ Q: is-subset
• T ∩ Q ≠ ∅: has-intersection
• T ≡ Q: is-equal
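The five conditions can be pinned down with ordinary set operators. The following is a minimal Python sketch, not part of the original method: a brute-force reference that evaluates each condition directly on the DEPT example. The function names (`has_subset`, `is_subset`, and so on) are hypothetical labels for the conditions above.

```python
# Brute-force reference semantics for the set retrieval conditions.
# T is a target set stored in an object; Q (or q) comes from the query.

def has_subset(T: set, Q: set) -> bool:        # T ⊇ Q
    return T >= Q

def has_element(T: set, q: str) -> bool:       # T ∋ q
    return q in T

def is_subset(T: set, Q: set) -> bool:         # T ⊆ Q
    return T <= Q

def has_intersection(T: set, Q: set) -> bool:  # T ∩ Q ≠ ∅
    return not T.isdisjoint(Q)

def is_equal(T: set, Q: set) -> bool:          # T ≡ Q
    return T == Q

# SALES_ITEM values of the DEPT example
dept = {
    "T1": {"pen", "pencil", "ink"},
    "T2": {"notebook", "eraser", "clip", "cutter"},
}
q1 = {"pen", "pencil"}                    # Query Q1 (T ⊇ Q)
q2 = {"pen", "pencil", "ink", "cutter"}   # Query Q2 (T ⊆ Q)

print([tid for tid, t in dept.items() if has_subset(t, q1)])  # ['T1']
print([tid for tid, t in dept.items() if is_subset(t, q2)])   # ['T1']
```

An index is only useful if it returns the same answers as this scan while reading far fewer pages.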
Organization of a Signature File
• Creation of a set signature (F = 16 bits, m = 3 bits set per element)
  Set: {pen, pencil, ink}
  Element signatures:
    pen    → 0001000000000101
    pencil → 1100100000000000
    ink    → 0100001010000000
  ⇓ bit-OR
  Set signature: 1101101010000101
• Logical structure of a signature file
  Target set                           Target signature    TID
  {pen, pencil, ink}                →  1101101010000101    T1
  {notebook, eraser, clip, cutter}  →  1110110111000010    T2
  ...                                  ...                 ...
BSSF Organization
[Figure: BSSF organization. The signature file is stored as F bit-slice files (one per bit position, each N bits long) plus an OID file (oid1, oid2, ..., oidN); the query signature determines which bit-slice files are read.]
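A minimal Python sketch of how a bit-sliced organization can answer set queries, assuming signatures are given as bit strings and each slice is held in memory as a Python integer bitmap (bit i stands for the i-th OID). The function names are hypothetical, and page I/O, which the cost analysis counts, is not modelled. A target qualifies for T ⊇ Q when it has a 1 wherever the query signature has a 1, so only those slices are read; it qualifies for T ⊆ Q when it has a 0 wherever the query signature has a 0, so the remaining slices are read instead.

```python
from typing import List

def build_slices(signatures: List[str]) -> List[int]:
    """Turn N row signatures (F-bit strings) into F bit-slices of N bits.
    Bit i of slice j is bit j of the i-th signature (i indexes the OID file)."""
    F = len(signatures[0])
    slices = [0] * F
    for i, sig in enumerate(signatures):
        for j, bit in enumerate(sig):
            if bit == "1":
                slices[j] |= 1 << i
    return slices

def _members(bitmap: int, n: int) -> List[int]:
    return [i for i in range(n) if (bitmap >> i) & 1]

def search_superset(slices: List[int], query: str, n: int) -> List[int]:
    """T ⊇ Q: AND together only the slices where the query signature has a 1."""
    result = (1 << n) - 1                      # start with all N objects
    for j, bit in enumerate(query):
        if bit == "1":
            result &= slices[j]
    return _members(result, n)

def search_subset(slices: List[int], query: str, n: int) -> List[int]:
    """T ⊆ Q: reject any object with a 1 where the query signature has a 0."""
    rejected = 0
    for j, bit in enumerate(query):
        if bit == "0":
            rejected |= slices[j]
    return _members(~rejected & ((1 << n) - 1), n)

oids = ["T1", "T2"]
sigs = ["1101101010000101",   # T1 {pen, pencil, ink}
        "1110110111000010"]   # T2 {notebook, eraser, clip, cutter}
slices = build_slices(sigs)
print([oids[i] for i in search_superset(slices, "1101100000000101", len(sigs))])  # ['T1']
```

The returned positions are candidates only; each still has to be verified against its actual set.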
Query Processing (T ⊇ Q)
Query set: {pen, pencil}
Element signatures:
  pen    → 0001000000000101
  pencil → 1100100000000000
⇓ bit-OR
Query signature: 1101100000000101

{pen, pencil, ink}                →  1101101010000101  T1 → Drop (signature qualifies)
{notebook, eraser, clip, cutter}  →  1110110111000010  T2
...                                  ...
Matching Condition
• T ⊇ Q:
  query signature ∧ target signature = query signature
• T ⊆ Q:
  query signature ∧ target signature = target signature
∧: bitwise AND operation
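The two matching conditions translate directly into integer bit operations. A minimal Python sketch, assuming signatures are stored as integers; `matches` is a hypothetical helper. A signature match is only a filter: because element signatures are superimposed, a qualifying signature (a drop) can belong to a set that does not actually satisfy the condition (a false drop), so candidates must still be checked against the stored sets.

```python
def matches(query_sig: int, target_sig: int, cond: str) -> bool:
    """Signature-level test from the slide; ∧ is Python's & on integers."""
    both = query_sig & target_sig
    if cond == "superset":            # T ⊇ Q
        return both == query_sig
    if cond == "subset":              # T ⊆ Q
        return both == target_sig
    raise ValueError(f"unknown condition: {cond}")

q = int("1101100000000101", 2)        # query signature of {pen, pencil}
t1 = int("1101101010000101", 2)       # T1 {pen, pencil, ink}
t2 = int("1110110111000010", 2)       # T2 {notebook, eraser, clip, cutter}

print(matches(q, t1, "superset"), matches(q, t2, "superset"))  # True False
```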
Sample Database
[Figure: sample database with classes Emp (ename, proj), Proj (pname, dept), and Dept (dname, items). Emp objects e1 to e4 (enames include Suzuki for e1, plus Tanaka, Yamada, and Katoh) reference Proj objects p1 to p3 (pnames P1, P2, P3), which reference Dept objects d1 (dname D1, items {a, b}) and d2 (dname D2, items {a, c, d}).]
Path instance: P = e1.p1.d1.{a, b}
Set Access Facilities (1)
• IBSSF
  1. Make the set signature S from {a, b}.
  2. ⟨S, e1⟩ → BSSF
  [Figure: path e1 (ename Suzuki) → p1 (pname P1) → d1 (dname D1, items {a, b}), indexed by a single BSSF (Ibssf).]
• INIX
  ⟨a, e1⟩ → NIX, ⟨b, e1⟩ → NIX
  [Figure: the same path, indexed by a single NIX (Inix).]
Set Access Facilities (2)
• IBSSF-NIX
  1. Make the set signature S from {a, b}.
  2. ⟨S, d1⟩ → BSSF
  3. ⟨d1, e1⟩ → NIX
  [Figure: the same path, indexed by a BSSF on the Dept items and a NIX from Dept to Emp (Ibssf-nix).]
• INIX-NIX
  1. ⟨a, d1⟩ → NIX1, ⟨b, d1⟩ → NIX1
  2. ⟨d1, e1⟩ → NIX2
  [Figure: the same path, indexed by NIX1 (set elements to Dept) and NIX2 (Dept to Emp) (Inix-nix).]
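To make the four configurations concrete, this Python sketch lists the entries each one inserts for the path instance e1.p1.d1.{a, b}, following the steps on the two slides. It is an illustration under assumptions: `sig` is a hypothetical placeholder for set-signature creation (it returns a label, not an F-bit signature), and index entries are represented as (index, key, value) tuples.

```python
from typing import Dict, FrozenSet, List, Tuple

Entry = Tuple[str, str, str]   # (index name, key, value OID)

def sig(items: FrozenSet[str]) -> str:
    """Hypothetical placeholder for set-signature creation."""
    return "S" + "".join(sorted(items))   # e.g. "Sab"; not a real F-bit signature

def index_entries(path: List[str], items: FrozenSet[str]) -> Dict[str, List[Entry]]:
    head, tail = path[0], path[-1]         # Emp OID (e1) and Dept OID (d1)
    elems = sorted(items)
    return {
        "IBSSF":     [("BSSF", sig(items), head)],
        "INIX":      [("NIX", x, head) for x in elems],
        "IBSSF-NIX": [("BSSF", sig(items), tail), ("NIX", tail, head)],
        "INIX-NIX":  [("NIX1", x, tail) for x in elems] + [("NIX2", tail, head)],
    }

for name, entries in index_entries(["e1", "p1", "d1"], frozenset({"a", "b"})).items():
    print(name, entries)
```

In this representation, IBSSF and IBSSF-NIX insert one BSSF entry per set regardless of its cardinality, whereas INIX and INIX-NIX insert one NIX entry per set element (Dt entries per set).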
Query/Deletion Algorithms
Assumption: backward references do not exist.
Example (IBSSF, T ⊇ Q and T ⊆ Q):
1. The BSSF is searched → OID set of Emp objects.
2. For each OID, a forward traversal is performed → Dept objects are retrieved.
3. Each Dept object is examined. If it satisfies the condition, the corresponding Emp object is returned as a query result.
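The three steps can be sketched in Python over a hypothetical in-memory object store in which references point forward only (Emp.proj to Proj, Proj.dept to Dept), as the assumption requires. Step 1 is not implemented: the candidate Emp OIDs are passed in as if returned by the BSSF, and the links e2 → p2 → d2 are hypothetical. Step 3 re-checks the real set, which is what discards false drops.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical object store: OID -> attributes. References point forward only.
STORE: Dict[str, dict] = {
    "e1": {"proj": "p1"},
    "p1": {"dept": "d1"},
    "d1": {"items": {"a", "b"}},
    "e2": {"proj": "p2"},            # hypothetical link
    "p2": {"dept": "d2"},            # hypothetical link
    "d2": {"items": {"a", "c", "d"}},
}

PATH = ["proj", "dept"]   # reference attributes followed from Emp to Dept

def traverse(oid: str) -> str:
    """Step 2: follow forward references from an Emp object to its Dept object."""
    for attr in PATH:
        oid = STORE[oid][attr]
    return oid

def superset(t: Set[str], q: Set[str]) -> bool:   # T ⊇ Q
    return t >= q

def subset(t: Set[str], q: Set[str]) -> bool:     # T ⊆ Q
    return t <= q

def ibssf_query(candidates: Iterable[str], query: Set[str],
                cond: Callable[[Set[str], Set[str]], bool]) -> List[str]:
    result = []
    for emp in candidates:                     # step 1 output: Emp OIDs from the BSSF
        dept = traverse(emp)                   # step 2: forward traversal
        if cond(STORE[dept]["items"], query):  # step 3: verify the real set
            result.append(emp)
    return result

# Suppose the BSSF returned both e1 and e2 for Q = {b}; e2 is a false drop.
print(ibssf_query(["e1", "e2"], {"b"}, superset))   # ['e1']
```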
Cost Formulas for BSSF and NIX (Subsection 4.1)
BSSF retrieval cost:
  $RC_{\text{BSSF}}\{c\}(N)$   (1)
BSSF storage cost:
  $SC_{\text{BSSF}}(N)$   (5)
BSSF update cost:
  $IC_{\text{BSSF}}$   (6)
  $DC_{\text{BSSF}}(N)$   (7)
NIX search cost for a key value:
  $rc(x, y)$
NIX update cost:
  $IC_{\text{NIX}}(x, y)$   (8)
  $DC_{\text{NIX}}(x, y)$
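Retrieval/Storage Costs (Subsections 4.2, 4.3)
$RC\{\text{IBSSF}, c\}$   (13)
$RC\{\text{INIX}, T \supseteq Q\}$   (14)
$RC\{\text{INIX}, T \subseteq Q\}$   (15)
$RC\{\text{IBSSF-NIX}, c\}$   (16)
$RC\{\text{INIX-NIX}, T \supseteq Q\}$   (17)
$RC\{\text{INIX-NIX}, T \subseteq Q\}$   (18)
$SC\{\text{IBSSF}\}$   (19)
$SC\{\text{INIX}\}$   (20)
$SC\{\text{IBSSF-NIX}\}$   (21)
$SC\{\text{INIX-NIX}\}$   (22)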
Insertion/Deletion Costs (Subsection 4.4)
$IC\{\text{IBSSF}\}$   (23)
$IC\{\text{INIX}\}$   (24)
$IC\{\text{IBSSF-NIX}\}$   (25)
$IC\{\text{INIX-NIX}\}$   (26)
$DC\{\text{IBSSF}\}$   (27)
$DC\{\text{INIX}\}$   (28)
$DC\{\text{IBSSF-NIX}\}$   (29)
$DC\{\text{INIX-NIX}\}$   (30)
Cost Analysis
Parameter settings:
• Ni (number of objects in class Ci):
  N1 = N2 = ... = Nn = 30,000
• Dt (cardinality of target sets):
  Dt = 10 or Dt = 100
• n (path length):
  n = 2, 3, 4
• F (signature size in bits):
  F = 500 (Dt = 10), F = 5000 (Dt = 100)
• m (weight of element signature):
  m = 2
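One way to see how the F settings relate to Dt is a standard back-of-the-envelope model of superimposed coding. This Python sketch is an assumption, not the cost formulas of the paper: it treats the m bits of each element signature as independent and uniformly placed, and roughly estimates the T ⊇ Q false-drop probability as the bit density raised to the expected query-signature weight. The function names are hypothetical.

```python
def bit_density(F: int, m: int, D: int) -> float:
    """Probability that a given bit is 1 in the signature of a D-element set,
    assuming each of the m*D bit choices is independent and uniform over F."""
    return 1.0 - (1.0 - 1.0 / F) ** (m * D)

def expected_weight(F: int, m: int, D: int) -> float:
    """Expected number of 1-bits in the signature of a D-element set."""
    return F * bit_density(F, m, D)

def false_drop_superset(F: int, m: int, Dt: int, Dq: int) -> float:
    """Rough T ⊇ Q false-drop estimate: each 1-bit of the query signature
    must happen to be set in a target signature that does not qualify."""
    return bit_density(F, m, Dt) ** expected_weight(F, m, Dq)

for F, Dt in [(500, 10), (5000, 100)]:          # settings from the slide, m = 2
    print(F, Dt, round(bit_density(F, 2, Dt), 4),
          [f"{false_drop_superset(F, 2, Dt, dq):.1e}" for dq in (1, 2, 3)])
```

Under this model both settings keep F/Dt = 50, so with m = 2 roughly 1 - e^(-0.04), about 3.9%, of the bits in a target signature are 1 for Dt = 10 and Dt = 100 alike, and the estimated false-drop probability falls quickly as Dq grows.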
Retrieval Cost (Fig. 4) (T ⊇ Q, Dt = 10, n = 3)
[Figure: retrieval cost in pages (log scale, 1 to 1000) versus Dq (cardinality of the query set, 1 to 5) for the has-subset query; curves for Ibssf, Inix, Ibssf-nix, and Inix-nix.]
• When Dq = 1:
  INIX < IBSSF-NIX, INIX-NIX < IBSSF
• When Dq ≥ 2:
  IBSSF, IBSSF-NIX < INIX, INIX-NIX
Retrieval Cost (Fig. 5) (T ⊇ Q, Dt = 100, n = 3)
[Figure: retrieval cost in pages (log scale, 1 to 10000) versus Dq (1 to 5) for the has-subset query; curves for Ibssf, Inix, Ibssf-nix, and Inix-nix.]
Retrieval Cost (Fig. 6) (T ⊆ Q, Dt = 10, n = 3)
[Figure: retrieval cost in pages (log scale, 100 to 10000) versus Dq (10 to 100) for the is-subset query; curves for Ibssf, Inix, Ibssf-nix, and Inix-nix.]
• For most Dq values:
  IBSSF, IBSSF-NIX < INIX, INIX-NIX
• The cost of IBSSF and IBSSF-NIX can be reduced further by our smart retrieval strategy (reference [8]).
Retrieval Cost for Large Dq Values (T ⊆ Q, Dt = 10, n = 3)
[Figure: retrieval cost in pages (log scale, 1 to 100000) versus Dq (100 to 1000) for the is-subset query; curves for Ibssf, Inix, Ibssf-nix, and Inix-nix.]
• The cost of IBSSF increases drastically for large Dq values.
• The cost of IBSSF-NIX decreases steadily.
Retrieval Cost (Fig. 7) (T ⊆ Q, Dt = 100, n = 3)
[Figure: retrieval cost in pages (log scale, 1000 to 100000) versus Dq (100 to 1000) for the is-subset query; curves for Ibssf, Inix, Ibssf-nix, and Inix-nix.]
Storage Costs
Storage cost (pages)
Index        Dt = 10   Dt = 100
IBSSF            559       5059
INIX             629      10047
IBSSF-NIX        693       5193
INIX-NIX         763      10181
Insertion Costs
Insertion cost (pages)
Index        Dt = 10   Dt = 100
IBSSF             41        394
INIX              40        400
IBSSF-NIX         44        398
INIX-NIX          43        403
Deletion Costs
Deletion cost (pages)
Index        Dt = 10   Dt = 100
IBSSF            544       5395
INIX              37        307
IBSSF-NIX         50        423
INIX-NIX          43        403
Summary
T ⊇ Q query:
• The four access facilities have similar performance except when Dq = 1.
• When Dq = 1, IBSSF is the worst and INIX is the best.
T ⊆ Q query:
• IBSSF and IBSSF-NIX show relatively stable performance and are better than INIX and INIX-NIX.
• For very large Dq values, IBSSF-NIX is superior to IBSSF.
Storage costs:
• IBSSF and IBSSF-NIX need less storage than INIX and INIX-NIX, respectively; for Dt = 100 they need about half as much.
Update costs:
• The four set access facilities are of almost the same order, except for the deletion cost of IBSSF.
Conclusion and Research Issues
Conclusion:
• Under our parameter settings, IBSSF-NIX is the preferable indexing method because of its stable performance and smaller storage cost.
• If the case of T ⊇ Q with Dq = 1 is important, INIX becomes another candidate.
Research Issues:
1. Analyses of other types of queries (e.g., T ∩ Q ≠ ∅, T ≡ Q)
2. Cost analyses for other configurations of nested objects