rudolf bayer technische universität münchen b-trees and databases, past & future

24
Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

Upload: audrey-barney

Post on 29-Mar-2015

226 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

Rudolf BayerTechnische Universität München

B-Trees and Databases, Past & Future

Page 2: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

2

1969 2001 Factormain memory 200 KB 200 MB 103

cache 20 KB 20 MB 103

cache pages 20 5000 <103

disk size 7.5 MB 20 GB 3*103

disk/memory size 40 100 -2.5transfer rate 150 KB/s 15 MB/s 102

random access 50 ms 5 ms 10scanning full disk 130 s 1300 s -10(accessibility)

Computing Technology in 1969 vs 2001

Page 3: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

3

Challenge of Applications in 1969

Space IndustrySupersonic Transport SSTC5ABoeing 747

Manufacturingparts explosion(spare) parts mangement

Commercebank check managementcredit card management

Page 4: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

4

Basics of B-Trees

5 11 16 21

1 2 3 4 12 13 15 17 18 19 20 22 24 256 7 8 10

9

Page 5: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

5

Basics of B-Trees: Insertion

5 11 16 21

1 2 3 4 12 13 15 17 18 19 20 22 24 256 7 8 9 10

Page 6: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

6

Basics of B-Trees: the Split

8 5 11 16 21

1 2 3 4 12 13 15 17 18 19 20 22 24 25 6 7 9 10

Page 7: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

7

Basics of B-Trees: recursive Split

1 2 3 4 12 13 15 17 18 19 20 22 24 256 7 9 10

5 8 11 16 21

Page 8: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

8

Basics of B-Trees: Growth at Root

11

5 8 16 21

1 2 3 4 6 7 9 10 12 13 15 17 18 19 20 22 24 25

Page 9: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

9

Scientific American 1984

Page 10: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

10

Fundamental Properties of B-Trees

Time & I/O Complexity: O(logk n); k > 400for all elementary operations

• find• insert & delete

Storage Utilization ~ 83%Growth: height nodes size 1 1 8 KB

2 400 3.2 MB 3 16*104 1.3 GB 4 64*106 512 GB < 4 logical I/O per operation !!

Page 11: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

11

Independence of DB Size

cached

. . .

Index part 1% of fileremember: since 1969disk sizememory size

const 100

< 2 physical I/O per operation !!

Page 12: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

12

DB-Models in 1969

IMS: hierarchical, commercial successCODASYL: network model,

M. Senko, C. BachmannRelational: E. F. Codd, theory only

Senko, Codd in same department:Information Systems DepartmentIBM Research Lab, San José

Senko Codd Efficiency Simplicity

Page 13: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

13

Relational DB-Model, Ted Codd

Research in 1969, published in 1970 CACM

Relational Algebra = Tables + Operatorstoday: x set operators Codd: lossless restriction by table tie

algebraic laws for query optimization(Codd does not mention this aspect)

Page 14: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

14

2 Languages

• imperative, procedural: algebraic expressions

• declarative, non-procedural: applied predicate calculus, DSL/Alpha (1971)

no implementation of acceptable efficiency in sight!

Page 15: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

15

Hard Questions from 1969-1974

• which model?• which language?• which implementation?

infighting, Codd to Systems Department defer decisions: rel. Storage System RSS

to support all models and languages

1971-1974 Leonard Liu: CS Dept1974 Cargese Workshop, Frank King

Page 16: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

16

Which Language?

DL/I: IMSCODASYL: COBOL + pointer chasing and

currency indicators, ChamberlinRel. Algebra: CoddDSL/Alpha: CoddSQUARE: Chamberlin, et al.SEQUEL: Chamberlin, Boyce, ReisnerQBE: Moshe ZloofRendezvous: Codd

3 survivors: DL/I, SQL, Rel.Algebra

Page 17: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

17

Implementation: System R, IBM

SQL Chamberlin, ReisnerSchemata: normalization, Codd, BoyceRel. Algebra: Codd et al.Optimization: Blasgen, Selinger, EswaranCost Models: Transactions: Gray, TraigerB-Trees: Bayer, Schkolnick, BlasgenRecovery: Lorie, Putzolu

Page 18: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

18

Factors for Product Success

• simple, formalized model• simple user interface: SQL• algebra + laws for optimization• performance: B-trees• multiuser: transactions (Gray)• robustness: transactions + recovery,

self-organization of B-trees• scalability: B-trees with logarithmic

growth, parallelism

Page 19: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

19

Prefix B-Trees

... (Smith, Bernie), (Smith, Henry) ...

... (Smith, C)

... (Smith, Bernie)

• store shortest separators: Simple Prefix B-trees• trim common prefixes: Prefix B-trees

(Smith, Henry)

Page 20: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

20

Concurrency and B-Trees

Bayer, Schkolnick: Acta Informatica 9, 1-21 (1977)

...

• everybody reads root• root almost never changes• low probability of conflicts near leaves

• combination of synchronization protocols • no chance of testing real general case

Page 21: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

21

UB-Trees: Multidimensional Indexing

• geographic databases (GIS)• Data-Warehousing: Star Schema• all relational databases with n:m relationships

R Sn m

(r, s)• XML• mobile, location based applications

Page 22: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

22

Basic Idea of UB-Tree

• linearize multidimensional space by space filling curve, e.g. Z-curve or Hilbert

• Use Z-address to store objects in B-Tree

Response time for query is proportional to size of the answer!

Page 23: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

23

UB-Tree: Regions and Query-Box

Z-region [0.1 : 1.1.1]

Page 24: Rudolf Bayer Technische Universität München B-Trees and Databases, Past & Future

R. Bayer B-Trees and Databases

24

World as self balancing UB-Tree