Page 1: A survey “Off the Record” – Using Alternative Data Models to Increase Data Density in Data Warehouse Environments.

Presented by: Victor Gonzalez-Castro, Lachlan MacKinnon

Page 2: Agenda

• Introduction
• Data Sparsity
• State of the art
  • Relational Model
  • The Triple Store
  • The Binary Model
  • The Associative Model
  • The Transrelational Model
• Our proposal
• Questions

Page 3: Introduction

• In Data Warehouse environments, data sparsity is a common issue that remains unresolved.

• Alternative data models that abandon the traditional record storage/manipulation structure have been researched.

• We are investigating the use of these alternative data models to increase data density, with the aim of decreasing data sparsity.

Page 4: Origin of Data Sparsity

• Data sparsity originates from the aim of answering all possible user queries from the information stored in a Data Warehouse that contains Nulls.

[Figure: a Time dimension with three levels (Day, Month, Year); cells are marked with $.]

Fig. 1. A three-level dimension and Nulls. After [6].

Page 5: Origin of Data Sparsity (Cont…)

• Data sparsity is the result of the Cartesian product of all dimensions and all aggregation levels.

[Figure: two panels, labelled (Sparse) and (Dense).]

Fig. 2. Data sparsity and data density. From [6].
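The Cartesian-product argument can be made concrete with a minimal Python sketch. The dimension members and the facts below are invented for illustration and do not come from the slides. Density is taken here as the number of populated cells divided by the number of cells in the full cross product, and sparsity as its complement; that definition is an assumption of this sketch.

```python
from itertools import product

# Hypothetical dimension members, invented for illustration only.
dimensions = {
    "product": ["Nut", "Bolt", "Screw"],
    "city": ["London", "Paris", "Oslo"],
    "month": ["Jan", "Feb", "Mar", "Apr"],
}

# Hypothetical facts: only these (product, city, month) cells hold a value.
facts = {
    ("Nut", "London", "Jan"): 10,
    ("Bolt", "Paris", "Feb"): 7,
    ("Screw", "Oslo", "Apr"): 3,
}


def density(dimensions, facts):
    """Fraction of cells in the full cross product that actually hold data."""
    cells = list(product(*dimensions.values()))
    populated = sum(1 for cell in cells if cell in facts)
    return populated / len(cells)


d = density(dimensions, facts)
print(f"density={d:.3f} sparsity={1 - d:.3f}")
```

Every extra dimension or aggregation level multiplies the number of cells, while the number of facts stays the same, so the density falls.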

Page 6: State of the art (Relational)

• The Relational Model [7] uses the traditional record storage/manipulation structure, for example the record:

1234 Nut Red London

• It is the base model against which the other models will be compared.

• All RDBMSs manage sparsity (missing information) poorly.

• In the Relational Model Version 2, Codd [7] suggested a fundamental change: the use of a four-valued logic.

• No one has implemented this fundamental change.

Page 7: State of the art (Relational)

• Major players in the relational market

[Figure: vendor logos, including SQL Server.]

Page 8: State of the art (Triple Store)

• The Triple Store [1], [2] uses a structure called the Name Store to keep all the names.

Identifier   Name
1            Nut
2            Red
3            London
…            …

• To construct the processing structure, it uses Triples.

1   2   3
4   5   6
…   …   …
l   m   n
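As a rough illustration of the two structures, the Python sketch below keeps each name once in a name store and records triples of identifiers. The class and method names are hypothetical, the second triple is invented for the example, and nothing here reflects how TriStarp is actually implemented.

```python
class TripleStore:
    """Toy name store plus triple list; an illustration, not TriStarp."""

    def __init__(self):
        self.names = {}    # name -> identifier
        self.by_id = {}    # identifier -> name
        self.triples = []  # list of (identifier, identifier, identifier)

    def intern(self, name):
        """Return the identifier of a name, storing each name only once."""
        if name not in self.names:
            ident = len(self.names) + 1
            self.names[name] = ident
            self.by_id[ident] = name
        return self.names[name]

    def add(self, a, b, c):
        """Record one triple, expressed through identifiers only."""
        self.triples.append((self.intern(a), self.intern(b), self.intern(c)))

    def decode(self, triple):
        """Translate a triple of identifiers back into names."""
        return tuple(self.by_id[i] for i in triple)


store = TripleStore()
store.add("Nut", "Red", "London")   # names 1, 2, 3 as on the slide
store.add("Bolt", "Red", "Paris")   # invented; "Red" reuses identifier 2
print(store.triples)
print([store.decode(t) for t in store.triples])
```

Because a repeated name such as "Red" is stored once and referenced by identifier, the triples themselves hold only small integers.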

Page 9: State of the art (Triple Store)

• The major project on the Triple Store is TriStarp.

• TriStarp was established in 1984, led by Peter King with support from IBM Hursley Labs.

• Dr. Sharman from IBM Hursley [1] is visiting the TriStarp team.

• Current directions:
  • Further development of the persistent Triple Store repository.
  • Continuing research on the graph-based model.
  • Extending the technology to manage partially structured data.

Page 10: State of the art (Binary)

• The Binary Model [4] considers that all tables are binary tables. A table such as

Sur   Pname   Color   City
s1    Nut     Red     London
s2    Bolt    Green   Paris
s3    Screw   Blue    Oslo

is held as one binary table per attribute:

Sur   Pname
s1    Nut
s2    Bolt
s3    Screw

Sur   Color
s1    Red
s2    Green
s3    Blue

Sur   City
s1    London
s2    Paris
s3    Oslo
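The decomposition shown in these tables can be sketched in a few lines of Python. The function names `to_binary_tables` and `rebuild` are hypothetical, and rejoining on the surrogate `Sur` is one plausible reading of how the binary tables are linked; this is not MonetDB code.

```python
rows = [
    {"Sur": "s1", "Pname": "Nut", "Color": "Red", "City": "London"},
    {"Sur": "s2", "Pname": "Bolt", "Color": "Green", "City": "Paris"},
    {"Sur": "s3", "Pname": "Screw", "Color": "Blue", "City": "Oslo"},
]


def to_binary_tables(rows, key):
    """Split each record into one (surrogate, value) table per attribute."""
    tables = {}
    for row in rows:
        for attr, value in row.items():
            if attr != key:
                tables.setdefault(attr, []).append((row[key], value))
    return tables


def rebuild(tables, key):
    """Join the binary tables on the surrogate to recover the records."""
    records = {}
    for attr, pairs in tables.items():
        for sur, value in pairs:
            records.setdefault(sur, {key: sur})[attr] = value
    return list(records.values())


tables = to_binary_tables(rows, "Sur")
print(tables["City"])
print(rebuild(tables, "Sur") == rows)
```

A query that touches only `City` reads one narrow table instead of every full record, which is the point of vertical fragmentation.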

Page 11: State of the art (Binary)

• A major project in the Binary Model [4] is MonetDB.

• It is a DBMS designed to provide high performance on complex queries against real-world sized databases.

• It achieves this goal using innovations at all layers of a DBMS: a storage model based on vertical fragmentation, processing speed through self-tuning relational operators, algorithms designed to exploit modern hardware, self-managing indexing structures, a modular and extensible software architecture, etc.

• It is developed at CWI, the Institute for Mathematics and Computer Science Research of The Netherlands.

Page 12: State of the art (Associative)

• The Associative Model [3] comprises two types of data structure: Items and Links.

Items:

Identifier   Name
77           Nut
08           Red
32           London
12           That is
67           Is located in

• It differs from the Binary Model and the Triple Store in one fundamental way: associations themselves may be either the source or the target of other associations.

• It uses quadruplets.

Links:

Identifier   Source   Verb   Target
74           77       12     08
03           74       67     32
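The nesting of associations can be illustrated with the slide's own items and links. In the Python sketch below the dictionaries and the `render` function are hypothetical names chosen for the example; it only shows how link 03 uses link 74 as its source, and is not how SentencesDB stores data.

```python
# Items: identifier -> name (values taken from the slide).
items = {
    "77": "Nut",
    "08": "Red",
    "32": "London",
    "12": "That is",
    "67": "Is located in",
}

# Links: identifier -> (source, verb, target); each part is an identifier.
links = {
    "74": ("77", "12", "08"),
    "03": ("74", "67", "32"),
}


def render(ident):
    """Spell out an item or a link; a link may nest other links."""
    if ident in items:
        return items[ident]
    source, verb, target = links[ident]
    return f"({render(source)} {render(verb)} {render(target)})"


print(render("74"))
print(render("03"))
```

Link 03 reads as "(Nut That is Red) Is located in London": the whole of link 74 is the source of another association, which a plain triple of names cannot express directly.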

Page 13: State of the art (Associative)

• The major product based on the Associative Model is SentencesDB.

• Instead of using a separate, unique table for every different type of data, it uses a single, generic structure to contain all types of data.

• Information about the logical structure of the data and the rules that govern it is stored alongside the data in the database.

• Programs are truly reusable and no longer need to be amended when the data structures change.

Page 14: State of the art (Transrelational)

• The TransRelational Model™ [5] keeps the Relational Model itself but abandons the record storage structure. It uses two structures:
  • The Record Reconstruction Table.
  • The Field Values Table.

• Since there is currently no instantiation of the Transrelational Model available, we will build an implementation of the essential algorithms.

Field Values Table (FVT):

P#   PNAME   COLOR   CITY
P1   Bolt    Blue    London
P2   Cam     Blue    London
P3   Cog     Green   London
P4   Nut     Red     Oslo
P5   Screw   Red     Paris
P6   Screw   Red     Paris

Record Reconstruction Table (RRT):

P#   PNAME   COLOR   CITY
4    3       6       1
1    1       4       4
5    6       5       6
6    4       1       3
2    2       2       2
3    5       3       5

Page 15: Transrelational Algorithms

1. Start from a file for the parts relation.

P#   PNAME   COLOR   CITY
P1   Nut     Red     London
P2   Bolt    Green   Paris
P3   Screw   Blue    Oslo
P4   Screw   Red     London
P5   Cam     Blue    Paris
P6   Cog     Red     London

2. Sort each column in ascending order. The result is the Field Values Table (FVT).

P#   PNAME   COLOR   CITY
P1   Bolt    Blue    London
P2   Cam     Blue    London
P3   Cog     Green   London
P4   Nut     Red     Oslo
P5   Screw   Red     Paris
P6   Screw   Red     Paris

Record Reconstruction Table (RRT):

P#   PNAME   COLOR   CITY
4    3       6       1
1    1       4       4
5    6       5       6
6    4       1       3
2    2       2       2
3    5       3       5

Reconstructing a record, with cells written as [row, column]:

1. Go to cell [1,1] of the FVT and fetch the value stored there (P1).

2. Go to the same cell [1,1] in the RRT and fetch the value (4). It means that the next field value (PNAME) is in the 4th row of the FVT. Go to that cell and fetch the value (Nut).

3. Go to the corresponding RRT cell [4,2] and fetch the row number (4). The next field value (3rd column, COLOR) is in the 4th row of the FVT (Red).

4. Go to the corresponding RRT cell [4,3] and fetch the value (1). The next field value (4th column, CITY) is in the 1st row of the FVT (London).

5. Go to the corresponding RRT cell [1,4] and fetch the value (1). A 5th column does not exist, so the path wraps around to the 1st column and returns to the 1st row of the FVT, where it started.

The reconstructed record is:

P#   PNAME   COLOR   CITY
P1   Nut     Red     London
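The two steps and the reconstruction walk can be sketched in Python as follows. This is a minimal sketch of the idea, not the implementation the authors plan to build; `build` and `reconstruct` are hypothetical names and row numbers are 0-based, so the slide's row 4 is index 3 here. Equal values in a column can be ordered in more than one way, so the RRT computed below need not match the slide's table cell for cell, yet every valid RRT rebuilds the same records.

```python
FILE = [
    ("P1", "Nut", "Red", "London"),
    ("P2", "Bolt", "Green", "Paris"),
    ("P3", "Screw", "Blue", "Oslo"),
    ("P4", "Screw", "Red", "London"),
    ("P5", "Cam", "Blue", "Paris"),
    ("P6", "Cog", "Red", "London"),
]


def build(file):
    """Return (fvt, rrt), each a list of columns with 0-based rows."""
    n_rows, n_cols = len(file), len(file[0])
    # order[c][k]: index of the record whose value sits in row k of column c.
    order = [sorted(range(n_rows), key=lambda r: file[r][c])
             for c in range(n_cols)]
    fvt = [[file[r][c] for r in order[c]] for c in range(n_cols)]
    # position[c][r]: row of sorted column c that holds the value of record r.
    position = [{r: k for k, r in enumerate(order[c])} for c in range(n_cols)]
    rrt = []
    for c in range(n_cols):
        nxt = (c + 1) % n_cols  # the last column wraps around to the first
        rrt.append([position[nxt][r] for r in order[c]])
    return fvt, rrt


def reconstruct(fvt, rrt, start_row):
    """Follow the RRT from a row of the first column across all columns."""
    record, row = [], start_row
    for c in range(len(fvt)):
        record.append(fvt[c][row])
        row = rrt[c][row]
    return tuple(record)


fvt, rrt = build(FILE)
for k in range(len(FILE)):
    print(reconstruct(fvt, rrt, k))
```

Starting from each row of the first FVT column in turn yields the original file back, which is a convenient sanity check on any hand-built RRT.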

Page 16: Alternative Data Models Comparison

Model             Storage Structure    Linkage Structure
Relational        Table (Relation)     By position
Triple Store      Name Store           Triple Store
Binary            Binary Table         Joins
Associative       Items                Links
Transrelational   Field Values Table   Record Reconstruction Table

Page 17: Our proposal (Our aims)

• To carry out an impartial survey of alternative data models.

• To assess whether the use of alternative data models can improve data density in Data Warehouse environments.

• To observe the effect that such an increase in data density has on data sparsity.

Page 18: Our proposal (How…)

• We intend to use an implementation of each data model.

[Figure: logos of the implementations, including TransRelational™.]

• We will use the TPC-H data set to load each database.

• We will run a set of benchmark metrics where available; where none exist, we will develop our own metrics to determine relative performance, and then consider relative data density and sparsity.

Page 19: Just Remember…

• Instead of storing data horizontally, do it vertically and eliminate duplicate values.

123   Bolt    Black   Paris
456   Screw   Blue    London
789   Nut     White
234   Nail
567

[Figure: each column stores its distinct values once; the empty cells are labelled “Here are the Savings”.]

• We are abandoning the traditional record structure: we are going “off the record”.
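A back-of-the-envelope Python sketch of the savings is given below. It reuses the six-record parts file from the Transrelational slides and counts value cells only. It is an illustration under that simplifying assumption: a real vertical store also needs some linkage structure to reassemble records, and that cost is not counted here.

```python
# The six-record parts file used on the Transrelational slides.
rows = [
    ("P1", "Nut", "Red", "London"),
    ("P2", "Bolt", "Green", "Paris"),
    ("P3", "Screw", "Blue", "Oslo"),
    ("P4", "Screw", "Red", "London"),
    ("P5", "Cam", "Blue", "Paris"),
    ("P6", "Cog", "Red", "London"),
]


def distinct_columns(rows):
    """Store each column vertically, keeping every distinct value once."""
    return [sorted(set(column)) for column in zip(*rows)]


columns = distinct_columns(rows)
horizontal = sum(len(row) for row in rows)      # one cell per field per record
vertical = sum(len(col) for col in columns)     # one cell per distinct value
print(f"horizontal={horizontal} vertical={vertical} "
      f"saved={horizontal - vertical}")
```

The more often values repeat within a column, the larger the gap between the two counts becomes.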

Page 20: Questions?

Page 21: Thanks!!

[email protected]

[email protected]

Page 22: References

1. G. C. H. Sharman and N. Winterbottom. The Universal Triple Machine: a Reduced Instruction Set Repository Manager. Proceedings of BNCOD 6, pp. 189-214, 1988.

2. TriStarp Web Site. http://www.dcs.bbk.ac.uk/~tristarp. Updated November 2000.

3. Simon Williams. The Associative Model of Data, Second Edition. Lazy Software Ltd. ISBN 1-903453-01-1. www.lazysoft.com

4. MonetDB. ©1994-2004 by CWI. http://monetdb.cwi.nl

5. C. J. Date. An Introduction to Database Systems, Eighth Edition. Appendix A: The TransRelational Model. Addison-Wesley, USA, 2004. ISBN 0-321-18956-6.

6. Nigel Pendse. Database explosion. http://www.olapreport.com. Updated August 2003.

7. E. F. Codd. The Relational Model for Database Management: Version 2. Addison-Wesley, 1990. ISBN 0-201-14192-2.