SW-STORE: A VERTICALLY PARTITIONED DBMS FOR SEMANTIC WEB DATA MANAGEMENT
Surabhi Mithal, Nipun Garg
Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. The VLDB Journal.
Group 4: Surabhi Mithal 4282643, Nipun Garg 4282567
http://www-users.cs.umn.edu/~smithal/


Page 1

SW-STORE: A VERTICALLY PARTITIONED DBMS FOR SEMANTIC WEB DATA MANAGEMENT

Surabhi Mithal

Nipun Garg

Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. The VLDB Journal.

Group 4
Surabhi Mithal 4282643
Nipun Garg 4282567
http://www-users.cs.umn.edu/~smithal/

Page 2

OUTLINE

Introduction to Semantic Web
Motivation
Problem Statement
Challenges
Major Contributions
Related Work
Key Concepts
Assumptions
Validation Methodology
Results
Improvements

Page 3

INTRODUCTION TO SEMANTIC WEB : AN EXAMPLE

ISBN         Author  Title             Publisher  Year
0006511409X  id_xyz  The Glass Palace  id_qpr     2000

ID      Name           Homepage
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

ID      Publisher’s name  City
id_qpr  Harper Collins    London

Source : http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/

A simplified bookstore dataset (dataset “A”)

Page 4

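EXAMPLE CONT.: GRAPH REPRESENTATION

[Figure: RDF graph of dataset “A”. The book node http://…isbn/000651409X has the edges a:title (The Glass Palace), a:year (2000), a:author and a:publisher. The author node has a:name (Ghosh, Amitav) and a:homepage (http://www.amitavghosh.com); the publisher node has a:p_name (Harper Collins) and a:city (London).]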

Page 5

ANOTHER BOOKSTORE DATA (DATASET “F”)

    A                    B                      C           D
 1  ID                   Titre                  Traducteur  Original
 2  ISBN 2020286682      Le Palais des Miroirs  $A12$       ISBN 0-00-6511409-X
 3
 4
 5
 6  ID                   Auteur
 7  ISBN 0-00-6511409-X  $A11$
 8
 9
10  Nom
11  Ghosh, Amitav
12  Besse, Christianne

Page 6

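EXAMPLE CONT.: GRAPH REPRESENTATION

[Figure: RDF graph of dataset “F”. The node http://…isbn/2020386682 has the edges f:titre (Le palais des miroirs), f:traducteur and f:original. The translator node has f:nom (Besse, Christianne); f:original points to http://…isbn/000651409X, whose f:auteur node has f:nom (Ghosh, Amitav).]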

Page 7

DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB

[Figure: the graphs of dataset “F” and dataset “A” shown side by side; each contains the node http://…isbn/000651409X.]

Page 8

DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB

[Figure: the graphs of datasets “F” and “A” side by side, with the node http://…isbn/000651409X, present in both, marked “SAME URI”.]

Page 9

DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB

[Figure: the two graphs merged at the shared node http://…isbn/000651409X, which now carries both the f: edges and the a: edges.]

A user of dataset “F” can now ask queries like: “give me the title of the original”.
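The merge can be illustrated with a small sketch. It treats each dataset as a set of (subject, property, object) triples and relies only on the two graphs sharing one URI. The URIs are abbreviated as on the slides, and the blank-node labels (`_:a1`, `_:f1`, ...) and helper names are hypothetical, introduced only for this illustration.

```python
# Triples are (subject, property, object). URIs are abbreviated exactly as on
# the slides; the blank-node labels (_:a1, _:f1, ...) are hypothetical.
BOOK_EN = "http://…isbn/000651409X"
BOOK_FR = "http://…isbn/2020386682"

dataset_a = {
    (BOOK_EN, "a:title", "The Glass Palace"),
    (BOOK_EN, "a:year", "2000"),
    (BOOK_EN, "a:author", "_:a1"),
    ("_:a1", "a:name", "Ghosh, Amitav"),
    ("_:a1", "a:homepage", "http://www.amitavghosh.com"),
    (BOOK_EN, "a:publisher", "_:a2"),
    ("_:a2", "a:p_name", "Harper Collins"),
    ("_:a2", "a:city", "London"),
}

dataset_f = {
    (BOOK_FR, "f:titre", "Le palais des miroirs"),
    (BOOK_FR, "f:original", BOOK_EN),
    (BOOK_FR, "f:traducteur", "_:f1"),
    ("_:f1", "f:nom", "Besse, Christianne"),
    (BOOK_EN, "f:auteur", "_:f2"),
    ("_:f2", "f:nom", "Ghosh, Amitav"),
}

# Merging is a plain set union: the shared URI links the two graphs.
merged = dataset_a | dataset_f


def objects(graph, subject, prop):
    """Return all objects of the given subject/property pair, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


def title_of_original(graph, book):
    """Follow f:original from a dataset "F" book, then read a:title."""
    titles = []
    for original in objects(graph, book, "f:original"):
        titles.extend(objects(graph, original, "a:title"))
    return titles


print(title_of_original(merged, BOOK_FR))     # ['The Glass Palace']
print(title_of_original(dataset_f, BOOK_FR))  # [] (no a:title before merging)
```

Before the merge, dataset “F” alone cannot answer the question, because the title lives only in dataset “A”.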

Page 10

MOTIVATION

Integration and sharing of data across different applications and organizations.

The Semantic Web logical data model is called the “Resource Description Framework” (RDF).

The Semantic Web concept has scalability and performance issues due to the nature of the data. Current data management solutions for RDF scale poorly.

Page 11

PROBLEM STATEMENT

Input: RDF data in the form of triples <subject, property, object>, e.g. <The Glass Palace, hasAuthor, Amitav Ghosh>.

Output: Efficient storage system for RDF data.

Objective: Improve the query performance for complex real-world queries.

Page 12

CHALLENGES

Example query: find all authors of books whose title contains the word “Transaction”.

Over a single table of triples, this needs a 5-way self-join!
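To see where the five joins come from, here is a minimal sketch using SQLite from Python. The table name `triples`, the property names (`type`, `title`, `hasAuthor`, `name`) and the sample rows are assumptions made for illustration; they are not the schema or data used in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical sample data.
        ("book1", "type", "Book"),
        ("book1", "title", "Transaction Processing"),
        ("book1", "hasAuthor", "auth1"),
        ("auth1", "type", "Author"),
        ("auth1", "name", "Author One"),
        ("book2", "type", "Book"),
        ("book2", "title", "Semantic Web Primer"),
        ("book2", "hasAuthor", "auth2"),
        ("auth2", "type", "Author"),
        ("auth2", "name", "Author Two"),
    ],
)

# Every condition on a different property needs its own copy of the table,
# so the single triples table is joined with itself five times.
QUERY = """
SELECT n.obj
FROM triples AS t, triples AS b, triples AS h, triples AS a, triples AS n
WHERE t.prop = 'title' AND t.obj LIKE '%Transaction%'
  AND b.subj = t.subj AND b.prop = 'type' AND b.obj = 'Book'
  AND h.subj = t.subj AND h.prop = 'hasAuthor'
  AND a.subj = h.obj AND a.prop = 'type' AND a.obj = 'Author'
  AND n.subj = h.obj AND n.prop = 'name'
"""

authors = [row[0] for row in conn.execute(QUERY)]
print(authors)  # ['Author One']
```

Each alias (`t`, `b`, `h`, `a`, `n`) scans the same large table, which is why such queries are expensive on a plain triple store.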

Page 13

MAJOR CONTRIBUTIONS AND NOVELTY

Introduction of a new concept: vertically partitioning RDF data and using a column-oriented database to improve performance and increase simplicity.

A performance evaluation of the new and existing techniques on a real-world example.

A new column-oriented database, SW-Store, is proposed, based on the above approach.

Page 14

RELATED WORK – PROPERTY TABLES

HP Laboratories - Jena

Property Clustered Tables and Property Class Tables

Approach 1 (property clustered tables): a data clustering approach.
Approach 2 (property class tables): creates clusters based on the subject’s type.

Limitations:
Accuracy of clustering algorithms.
NULLs in data.
Multivalued attributes.

Page 15

SAMPLE DATABASE

Source: SW-Store: a vertically partitioned DBMS for Semantic Web data management

Too many NULLs
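The NULL problem can be reproduced with a small sketch that flattens triples into one wide property table. The triples, column names and the `leftover` list are hypothetical; `None` stands in for a SQL NULL.

```python
from collections import defaultdict

# Hypothetical triples; s1 has two authors (a multivalued attribute).
triples = [
    ("s1", "type", "Book"),
    ("s1", "title", "T1"),
    ("s1", "author", "A1"),
    ("s1", "author", "A2"),
    ("s1", "year", "2001"),
    ("s2", "type", "CD"),
    ("s2", "title", "T2"),
    ("s2", "artist", "R1"),
    ("s3", "type", "Book"),
    ("s3", "title", "T3"),
    ("s3", "language", "French"),
]


def build_property_table(triples):
    """Build one wide row per subject with one column per property."""
    columns = sorted({p for _, p, _ in triples})
    # Every row starts with None (NULL) in every column.
    rows = defaultdict(lambda: dict.fromkeys(columns))
    leftover = []  # values that do not fit because the cell is already taken
    for s, p, o in triples:
        if rows[s][p] is None:
            rows[s][p] = o
        else:
            leftover.append((s, p, o))
    return columns, dict(rows), leftover


columns, rows, leftover = build_property_table(triples)
total_cells = len(columns) * len(rows)
null_count = sum(v is None for row in rows.values() for v in row.values())

print(f"{null_count} of {total_cells} cells are NULL")  # 8 of 18 cells are NULL
print(leftover)  # [('s1', 'author', 'A2')]
```

Even in this tiny example almost half of the cells are empty, and the second author has no place in the table.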

Page 16

KEY CONCEPTS: VERTICAL PARTITIONING AND COLUMN ORIENTED STORE

Vertically partition the data, then store the partitions in a column-oriented database.

Vertical partitioning: one two-column (subject, object) table for each property.
Advantages:
Effective handling of multivalued attributes.
Elimination of NULLs.
Fewer unions.

Column-oriented storage.
Advantages:
No wasted bandwidth, as projections on data happen before it is pulled into main memory.
Record headers are stored in separate columns, reducing the tuple width and letting us choose a different compression technique for each column.
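A minimal sketch of vertical partitioning, assuming plain Python lists stand in for the two-column tables. The sample triples and the function name are hypothetical; sorting each table by subject mirrors the idea that sorted tables make subject lookups and joins cheap.

```python
from collections import defaultdict


def vertically_partition(triples):
    """Split triples into one (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    for rows in tables.values():
        rows.sort()  # keep each table ordered by subject
    return dict(tables)


# Hypothetical sample data; b1 has two authors.
triples = [
    ("b1", "title", "Transaction Processing"),
    ("b1", "author", "Author One"),
    ("b1", "author", "Author Two"),
    ("b2", "title", "Semantic Web Primer"),
    ("b2", "author", "Author Three"),
    ("b2", "year", "2004"),
]

tables = vertically_partition(triples)

# "Authors of books whose title contains 'Transaction'" now touches only
# the title table and the author table.
matching = {s for s, title in tables["title"] if "Transaction" in title}
authors = [o for s, o in tables["author"] if s in matching]

print(sorted(tables))  # ['author', 'title', 'year']
print(authors)         # ['Author One', 'Author Two']
```

Multivalued attributes simply become extra rows, and a subject without a given property has no row at all, so no NULLs are stored.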

Page 17

KEY CONCEPTS: SW-STORE

SW-Store is a column-oriented DBMS optimized for storing RDF.

Single-column table for subjects.

Representing sparse data.

Overflow tables.

Page 18

ASSUMPTIONS

Postgres is assumed to be the best available choice for a row-oriented RDBMS because of its effective handling of NULLs.

Queries that do not restrict on property values are very rare for RDF applications.

Moderate amount of inserts/updates on the RDF store.

Critique of the limited insert/update assumption: if the overflow tables fill rapidly, the batch operation that updates the column-oriented store will run more often, degrading overall performance.
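The critique can be made concrete with a toy model of an overflow table. This is only a sketch under stated assumptions, not SW-Store's actual implementation: the class name, the `merge_threshold` parameter and the merge policy are all hypothetical.

```python
class PartitionedStore:
    """Toy model: sorted per-property partitions plus an overflow buffer."""

    def __init__(self, merge_threshold=3):
        self.partitions = {}  # property -> sorted list of (subject, object)
        self.overflow = []    # recent inserts, not yet merged
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.merges = 0       # number of batch merges performed so far

    def insert(self, s, p, o):
        self.overflow.append((s, p, o))
        if len(self.overflow) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Batch operation: move overflow rows into the partitions and re-sort."""
        for s, p, o in self.overflow:
            self.partitions.setdefault(p, []).append((s, o))
        for rows in self.partitions.values():
            rows.sort()
        self.overflow.clear()
        self.merges += 1

    def lookup(self, prop):
        """Queries must consult both the partition and the overflow buffer."""
        rows = list(self.partitions.get(prop, []))
        rows += [(s, o) for s, p, o in self.overflow if p == prop]
        return sorted(rows)


store = PartitionedStore(merge_threshold=3)
for i in range(7):
    store.insert(f"s{i}", "title", f"T{i}")

print(store.merges)                 # 2
print(len(store.overflow))          # 1
print(len(store.lookup("title")))   # 7
```

In this model, the faster inserts arrive relative to the threshold, the more often the costly merge runs, which is the degradation the critique points at.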

Page 19

VALIDATION METHODOLOGY

Barton Libraries dataset provided by the Simile Project at MIT (http://simile.mit.edu/rdf-test-data/barton).

The benchmark is a set of 7 queries based on a browsing session of Longwell, a UI built by the Simile group for querying the library dataset. These queries are executed on:
Triple data store (a subject, property, object table with no improvements, on Postgres).
Property tables (on Postgres).
Vertically partitioned data in a row-oriented store (Postgres).
Vertically partitioned data in a column-oriented store (C-Store).

Page 20

VALIDATION METHODOLOGY

Strengths:
Real-world data and query scenarios.
Comparison of all the existing techniques against the proposed technique.

Weaknesses:
Queries with an unrestricted property are avoided, even though this problem is particularly pronounced for vertically partitioned schemes.
Accuracy of clustering for property tables.
Performance may differ when using different underlying databases.
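Why unrestricted-property queries hurt a vertically partitioned layout can be sketched as follows. The tables and the helper function are hypothetical; the point is only that a query such as "everything known about subject b1" has to visit every property table and union the results.

```python
# Hypothetical vertically partitioned store: property -> (subject, object) rows.
tables = {
    "title": [("b1", "T1"), ("b2", "T2")],
    "author": [("b1", "A1")],
    "year": [("b2", "2004")],
    "language": [("b2", "French")],
}


def find_by_subject(tables, subject):
    """Return all (property, object) pairs for a subject, plus tables scanned."""
    result = []
    scanned = 0
    for prop, rows in tables.items():
        scanned += 1  # no property restriction, so every table is visited
        result.extend((prop, o) for s, o in rows if s == subject)
    return sorted(result), scanned


facts, scanned = find_by_subject(tables, "b1")
print(facts)    # [('author', 'A1'), ('title', 'T1')]
print(scanned)  # 4
```

With hundreds of properties, the same query would union hundreds of tables, whereas a single triples table answers it with one selection on the subject column.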

Page 21

RESULTS

From the results, it is clear that the proposed storage scheme outperforms the existing methods in terms of query time.

Page 22

IMPROVEMENTS – SPATIAL PERSPECTIVE

Schema design: queries are run against the vertically partitioned tables as well as the overflow tables. Because spatial data is heavy, a spatial index such as an R*-tree or a grid is needed to make these queries faster.

Restrictive nature: spatial queries are not restricted to specific “properties”, which is an important assumption on the authors’ part. E.g. landmarks.

Tables should be partitioned in a better way than just handling one property per table, e.g. by grouping similar properties together based on domain knowledge.
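As one possible reading of the grid suggestion, the sketch below indexes point landmarks by uniform grid cell so that a range query only inspects nearby cells. The class, the cell size and the landmark coordinates are hypothetical and are not part of SW-Store.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2D points; each cell lists the subjects inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, subject, x, y):
        self.cells[self._cell(x, y)].append((subject, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return subjects whose point lies inside the query rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # Only cells overlapping the rectangle are inspected.
                for subject, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(subject)
        return sorted(hits)


index = GridIndex(cell_size=10.0)
index.insert("lm1", 1.0, 1.0)    # hypothetical landmarks
index.insert("lm2", 12.0, 3.0)
index.insert("lm3", 55.0, 60.0)

print(index.range_query(0.0, 0.0, 15.0, 5.0))  # ['lm1', 'lm2']
```

An R*-tree would serve the same purpose for non-uniform data; the grid is shown here only because it fits in a few lines.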