scalable data stores

SCALABLE DATA STORES Data Management in the Cloud

34

Overview

• New systems have emerged to address requirements of data management in the cloud – so-called “NoSQL” data stores

– scalable SQL databases

• Horizontal Scaling – shared nothing

– replicating and partitioning data over thousands of servers

– distribute “simple operation” workload over thousands of servers

• Simple Operations – key lookups

– read and writes of one or a small number of records

– no complex queries or joins

35

Defining “NoSQL”

• No agreed upon definition – “not only SQL”

– “not relational”

– …

• Six key features 1. ability to horizontally scale simple operation throughput over many

servers

2. ability to replicate and distribute (partition) data over many servers

3. simple call level interface or protocol (in contrast to a SQL binding)

4. weaker concurrency model than ACID transactions of most relational (SQL) database systems

5. efficient use of distributed indexes and RAM for data storage

6. ability to dynamically add new attributes to data records

36 Based on: “Scalable SQL and NoSQL Data Stores”, 2010

Data Models

• Terminology – tuple: row in a relational table, where attribute names and types are

defined by a schema, and values must be scalar

– document: supports both scalar values and nested documents, and the attributes are dynamically defined for each document

– column family: groups key/value pairs (columns) into families to partition and replicate them; one column family is similar to a document as new (nested, list-valued) attributes can be added

– object: analogous to objects in programming languages, but without procedural methods

• Relational – data is stored in relations (tables) of tuples (rows) of scalar values

– queries expressed over arbitrary (combinations of) attributes

– indexes defined over arbitrary (combinations of) attributes

37

Based on: “Scalable SQL and NoSQL Data Stores”, 2010

Key/Value Data Model

• Interface – put(key, value)

– get(key): value

• Data storage – values (data) are stored based on programmer-defined keys

– system is agnostic as to the structure (semantics) of the value

• Queries are expressed in terms of keys

• Indexes are defined over keys – some systems support secondary indexes over (part of) the value

38

k1 v1

k2 v2

k3 v3

…

kn vn

Document Data Model

• Interface – set(key, document)

– get(key): document

– set(key, name, value)

– get(key, name): value

• Data storage – documents (data) is stored based on programmer-defined keys

– system is aware of the (arbitrary) document structure

– support for lists, pointers and nested documents

• Queries expressed in terms of key (or attribute, if index exists)

• Support for key-based indexes and secondary indexes

39

k1 “name”:“fred”

k2 “name”:“mary”;“age”:“25”

k3

…

kn “name”:“john”;“address”:“k3”

“name”:“oak st”

Private Public

Column Family Data Model

• Interface – define(family)

– insert(family, key, columns)

– get(family, key): columns

• Data storage – <name, value, timestamp> triples (so-called columns) are stored based

on a column family and key; a column family is similar to a document

– system is aware of (arbitrary) structure of column family

– system uses column family information to replicate and distribute data

• Queries are expressed based on key and column family

• Secondary indexes per column family are typically supported

40

k1 “name”:“fred”

k2 “name”:“mary”

k3

…

kn “name”:“john”


“title”:“Mr”

“age”:“25”

Graph Data Model

• Interface – create: id

– get(id)

– connect(id1, id2): id

– addAttribute(id, name, value)

– getAttribute(id, name): value

• Data storage – data is stored in terms of nodes and (typed) edges

– both nodes and edges can have (arbitrary) attributes

• Queries are expressed based on system ids (if no indexes exist)

• Secondary indexes for nodes and edges are supported – retrieve nodes by attributes and edges by type, start and/or end node,

and/or attributes

41

n1 n2

n3

“name”:“fred”

“name”:“mary”;“age”:“25”


LIKES

LIKES

“weight”:“-1”

Array Data Model

• Nested multi-dimensional arrays – cells can be tuples or other

arrays – can have non-integer

dimensions

• Additional “History” dimension on updatable arrays

• Ragged arrays allow each row or column to have a different length

• Supports multiple flavors of “null” – array cells can be “EMPTY” – user-definable treatment of

special values

SciDB DDL

CREATE ARRAY Test_Array

< A: integer NULLS,

B: double,

C: USER_DEFINED_TYPE >

[I=0:99999,1000, 10, J=0:99999,1000, 10 ]

PARTITION OVER ( Node1, Node2, Node3 )

USING block_cyclic();

Attribute names A, B, C

Index names I, J

Chunk size 1000

Overlap 10

Object Data Model

• Interface – set(object)

– get(query): object

• Data storage – typed programming language objects (plus referenced objects) stored

– attribute can be collection-valued

– database is aware of the type (schema) of objects

• Objects are retrieved using queries or by traversal from “roots”

• Specialized indexes can be expressed based on schema

44

“mary”

25

Person

“fred”

27

Person LIKES

LIKES

“oak st”

Address

LIVES_AT

APPLICATION SCENARIO Data Management in the Cloud

45

An Application Scenario

• As an example application scenario, we will use graph data management and processing throughout the course

• Graph data applications – social networking

– Semantic Web (i.e. RDF graphs)

– data provenance

– Web site ranking (i.e. Page Rank)

– …

• No (mature) graph databases exist – graph data stores are available (Neo4j, OrientDB, …)

• Use existing (mature) non-graph database – graph data model must be mapped to data model of database

– algorithms must be specified in database language

46

Graph Data

• A graph 𝐺 = 𝑉, 𝐸 consists of a set of nodes 𝑉 and a set of edges 𝐸

• Edges are pairs (𝑢, 𝑣) that can either be ordered or unordered – if edges are ordered, the graph is directed; we sometimes refer to

directed edges as arcs

– if edges are unordered, the graph is undirected

• Each edge can be associated with a weight 𝑤 – if edges have a weight, the graph is weighted

– if edges have no weight, the graph is unweighted

• A path from 𝑣1 to 𝑣𝑛 is a sequence of nodes ⟨𝑣1, 𝑣2, … , 𝑣𝑛⟩ such that ∀ 1 ≤ 𝑖 < 𝑛: ∃ 𝑣𝑖 , 𝑣𝑖+1 ∈ 𝐸

• A graph is (fully) connected if there is a path between every pair of nodes

47

Graph Processing

• Classical graph algorithms – shortest path

– bridges

– transitive closure

• “Web 2.0” – friend of a friend

– who follows who?

– who might know who?

• Social network analysis – degree centrality

– closeness centrality

– betweenness centrality

48

Classical Graph Algorithms

• Shortest Path – find the shortest path(s) between two nodes

– e.g. Dijkstra’s algorithm 𝑂 𝐸 + 𝑉 log 𝑉

– weighted graphs: minimize total weight

– unweighted graphs: minimize number of edges

• Bridges – find edges that connect to graph components

– Tarjan’s algorithm 𝑂 𝑉 + 𝐸

– parallel algorithms exist

• Transitive Closure – inserts all transitive paths as edges

– Floyd-Warshall’s algorithm 𝑂 𝑉3

49

Social Network Analysis

• The degree centrality of a node 𝑣 in a graph 𝐺 = (𝑉, 𝐸) is defined as the number of edges incident to 𝑣, normalized by the number of other nodes in 𝐺

𝐶𝐷 𝑣 =deg (𝑣)

𝑉 − 1

• The closeness centrality of a node 𝑣 in a graph 𝐺 = 𝑉, 𝐸 is given by the inverse of the sum of its geodesic distance (weight or length of the shortest path) to all other nodes in 𝐺

𝐶𝐶 𝑣 = 1 𝛿(𝑣,𝑢)𝑢∈𝑉\v

50

Social Network Analysis

• The betweenness centrality of a node 𝑣 in a graph 𝐺 = 𝑉,𝐸 is defined as

𝐶𝐵 𝑣 = |𝑃 𝑢,𝑤 𝑣 |

|𝑃 (𝑢, 𝑤)|𝑢,𝑤∈𝑉∧𝑢≠𝑣≠𝑤

where 𝑃 (𝑢,𝑤) denotes the set of all shortest paths from 𝑢 to 𝑤 and 𝑃 𝑢,𝑤 𝑣 denotes the set of all shortest paths from 𝑢 to 𝑤 that pass through 𝑣

51

Usage Profiles

• Database updates and consistency – data changes frequently, results need to be accurate

– data changes infrequently, results need to be accurate

– query results may not reflect the latest state of the database

• Different types of queries – point-based, bound: a node, a node and its neighbors, friends of a

friend

– point-based, unbound: a node and all its reachable nodes, shortest path between two nodes

– graph-based, per node: centralities

– graph-based, all: bridges, transitive closure

52

Case Study: SQL Server

This slide is intentionally left blank.

53

Case Study: Versant

This slide is intentionally left blank.

54

Looking Good, SQL Server!

55

NetScience Geom Erdos

SQL Server 0.054 0.077 0.151

Versant 0.340 2.240 1.930

0.000

0.500

1.000

1.500

2.000

2.500

Tim

e (m

s)


SQL Server 0.074 0.486 35.313

Versant 11.800 131.400 13875.900

0.000

2000.000

4000.000

6000.000

8000.000

10000.000

12000.000

14000.000

16000.000

Tim

e (m

s)

Shortest Paths Closeness Centrality

Oh, no!

56


SQL Server 2710.000 48092.000 129945.000

Versant 4.160 26.020 12.110

0.000

20000.000

40000.000

60000.000

80000.000

100000.000

120000.000

140000.000

Tim

e (m

s)

Bridges

Course Project

• The course will be accompanied by a project that is based on the scenario of graph data management and processing – Task 1: Study question that will take you through a “dry run” of

mapping the graph data model to a NoSQL data model and make you think about how to answer some simple queries.

– Task 2: Groups of 4-5 students will pick a NoSQL system and compile a systems profile, based on papers and documentation. These profiles will be presented in class.

– Task 3: Groups of 4-5 students will design a graph management and processing system based on the previously chosen NoSQL system. This time for real!

– Task 4: Groups of 4-5 students implement a prototype of the graph data management and processing system.

– Task 5: Each student will write a 3 page report and essay.

57

Task 1: Application Design “Dry Run”

• The goal of this project is to complete the following tasks – Pick a NoSQL data model and map graph data model into that model

– In words or pseudo-code, describe how you would do the queries below

• find a node properties based on its identifier

• find the neighbors of a given node

• find the "friends of a friend" of a given node

• transitive closure

• bridges

– Discuss how different usage profiles presented in the lecture (Slide #51) affect the processing of these queries

• Deliverable is a written report

• Students will conduct this part of the project individually

58

Task 2: Systems Profile

• Horizontally scalable data management systems – Riak (http://wiki.basho.com/)

– Project Voldemort (http://project-voldemort.com/)

– CouchDB (http://couchdb.apache.org/)

– SimpleDB (http://www.amazon.com/simpledb/)

– HBase (http://www.mongodb.org/)

– Cassandra (http://cassandra.apache.org/)

– OrientDB (http://www.neo4j.org/)

• Groups of 4-5 students collaborate on a systems profile – groups will be formed on October 4

– decide in advance with who you would like to work on what

59

http://wiki.basho.com/

http://wiki.basho.com/

http://project-voldemort.com/



http://couchdb.apache.org/

http://www.amazon.com/simpledb/



http://www.mongodb.org/

http://cassandra.apache.org/

http://www.neo4j.org/

http://www.neo4j.org/


• Data Model – Precise description of the data model, especially in terms of differences

from the "standard" models presented in the lecture

– Detailed summary of the basic data manipulation API, i.e. features to create, retrieve, update and delete data items.

• Query Support – Supported query types, i.e. point, range, navigation, and/or arbitrary?

– What is the query language of the system? Is it declarative, functional, algebraic and/or imperative?

– Are queries automatically optimized?

• Indexes – What index structures are available?

– What can be indexed? What can be a key? What can be a value?

– How are indexes managed, i.e. manually or automatically?

60


• Storage – Disk or file storage

– In-memory (RAM)

– Flash or SSD

– Traditional database

– Cloud Storage (GFS, HDFS, S3)

• Transactions and Concurrency Control – Does the system support transactions?

– How are transactions implemented, i.e. locks, OCC, MVCC, etc.?

– What consistency guarantees are given?

61


• Scalability and Replication – What types of replication are supported, i.e. synchronous or

asynchronous?

– ...

• Platform/Deployment – What cloud infrastructures are supported?

– What deployment scenarios are supported, i.e. embedded, client/server, multi-core CPU, cloud, etc.?

– Language bindings?

– Communication protocols, i.e. JSON, REST, etc.?

62

Task 3: Application Design

• Design the example graph data management in the previously profiled system – similar to Tart 1, but more technical as it is based on a concrete system

– consider only "friends of a friend", transitive closure and bridges query

– insights for other queries optional, but highly welcome and appreciated

• Deliverable is a ten minute presentation in class – November 15

– discuss final design and motivate the design choices made w.r.t. the requirements of the application and the capabilities of the system

– give details on the mapping of data structures, planned indexes, and query implementation strategies

• Same groups of 4-5 students will continue to collaborate

63

Task 4: Prototype Implementation

• Implement a small prototype based on previous design – data model (with data loading capabilities)

– three queries mentioned before

• The goal of this part of the project is to realize of a small application and experience its performance in practice

• Alternative Option: If you feel you lack implementation experience to complete this task, you may contribute to "benchmarking" the systems built by your peers

• Deliverable is the developed source code by the students

• The same groups of 4-5 students continue to collaborate

• The team with the best solution in terms of design and performance will win a price!

64

Task 5: Report and Essay

• Final task is to write a 3 page paper

• Report (1 page) – summarize major decisions and choices made throughout the course of

the projects

– show an understanding of and active involvement in the project

• Essay (2 pages) – “cloud-scale data management systems in today's world of computing”

– demonstrate that they can position these systems within the IT landscape

– show awareness of advantages and disadvantages

– conclude with a personal opinion about/view on cloud-scale data management systems.

• Deliverable is the final report

• This part of the project is conducted individually

65

scalable data stores

Documents