Data Modeling with Cassandra

Post on 22-Apr-2015


DESCRIPTION

Data modeling doesn't have to be difficult. This talk walks through different CQL data model examples.

TRANSCRIPT

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Data Modeling with Cassandra

Patricia Gorla @patriciagorla

Cassandra Consultant

About The Last Pickle

Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Hector Maintainer, Apache Usergrid Committer. Based in New Zealand & USA.

A Few Notes about Cassandra

Open sourced in 2008 by Facebook. A lot has changed since then…

See issues.apache.org/jira/browse/CASSANDRA

Cassandra is…

• Distributed

Data distributed by hash

[Figure: keys 'foo' and 'bar' placed on different nodes]

Availability through redundancy

[Figure: copies of 'foo' and 'bar' held on several nodes]
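The two ideas above (placement by hash, availability through redundancy) can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's actual partitioner: the node names, the use of MD5, and the modulo placement are all assumptions made for the example, and a real cluster assigns token ranges to nodes.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def token(key: str) -> int:
    """Map a partition key to a stable integer token (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(key: str, replication_factor: int = 3) -> list:
    """Pick an owner node from the token, then walk the ring for the extra copies."""
    start = token(key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


for key in ("foo", "bar"):
    print(key, "->", replicas(key))
```

The same key always lands on the same nodes, and losing one node still leaves the other copies readable.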

Geolocated datacenters

[Figure: world map divided into regions]

Cassandra is…

• Distributed
• Eventually Consistent

Read Repair, Maintenance Repair

Consistency Level: QUORUM, ONE, ALL, ANY
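A short worked example of what the consistency levels mean in terms of replica counts. QUORUM is a majority of the replicas, and a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. The function names below are made up for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                           # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))   # True: QUORUM + QUORUM
print(read_sees_latest_write(1, 1, rf))                     # False: ONE + ONE
```

With a replication factor of 3, QUORUM reads and writes tolerate one replica being down, while ONE trades that guarantee for lower latency.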

Cassandra is…

• Distributed
• Eventually Consistent
• Fast

2.1 - 190,000 wps

2.0 - 105,000 wps

See http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster

Note: Reads can be tuned through data model and JVM

Cassandra is…

• Distributed
• Eventually Consistent
• Fast
• Familiar

CQL - Cassandra Query Language

CREATE TABLE IF NOT EXISTS foo (
    bar text,
    baz text,
    PRIMARY KEY (bar)
);

INSERT INTO foo (bar, baz) VALUES ('one', 'two');

SELECT * FROM foo;

cqlsh - CLI tool

Cassandra is…

• Distributed
• Eventually Consistent
• Fast
• Familiar
• Popular

Drivers

Datastax: C#, Java, C++, Python, Node.js*, Ruby*
.NET/C#: Cassandra Sharp, Aquiles, … Cassandra
Apache Spark: Datastax Spark Connector
C++: libQTCassandra
Clojure: CLJ-Hector, Cassaforte, Alia
Erlang: CQerl
Go: Gossie, GoCQL, CQLc
Haskell: Cassy
Java: Astyanax, Hector, Achilles
Node.js: Helenus, Node-Cassandra-CQL
ODBC: Simba ODBC
Perl: Cassandra::Simple, Perlcassa
PHP: CQL PHP, CQLSI, php-cassandra
Python: Datastax Python, Pycassa
R: R Cassandra
Ruby: Fauna, CQL Ruby, CQLEngine
Rust: Rust-CQL
Scala: Cascal
Storm: Storm-Cassandra

For full list, see http://planetcassandra.org/client-drivers-tools/

The Hard Part (Data Modeling)

• No JOINs, Denormalize
• Duplicate the Data
• Identify Usage

Case Study: City BikeShare

Bikes, Customers, Stations, Trips

© Noah Berger, Flickr

CREATE KEYSPACE bikeshare
    WITH replication = { 'class': 'NetworkTopologyStrategy', 'datacenter1': 3 };

USE bikeshare;

RF (replication factor) can be altered ex post facto

- List the properties of the bike.

CREATE TABLE IF NOT EXISTS bike (
    bike_id text,
    properties map<text, text>,
    is_damaged boolean,
    is_checked_out boolean,
    latitude double,
    longitude double,
    PRIMARY KEY (bike_id)
);

See www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html for all data types

INSERT INTO bike (bike_id, properties, is_damaged, is_checked_out, latitude, longitude)
    VALUES ('bike1', {'serial_number' : 'GS-00143', 'type' : 'road bike'}, False, False, 37.7648, 122.4200);

SELECT * FROM bike;

 bike_id | is_checked_out | is_damaged | latitude | longitude | properties
---------+----------------+------------+----------+-----------+-----------------------------------------------------
   bike3 |          False |       True |   37.793 |     122.4 | {'serial_number': 'GS-70159', 'type': 'fixed gear'}
   bike2 |           True |      False |   37.786 |     122.4 | {'serial_number': 'GS-79366', 'type': 'road bike'}
   bike1 |          False |      False |   37.765 |    122.42 | {'serial_number': 'GS-00143', 'type': 'road bike'}

(3 rows)

Individual map entries can be added and removed:

UPDATE bike
    SET properties['color'] = 'royal blue'
    WHERE bike_id = 'bike1';

SELECT properties FROM bike
    WHERE bike_id = 'bike1';

 properties
---------------------------------------------------------------------------
 {'color': 'royal blue', 'serial_number': 'GS-00143', 'type': 'road bike'}

(1 rows)

DELETE properties['color'] FROM bike
    WHERE bike_id = 'bike1';

SELECT properties FROM bike
    WHERE bike_id = 'bike1';

 properties
----------------------------------------------------
 {'serial_number': 'GS-00143', 'type': 'road bike'}

(1 rows)

- Verify whether the bike can be checked out.

Set conditional statement

UPDATE bike
    SET is_checked_out = True
    WHERE bike_id = 'bike1'
    IF is_checked_out = False;

 [applied]
-----------
      True

If the bike is already checked out, the update is not applied and the current value is returned:

 [applied] | is_checked_out
-----------+----------------
     False |           True

See www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
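The conditional update behaves like a compare-and-set. The sketch below is a single-process stand-in that only mirrors the shape of the two results shown above. Cassandra itself reaches the same outcome across replicas with a consensus protocol (Paxos), which this toy model does not attempt. The `checkout` function and the in-memory `bikes` dictionary are invented for the example.

```python
def checkout(bikes: dict, bike_id: str) -> tuple:
    """Check a bike out only if it is currently free.

    Returns (applied, current_value): current_value is None when the
    update was applied, otherwise the value that blocked it.
    """
    bike = bikes[bike_id]
    if bike["is_checked_out"] is False:
        bike["is_checked_out"] = True
        return (True, None)
    return (False, bike["is_checked_out"])


bikes = {"bike1": {"is_checked_out": False}}

first = checkout(bikes, "bike1")
second = checkout(bikes, "bike1")
print(first)   # (True, None)
print(second)  # (False, True)
```

The second caller learns both that the update failed and why, without issuing a separate read.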

- Get the customer details.

CREATE TYPE IF NOT EXISTS address (
    street_name text,
    zip text
);

CREATE TABLE IF NOT EXISTS customer (
    customer_id text,
    email text,
    name text,
    password text,
    mailing_address frozen<address>,
    PRIMARY KEY (customer_id)
);

Note: This example uses text fields for simplicity. Passwords should not be stored in plain text.

Limitations

Data is serialised

CASSANDRA-7857

CASSANDRA-7423 - Freezing UDT

- Query individual subfields

INSERT INTO customer (customer_id, email, name, password, mailing_address)
    VALUES ('customer1', 'vanhaver@hotmail.com', 'Paul Van Haver', 'p@ssw0rd1', {street_name: 'Capp Street', zip: '94110'});

SELECT mailing_address.street_name FROM customer
    WHERE customer_id = 'customer2';

 mailing_address.street_name
-----------------------------
               Bryant Street

(1 rows)

- List the available bikes at a station.

CREATE TABLE IF NOT EXISTS station (
    station_name text,
    latitude double,
    longitude double,
    PRIMARY KEY (station_name)
);

CREATE TABLE IF NOT EXISTS bikes_at_stations_count (
    station_name text,
    bikes_available counter,
    PRIMARY KEY (station_name)
);

All counters start at 0

Only increment, decrement

UPDATE bikes_at_stations_count
    SET bikes_available = bikes_available + 1
    WHERE station_name = '16th & Mission';

2.1 - Creates a local lock

See www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

SELECT * FROM bikes_at_stations_count
    WHERE station_name = '16th & Mission';

 station_name   | bikes_available
----------------+-----------------
 16th & Mission |               2

(1 rows)

- List all trips a bike has been on.

CREATE TABLE IF NOT EXISTS BikeTrips (
    bike_id text,
    trip_id text,
    PRIMARY KEY (bike_id, trip_id)
);

Flaw: All trips for a bike will be stored in the same partition

(row will grow unbounded)

Two components of a primary key

PRIMARY KEY ((a, b, …)…, c)

Partition Key: Where the row will be physically located

Clustering Key: How the columns will be ordered on disk

Single PK

CREATE TABLE IF NOT EXISTS user (
    first_name text,
    last_login timestamp,
    PRIMARY KEY (first_name)
);

Each row is on a separate partition
Can be uniquely identified

Compound PK

CREATE TABLE IF NOT EXISTS user (
    first_name text,
    last_login timestamp,
    PRIMARY KEY (first_name, last_login)
) WITH CLUSTERING ORDER BY (last_login DESC);

Columns are ordered by logins
Most recent users will be at the top

Composite PK

CREATE TABLE IF NOT EXISTS user (
    first_name text,
    last_name text,
    last_login timestamp,
    PRIMARY KEY ((first_name, last_name), last_login)
) WITH CLUSTERING ORDER BY (last_login DESC);

Data is bucketed by composite
Row width will be limited
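One way to picture the two key components is a dictionary of partitions, each holding rows ordered by the clustering key. The Python below is only a mental model of the composite-key `user` table, not a description of the storage engine. The helper names are invented for the sketch.

```python
from datetime import datetime

# partition key (first_name, last_name) -> {clustering key last_login: row}
table = {}


def insert(first_name: str, last_name: str, last_login: datetime) -> None:
    """Rows with the same full primary key overwrite each other."""
    partition = table.setdefault((first_name, last_name), {})
    partition[last_login] = {"last_login": last_login}


def select(first_name: str, last_name: str) -> list:
    """Read one partition, newest login first (CLUSTERING ORDER BY last_login DESC)."""
    partition = table.get((first_name, last_name), {})
    return sorted(partition, reverse=True)


insert("Ada", "Lovelace", datetime(2014, 8, 10, 6, 8))
insert("Ada", "Lovelace", datetime(2014, 8, 10, 6, 10))
insert("Ada", "Byron", datetime(2014, 8, 10, 6, 9))

print(select("Ada", "Lovelace"))
```

Only the partition key decides which bucket is read, and the clustering key decides the order inside it, so "Ada Byron" never shows up in a query for "Ada Lovelace".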

Solution: Create artificial bucket

CREATE TABLE IF NOT EXISTS BikeTrips (
    bike_id text,
    bucket int,
    trip_id text,
    PRIMARY KEY ((bike_id, bucket), trip_id)
);

Must specify all parts on SELECT

SELECT * FROM BikeTrips WHERE bike_id = 'bike1' AND bucket = 0;
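The talk does not say how the bucket value is chosen, so the following is one possible scheme under stated assumptions: a fixed number of buckets and a stable hash of the trip id. A time-based bucket (for example one per month) would work along the same lines. `NUM_BUCKETS` and the function names are hypothetical.

```python
import zlib

NUM_BUCKETS = 4  # assumed fixed bucket count, chosen for the example


def bucket_for(trip_id: str) -> int:
    """Stable bucket number derived from the trip id."""
    return zlib.crc32(trip_id.encode("utf-8")) % NUM_BUCKETS


def partition_key(bike_id: str, trip_id: str) -> tuple:
    """The (bike_id, bucket) pair a new trip row would be written under."""
    return (bike_id, bucket_for(trip_id))


def partitions_to_read(bike_id: str) -> list:
    """Listing every trip of a bike means querying each bucket in turn."""
    return [(bike_id, bucket) for bucket in range(NUM_BUCKETS)]


print(partition_key("bike1", "trip42"))
print(partitions_to_read("bike1"))
```

The trade-off is visible in `partitions_to_read`: each partition stays bounded, but reading a bike's full history now costs one query per bucket.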

- List all trips a customer has taken.

CREATE TABLE IF NOT EXISTS CustomerTrips (
    customer_id text,
    trip_id text,
    PRIMARY KEY (customer_id, trip_id)
);

Rows will not be as wide as BikeTrips
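BikeTrips and CustomerTrips hold the same trips under different keys, which is the "duplicate the data" advice in practice: with no JOINs, the application writes each trip once per query it needs to serve. A minimal in-memory sketch, where the dictionaries stand in for the two tables and `record_trip` is an invented helper:

```python
bike_trips = {}      # bike_id -> set of trip_ids (stands in for BikeTrips)
customer_trips = {}  # customer_id -> set of trip_ids (stands in for CustomerTrips)


def record_trip(trip_id: str, bike_id: str, customer_id: str) -> None:
    """Write the trip into both lookup structures at once."""
    bike_trips.setdefault(bike_id, set()).add(trip_id)
    customer_trips.setdefault(customer_id, set()).add(trip_id)


record_trip("trip1", "bike15", "customer3")
record_trip("trip2", "bike15", "customer4")

print(sorted(bike_trips["bike15"]))         # ['trip1', 'trip2']
print(sorted(customer_trips["customer3"]))  # ['trip1']
```

Each question ("which trips for this bike?", "which trips for this customer?") is then answered by a single key lookup.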

- Show details of a particular trip (duration, distance traveled).

CREATE TABLE IF NOT EXISTS trip (
    trip_id text,
    customer_id text static,
    bike_id text static,
    started_at timestamp static,
    stopped_at timestamp static,
    sequence timestamp,
    latitude decimal,
    longitude decimal,
    delta_distance double,
    PRIMARY KEY (trip_id, sequence)
) WITH CLUSTERING ORDER BY (sequence DESC);

SELECT * FROM trip WHERE trip_id = 'trip1';

 trip_id | sequence                 | bike_id | customer_id | started_at               | stopped_at               | delta_distance | latitude    | longitude
---------+--------------------------+---------+-------------+--------------------------+--------------------------+----------------+-------------+-----------
   trip1 | 2014-08-10 06:10:05+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         8.7951 | -122.405319 | 37.796936
   trip1 | 2014-08-10 06:10:00+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         15.381 | -122.403347 | 37.795535
   trip1 | 2014-08-10 06:09:55+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |              0 | -122.403347 | 37.795535
   trip1 | 2014-08-10 06:09:50+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         10.557 | -122.401702 | 37.795731
   trip1 | 2014-08-10 06:09:45+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |              0 | -122.401702 | 37.795731
   trip1 | 2014-08-10 06:09:40+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         35.282 | -122.400589 | 37.790268
 ...
   trip1 | 2014-08-10 06:08:45+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         6.1672 | -122.414782 | 37.771255
   trip1 | 2014-08-10 06:08:40+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         2.6682 | -122.415047 | 37.770929
   trip1 | 2014-08-10 06:08:35+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         2.9604 | -122.415287 | 37.770529
   trip1 | 2014-08-10 06:08:30+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |          2.775 |  -122.41544 | 37.770119
   trip1 | 2014-08-10 06:08:25+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         5.7684 |  -122.41566 | 37.769236
   trip1 | 2014-08-10 06:08:20+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         3.1183 | -122.415669 | 37.768744
   trip1 | 2014-08-10 06:08:15+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         93.217 | -122.414251 | 37.754102
   trip1 | 2014-08-10 06:08:10+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |              0 | -122.414251 | 37.754102
   trip1 | 2014-08-10 06:08:05+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |         31.664 | -122.409291 | 37.754393
   trip1 | 2014-08-10 06:08:00+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |              0 | -122.409291 | 37.754393
   trip1 | 2014-08-10 06:07:55+0100 |  bike15 |   customer3 | 2014-08-10 06:07:55+0100 | 2014-08-10 06:07:55+0100 |        0.54761 | -122.409282 | 37.754307

(27 rows)

SELECT sequence, latitude, longitude FROM trip
    WHERE trip_id = 'trip1' AND sequence > '2014-08-10 06:09:00+0100';

 sequence                 | latitude    | longitude
--------------------------+-------------+-----------
 2014-08-10 06:10:05+0100 | -122.405319 | 37.796936
 2014-08-10 06:10:00+0100 | -122.403347 | 37.795535
 2014-08-10 06:09:55+0100 | -122.403347 | 37.795535
 2014-08-10 06:09:50+0100 | -122.401702 | 37.795731
 2014-08-10 06:09:45+0100 | -122.401702 | 37.795731
 2014-08-10 06:09:40+0100 | -122.400589 | 37.790268
 2014-08-10 06:09:35+0100 | -122.400589 | 37.790268
 2014-08-10 06:09:30+0100 | -122.400404 | 37.790241
 2014-08-10 06:09:25+0100 | -122.400359 | 37.790128
 2014-08-10 06:09:20+0100 | -122.400359 | 37.790128
 2014-08-10 06:09:15+0100 | -122.408092 | 37.784008
 2014-08-10 06:09:10+0100 | -122.408092 | 37.784008
 2014-08-10 06:09:05+0100 | -122.403416 | 37.780284

Use comparator for data type
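The use case asks for duration and distance traveled, which the trip table does not store directly. A sketch of how a client might derive them from the rows of one partition, assuming duration is `stopped_at` minus `started_at` and distance is the sum of `delta_distance`. The sample values below are hypothetical and are not taken from the output shown in the slides.

```python
from datetime import datetime


def trip_summary(started_at: datetime, stopped_at: datetime, rows: list) -> tuple:
    """Return (duration in seconds, total distance) for one trip partition."""
    duration_seconds = (stopped_at - started_at).total_seconds()
    total_distance = sum(row["delta_distance"] for row in rows)
    return duration_seconds, total_distance


# Hypothetical sample: static columns plus three position updates.
started_at = datetime(2014, 8, 10, 6, 7, 55)
stopped_at = datetime(2014, 8, 10, 6, 10, 5)
rows = [
    {"sequence": datetime(2014, 8, 10, 6, 8, 0), "delta_distance": 2.5},
    {"sequence": datetime(2014, 8, 10, 6, 8, 5), "delta_distance": 3.0},
    {"sequence": datetime(2014, 8, 10, 6, 8, 10), "delta_distance": 0.0},
]

print(trip_summary(started_at, stopped_at, rows))  # (130.0, 5.5)
```

Because the static columns are shared by the whole partition and the position rows are clustered by `sequence`, a single-partition read returns everything this calculation needs.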

Recap

• There is hope
• Identify usage
• Be mindful of storage engine


Patricia Gorla @patriciagorla

www.thelastpickle.com

Q&A
