LJC Conference 2014 Cassandra for Java Developers


Upload: christopher-batey

Post on 10-Jul-2015


Page 1: LJC Conference 2014 Cassandra for Java Developers

©2013 DataStax Confidential. Do not distribute without consent.

Christopher Batey @chbatey

Building awesome applications with Apache Cassandra


Page 2: LJC Conference 2014 Cassandra for Java Developers

@chbatey

Who am I?
• Technical Evangelist for Apache Cassandra
• Founder of Stubbed Cassandra
• Help out Apache Cassandra users
• Previous: Cassandra backed apps at BSkyB


Page 4: LJC Conference 2014 Cassandra for Java Developers

Overview
• Topics covered
  • Cassandra overview
  • Customer events example
  • DataStax Java Driver
  • Java Mapping API
  • Other features
    • Lightweight transactions
    • Load balancing
    • Reconnection policies
• Not covered
  • Cassandra read and write paths
  • Cassandra failure modes


Page 6: LJC Conference 2014 Cassandra for Java Developers

Common use cases
• Ordered data such as time series
  • Event stores
  • Financial transactions
  • Sensor data, e.g. IoT
• Non-functional requirements:
  • Linear scalability
  • High throughput durable writes
  • Multi datacenter including active-active
  • Analytics without ETL

Page 7: LJC Conference 2014 Cassandra for Java Developers


Cassandra overview


Page 9: LJC Conference 2014 Cassandra for Java Developers

Cassandra
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Multi data centre replication built in from the start

[Diagram: two data centres, labelled Europe and USA]

Page 10: LJC Conference 2014 Cassandra for Java Developers

Cassandra
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Multi data centre replication built in from the start
• Analytics with Apache Spark

[Diagram: two data centres, labelled Online and Analytics]

Page 11: LJC Conference 2014 Cassandra for Java Developers

Replication

[Diagram: a client issues a WRITE with CL = 1; data centres DC1 and DC2, each with RF3; nodes labelled C and RC]

We have replication!

Page 12: LJC Conference 2014 Cassandra for Java Developers

Tunable Consistency
• Data is replicated N times
• You give every query that you execute a consistency level
  • ALL
  • QUORUM
  • LOCAL_QUORUM
  • ONE
• Christos Kalantzis, Eventual Consistency != Hopeful Consistency: http://youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
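To make the consistency levels concrete, here is a minimal sketch of the arithmetic behind them, assuming the usual definition of a quorum as a majority of the replicas. The class and method names are hypothetical and belong to no driver. The point it illustrates: a read is guaranteed to touch at least one replica that holds the latest acknowledged write when the replicas read plus the replicas written exceed the replication factor.

```java
// Hypothetical helper for illustration; not part of Cassandra or the DataStax driver.
class ConsistencyMath {

    // A quorum is a majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // The replicas read (r) and the replicas written (w) are forced to
    // overlap on at least one node when r + w > RF.
    static boolean readsSeeLatestWrite(int r, int w, int replicationFactor) {
        return r + w > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("RF=" + rf + " quorum=" + q);
        System.out.println("QUORUM write + QUORUM read overlap: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE write + ONE read overlap: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With RF 3, QUORUM on both sides means two replicas each, so reads and writes always share a node; ONE on both sides does not give that guarantee.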

Page 13: LJC Conference 2014 Cassandra for Java Developers

CQL
• Cassandra Query Language
• SQL like query language
• Keyspace – analogous to a schema
  • The keyspace determines the RF (replication factor)
• Table – looks like a SQL Table

CREATE TABLE scores (
    name text,
    score int,
    date timestamp,
    PRIMARY KEY (name, score)
);

INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2012-06-25');

SELECT date, score FROM scores WHERE name='bob' AND score >= 40;

Page 14: LJC Conference 2014 Cassandra for Java Developers


Example Time: Customer event store

Page 15: LJC Conference 2014 Cassandra for Java Developers

An example: Customer event store
• Customer event
  • customer_id - ChrisBatey
  • staff_id - Charlie
  • store_type - Website, PhoneApp, Phone, Retail
  • event_type - login, logout, add_to_basket, remove_from_basket, buy_item
  • time
  • tags

Page 16: LJC Conference 2014 Cassandra for Java Developers

Requirements
• Get all events
• Get all events for a particular customer
• As above for a time slice

Page 17: LJC Conference 2014 Cassandra for Java Developers

Modelling in Cassandra

CREATE TABLE customer_events(
    customer_id text,
    staff_id text,
    time timeuuid,
    store_type text,
    event_type text,
    tags map<text, text>,
    PRIMARY KEY ((customer_id), time));

Partition Key: customer_id
Clustering Column(s): time

Page 18: LJC Conference 2014 Cassandra for Java Developers

How it is stored on disk

customer_id  time                 event_type  store_type  tags
charles      2014-11-18 16:52:04  basket_add  online      {'item': 'coffee'}
charles      2014-11-18 16:53:00  basket_add  online      {'item': 'wine'}
charles      2014-11-18 16:53:09  logout      online      {}
chbatey      2014-11-18 16:52:21  login       online      {}
chbatey      2014-11-18 16:53:21  basket_add  online      {'item': 'coffee'}
chbatey      2014-11-18 16:54:00  basket_add  online      {'item': 'cheese'}

[Diagram: one row per customer_id, holding the cells of each event in turn]

charles
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: coffee
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: wine
  event_type: logout     | staff_id: n/a | store_type: online
chbatey
  event_type: login      | staff_id: n/a | store_type: online
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: coffee
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: cheese
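The layout above can be pictured as a map of partitions, each holding its rows sorted by the clustering column. The following is a minimal in-memory sketch of that idea, assuming plain Java collections stand in for Cassandra's storage and a long stands in for the timeuuid. The class and method names are hypothetical, and the real storage engine is far more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model for illustration only.
class PartitionModel {

    // partition key (customer_id) -> rows ordered by clustering column (time)
    private final Map<String, NavigableMap<Long, String>> partitions = new HashMap<>();

    void insert(String customerId, long time, String eventType) {
        partitions.computeIfAbsent(customerId, k -> new TreeMap<>()).put(time, eventType);
    }

    // All events for one customer: a single partition lookup, already in time order.
    List<String> eventsFor(String customerId) {
        NavigableMap<Long, String> rows = partitions.get(customerId);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.values());
    }

    // Time slice: a contiguous range inside one partition (exclusive bounds, like > and <).
    List<String> eventsFor(String customerId, long after, long before) {
        NavigableMap<Long, String> rows = partitions.get(customerId);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subMap(after, false, before, false).values());
    }

    public static void main(String[] args) {
        PartitionModel model = new PartitionModel();
        model.insert("charles", 300L, "logout");
        model.insert("charles", 100L, "basket_add");
        model.insert("chbatey", 200L, "login");
        System.out.println(model.eventsFor("charles"));
        System.out.println(model.eventsFor("charles", 100L, 400L));
    }
}
```

Both per-customer requirements map onto one partition, which is why customer_id was chosen as the partition key and time as the clustering column.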

Page 19: LJC Conference 2014 Cassandra for Java Developers

DataStax Java Driver
• Open source

Page 20: LJC Conference 2014 Cassandra for Java Developers

Get all the events

public List<CustomerEvent> getAllCustomerEvents() {
    return session.execute("select * from customers.customer_events")
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}

private Function<Row, CustomerEvent> mapCustomerEvent() {
    return row -> new CustomerEvent(
            row.getString("customer_id"),
            row.getUUID("time"),
            row.getString("staff_id"),
            row.getString("store_type"),
            row.getString("event_type"),
            row.getMap("tags", String.class, String.class));
}

Page 21: LJC Conference 2014 Cassandra for Java Developers

All events for a particular customer

private PreparedStatement getEventsForCustomer;

@PostConstruct
public void prepareStatements() {
    // Prepare once at startup and reuse the statement for every call
    getEventsForCustomer = session.prepare(
            "select * from customers.customer_events where customer_id = ?");
}

public List<CustomerEvent> getCustomerEvents(String customerId) {
    BoundStatement boundStatement = getEventsForCustomer.bind(customerId);
    return session.execute(boundStatement)
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}

Page 22: LJC Conference 2014 Cassandra for Java Developers

Customer events for a time slice

// eq, gt and lt are static imports from QueryBuilder
public List<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime) {
    Select.Where getCustomers = QueryBuilder.select()
            .all()
            .from("customers", "customer_events")
            .where(eq("customer_id", customerId))
            .and(gt("time", UUIDs.startOf(startTime)))
            .and(lt("time", UUIDs.endOf(endTime)));
    return session.execute(getCustomers).all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
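The time-slice query works because a timeuuid carries its timestamp inside the UUID. The sketch below uses only java.util.UUID to show that idea: it builds a version 1 UUID for a given Unix time and reads the time back. The class name, the method names and the fixed least-significant bits are assumptions for illustration; this is not the driver's UUIDs.startOf implementation.

```java
import java.util.UUID;

// Hypothetical illustration of how a time-based (version 1) UUID embeds a timestamp.
class TimeUuidSketch {

    // 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Builds a version 1 UUID whose timestamp is the given Unix time in milliseconds.
    static UUID forUnixMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;
        long msb = (ts & 0xFFFFFFFFL) << 32       // time_low
                | ((ts >>> 32) & 0xFFFFL) << 16   // time_mid
                | 0x1000L                         // version 1
                | ((ts >>> 48) & 0x0FFFL);        // time_hi
        // IETF variant with zero clock sequence and node (an assumption of this sketch)
        long lsb = 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    // Recovers the Unix time in milliseconds from a version 1 UUID.
    static long toUnixMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        UUID uuid = forUnixMillis(now);
        System.out.println(uuid + " version=" + uuid.version());
        System.out.println("round trip ok: " + (toUnixMillis(uuid) == now));
    }
}
```

Because the timestamp is recoverable, a range over the time column selects exactly the events between two instants.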

Page 23: LJC Conference 2014 Cassandra for Java Developers

Mapping API

@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Column(name = "store_type")
    private String storeType;
    @Column(name = "event_type")
    private String eventType;
    private Map<String, String> tags;

    // ctr / getters etc
}

Page 24: LJC Conference 2014 Cassandra for Java Developers

Mapping API

@Accessor
public interface CustomerEventDao {
    @Query("select * from customers.customer_events where customer_id = :customerId")
    Result<CustomerEvent> getCustomerEvents(String customerId);

    @Query("select * from customers.customer_events")
    Result<CustomerEvent> getAllCustomerEvents();

    @Query("select * from customers.customer_events where customer_id = :customerId and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)")
    Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime);
}

@Bean
public CustomerEventDao customerEventDao() {
    MappingManager mappingManager = new MappingManager(session);
    return mappingManager.createAccessor(CustomerEventDao.class);
}

Page 25: LJC Conference 2014 Cassandra for Java Developers

Adding some type safety

public enum StoreType {
    ONLINE, RETAIL, FRANCHISE, MOBILE
}

@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn()
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Column(name = "store_type")
    @Enumerated(EnumType.STRING) // could be EnumType.ORDINAL
    private StoreType storeType;
    // ...
}
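The choice between the two enum mappings comes down to what ends up in the column. Here is a small sketch using only the JDK, with hypothetical class and method names; it does not call the mapper. It contrasts storing the constant's name with storing its position in the declaration.

```java
// Hypothetical illustration; the mapping API is not involved here.
class EnumStorageSketch {

    enum StoreType { ONLINE, RETAIL, FRANCHISE, MOBILE }

    // What a string mapping would persist: the constant's name.
    static String asString(StoreType type) {
        return type.name();
    }

    static StoreType fromString(String stored) {
        return StoreType.valueOf(stored);
    }

    // What an ordinal mapping would persist: the position in the declaration.
    static int asOrdinal(StoreType type) {
        return type.ordinal();
    }

    static StoreType fromOrdinal(int stored) {
        return StoreType.values()[stored];
    }

    public static void main(String[] args) {
        System.out.println(asString(StoreType.RETAIL) + " / " + asOrdinal(StoreType.RETAIL));
    }
}
```

The stored name stays valid if constants are later reordered or new ones are inserted, whereas a stored ordinal silently changes meaning; the ordinal form is more compact.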

Page 26: LJC Conference 2014 Cassandra for Java Developers

User defined types

CREATE TYPE store (name text, type text, postcode text);

CREATE TABLE customer_events_type(
    customer_id text,
    staff_id text,
    time timeuuid,
    store frozen<store>,
    event_type text,
    tags map<text, text>,
    PRIMARY KEY ((customer_id), time));


Page 28: LJC Conference 2014 Cassandra for Java Developers

Mapping user defined types

@UDT(keyspace = "customers", name = "store")
public class Store {
    private String name;
    private StoreType type;
    private String postcode;
    // getters etc
}

@Table(keyspace = "customers", name = "customer_events_type")
public class CustomerEventType {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn()
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Frozen
    private Store store;
    @Column(name = "event_type")
    private String eventType;
    private Map<String, String> tags;
    // ...
}

// In the accessor interface:
@Query("select * from customers.customer_events_type")
Result<CustomerEventType> getAllCustomerEventsWithStoreType();

Page 29: LJC Conference 2014 Cassandra for Java Developers


What else can I do?

Page 30: LJC Conference 2014 Cassandra for Java Developers

Lightweight Transactions (LWT)

Consequences of Lightweight Transactions
• 4 round trips vs. 1 for normal updates (uses Paxos algorithm)
• Operations are done on a per-partition basis
• Will be going across data centres to obtain consensus (unless you use LOCAL_SERIAL consistency)
• Cassandra user will need read and write access, i.e. you get back the row!

Great for 1% of your app, but eventual consistency is still your friend!
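The semantics of a conditional write can be sketched with a compare-and-set on an in-memory map. This is only an analogy built on java.util.concurrent, under the assumption that a single JVM stands in for the cluster: it involves no Paxos and no network round trips, and the class and method names are hypothetical. It does show why the caller needs read access, since a failed condition hands back what is already stored.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-JVM analogy for conditional writes; not how Cassandra implements them.
class ConditionalWriteSketch {

    private final ConcurrentMap<String, String> passwords = new ConcurrentHashMap<>();

    // Analogous to INSERT ... IF NOT EXISTS: returns null when the write was applied,
    // otherwise the value that was already stored.
    String insertIfNotExists(String userId, String password) {
        return passwords.putIfAbsent(userId, password);
    }

    // Analogous to UPDATE ... IF password = expected: true when the write was applied.
    boolean updateIf(String userId, String expected, String replacement) {
        return passwords.replace(userId, expected, replacement);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch users = new ConditionalWriteSketch();
        System.out.println("first insert applied: " + (users.insertIfNotExists("user2", "a") == null));
        System.out.println("second insert sees: " + users.insertIfNotExists("user2", "b"));
        System.out.println("update with stale value applied: " + users.updateIf("user2", "b", "c"));
    }
}
```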

Page 31: LJC Conference 2014 Cassandra for Java Developers

Batch Statements

BEGIN BATCH
    INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
    UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2';
    INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c');
    DELETE name FROM users WHERE userID = 'user2';
APPLY BATCH;

• BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation
• Atomic operation
  • If any statement in the batch succeeds, all will
• No batch isolation
  • Other “transactions” can read and write data being affected by a partially executed batch

© 2014 DataStax, All Rights Reserved.

Page 32: LJC Conference 2014 Cassandra for Java Developers

Batch Statements with LWT

BEGIN BATCH
    UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1;
    UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4;
APPLY BATCH;

Allows you to group multiple conditional updates in a batch as long as all those updates apply to the same partition

Page 33: LJC Conference 2014 Cassandra for Java Developers

Load balancing
• Data centre aware policy
• Token aware policy
• Latency aware policy
• Whitelist policy

[Diagram: one APP per data centre, DC1 and DC2, with async replication between the data centres]
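As a rough illustration of the data centre aware idea, the sketch below rotates requests through the hosts of the local data centre only. Everything in it is hypothetical (the Host class, the constructor, the method names); the driver's own policies also track host state, fall back to remote data centres and can take token ownership into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of data centre aware round robin; not the driver's implementation.
class DcAwareRoundRobinSketch {

    static final class Host {
        final String address;
        final String dataCentre;

        Host(String address, String dataCentre) {
            this.address = address;
            this.dataCentre = dataCentre;
        }
    }

    private final List<Host> localHosts = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    // Keeps only the hosts that live in the application's own data centre.
    DcAwareRoundRobinSketch(String localDc, List<Host> allHosts) {
        for (Host host : allHosts) {
            if (host.dataCentre.equals(localDc)) {
                localHosts.add(host);
            }
        }
        if (localHosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts in " + localDc);
        }
    }

    // Picks the next local host in rotation; floorMod keeps the index valid after overflow.
    Host nextHost() {
        int index = Math.floorMod(next.getAndIncrement(), localHosts.size());
        return localHosts.get(index);
    }

    public static void main(String[] args) {
        DcAwareRoundRobinSketch policy = new DcAwareRoundRobinSketch("DC1", Arrays.asList(
                new Host("10.0.0.1", "DC1"), new Host("10.1.0.1", "DC2"), new Host("10.0.0.2", "DC1")));
        for (int i = 0; i < 3; i++) {
            System.out.println(policy.nextHost().address);
        }
    }
}
```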


Page 35: LJC Conference 2014 Cassandra for Java Developers

Reconnection Policies
• Policy that decides how often the reconnection to a dead node is attempted.

Cluster cluster = Cluster.builder()
        .addContactPoints("127.0.0.1", "127.0.0.2")
        // retry a dead node every 1000 ms
        .withReconnectionPolicy(new ConstantReconnectionPolicy(1000))
        // TokenAwarePolicy wraps a child policy; RoundRobinPolicy is used here as an example
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        .build();

• ConstantReconnectionPolicy
• ExponentialReconnectionPolicy (Default)
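The difference between the two policies is the schedule of delays between attempts. A minimal sketch of both schedules follows, assuming the exponential one doubles the delay on each attempt up to a cap. The class and method names are hypothetical, and the driver's exact schedule may differ.

```java
// Hypothetical sketch of reconnection delay schedules; not the driver's code.
class ReconnectionDelaySketch {

    // Constant schedule: the same delay before every attempt.
    static long constantDelayMs(long delayMs, int attempt) {
        return delayMs;
    }

    // Exponential schedule: the delay doubles with each zero-based attempt, up to a cap.
    static long exponentialDelayMs(long baseDelayMs, long maxDelayMs, int attempt) {
        // Bound the shift so that typical millisecond values cannot overflow a long.
        int shift = Math.min(attempt, 30);
        return Math.min(baseDelayMs << shift, maxDelayMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 12; attempt++) {
            System.out.println(attempt + ": constant=" + constantDelayMs(1000, attempt)
                    + " exponential=" + exponentialDelayMs(1000, 600_000, attempt));
        }
    }
}
```

A constant delay finds a recovered node quickly but keeps probing a node that stays down; the exponential schedule backs off and so puts less load on a struggling cluster.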


Page 37: LJC Conference 2014 Cassandra for Java Developers

Summary
• Cassandra overview
• Customer events example
• DataStax Java Driver
• Java Mapping API
• Other features
  • Lightweight transactions
  • Load balancing
  • Reconnection policies

Page 38: LJC Conference 2014 Cassandra for Java Developers

Thanks for listening
• Badger me on twitter @chbatey
• https://github.com/chbatey/cassandra-customer-events
• https://academy.datastax.com/
• http://christopher-batey.blogspot.co.uk/

Page 39: LJC Conference 2014 Cassandra for Java Developers


Training Day | December 3rd

Beginner Track
• Introduction to Cassandra
• Introduction to Spark, Shark, Scala and Cassandra

Advanced Track
• Data Modeling
• Performance Tuning

Conference Day | December 4th
Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra.

http://bit.ly/cassandrasummit2014