Coursera's Adoption of Cassandra

Post on 11-Feb-2017

Biography

Daniel Chia @DanielJHChia

Software Engineer, Infrastructure Team

© 2015. All Rights Reserved.

1 Introduction

2 What We Want From Our Database

3 MySQL Limitations

4 Cassandra - What and Why

5 Looking Back

Coursera

Web iOS Android

Database Wants

Consistently Fast Latencies

Availability

Scalability

Other Niceties

• Operational ease
• Multi-region capability

Coursera Tech Stack

• 100% AWS
• MySQL + Cassandra
• Service-oriented

RDS Challenges

• Normalized data model ⇒ Unpredictable query performance

• Scaling by sharding not ideal

• Single master limitation

C*

• Columnar model
• Tunable consistency
• Fast
• Horizontally scalable
• Great community
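"Tunable consistency" means each read and write chooses how many replicas must acknowledge it. A commonly cited rule of thumb is that a read is guaranteed to overlap the latest acknowledged write when R + W > N, where N is the replication factor. The sketch below only does that arithmetic. The function names are hypothetical, only three consistency levels are modeled, and nothing here talks to a real cluster.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given level.

    Only ONE, QUORUM and ALL are modeled in this sketch.
    """
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority
        "ALL": replication_factor,
    }
    return required[level]


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when R + W > N, so every read set shares a replica
    with every acknowledged write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # Replication factor 3 is an assumed value for illustration.
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(f"write={w} read={r} overlap={read_overlaps_write(w, r, 3)}")
```

Lower levels trade this overlap guarantee for latency and availability, which is the knob the slide refers to.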

Looking Back

Cassandra - Initial Pain Points

• Can’t execute arbitrary queries
  • Filtering, sorting, etc.

• Can’t be abused as an OLAP database

• Worries about ‘eventual’ consistency

SQL ⇒ NoSQL Mindset Shift

• Build in-house Cassandra expertise

• Data modeling still important

• Know your queries

Cassandra ≠ [database XYZ]

“But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”

-Albert Einstein

Enrollment Example

• Learners enroll in a course
  • learner (many-to-many) course

• Need to track this membership

MySQL

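CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- The slide elides the REFERENCES clauses; `courses` (`id`) and
  -- `learners` (`id`) are assumed names for the referenced tables.
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`) REFERENCES `courses` (`id`),
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`) REFERENCES `learners` (`id`)
);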

Cassandra

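CREATE TABLE courses_by_learner (
  learner_id uuid,  -- partition key: one partition per learner
  course_id uuid,   -- clustering column: orders rows within the partition
  PRIMARY KEY (learner_id, course_id)
);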
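The table above answers one query well: "which courses is this learner enrolled in?" The reverse question ("which learners are in this course?") is not served by the same primary key. A common approach is to keep a second table keyed the other way and write to both. The sketch below is a toy in-memory model of that idea, not Cassandra driver code. `PartitionedTable`, `enroll`, and the `learners_by_course` table are hypothetical names that do not appear in the slides.

```python
from collections import defaultdict
from uuid import UUID, uuid4


class PartitionedTable:
    """Toy stand-in for a table with PRIMARY KEY (partition_key, clustering_key)."""

    def __init__(self) -> None:
        self._partitions = defaultdict(set)

    def insert(self, partition_key: UUID, clustering_key: UUID) -> None:
        # Re-inserting the same key pair changes nothing, like an upsert.
        self._partitions[partition_key].add(clustering_key)

    def select(self, partition_key: UUID) -> list:
        # Only lookups by partition key are supported; rows come back
        # ordered by clustering key.
        return sorted(self._partitions.get(partition_key, set()))


courses_by_learner = PartitionedTable()
learners_by_course = PartitionedTable()  # assumed second table for the reverse query


def enroll(learner_id: UUID, course_id: UUID) -> None:
    """Record one enrollment in both query-specific tables."""
    courses_by_learner.insert(learner_id, course_id)
    learners_by_course.insert(course_id, learner_id)


if __name__ == "__main__":
    alice, bob, course = uuid4(), uuid4(), uuid4()
    enroll(alice, course)
    enroll(bob, course)
    print(courses_by_learner.select(alice))   # the courses alice is enrolled in
    print(learners_by_course.select(course))  # the learners enrolled in the course
```

Each table is shaped around one query, which is the "know your queries" point from the mindset-shift slide: the access pattern is decided before the schema.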

Helpful Things

• Data modeling consulting

• Monitoring

• Data access layer for common use cases

Gotchas

• Lots of truly ad-hoc queries are hard
  • Don’t use C* directly to explore your data. (Spark?)

• Sorting and filtering can be hard
  • Consider Solr / Elasticsearch
  • Or even MySQL, depending on load / importance

Thank you
