Apache Cassandra in the Cloud



Post on 15-Aug-2015


Page 1: Apache Cassandra in the Cloud

Cassandra in the Cloud

Adam Zegelin, Co-founder and VP of Engineering @ Instaclustr

Sydney Tech Day

instaclustr.com @Instaclustr

Page 2: Apache Cassandra in the Cloud

Instaclustr

• Instaclustr provides Cassandra-as-a-Service in the cloud.

• Aussie startup, based out of Canberra. Our CTO, Ben Bromhead, recently opened our Silicon Valley office. We’re soon to open an office in London.

• Real production experience — several customers in production.

• Currently running on Amazon Web Services; Microsoft Azure is in private beta. In discussion with Google and IBM, more to come.

• DataStax are investors & partners.

Page 3: Apache Cassandra in the Cloud

• Ben and I started as a different company: a data hosting & marketplace business, running in the cloud.

• As a startup we wanted to use Cassandra in the cloud alongside our app, but not host & manage it

• Founded Instaclustr — now our full focus

• Instaclustr: Focus on writing your app, not managing database infrastructure

Page 4: Apache Cassandra in the Cloud

C* Model

• One database, many servers

• All servers (nodes) participate in the cluster

• Decentralised

• Need more capacity? Add more servers!

• Multiple servers ≣ built in redundancy
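As a rough illustration of the model above, the sketch below places rows on a ring of nodes by hashing the partition key and walking clockwise to pick replicas. It is a simplification under stated assumptions: the node names, the `TokenRing` class, and the use of MD5 are hypothetical choices for illustration only, and real Cassandra partitioners and virtual nodes are more involved.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a ring position (illustrative: MD5, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy ring: each node owns one token; a key's replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be binary-searched.
        self._ring = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, partition_key: str):
        """Return the nodes that would hold a copy of the given partition key."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        # Wrap around the end of the ring with the modulo.
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    # Hypothetical four-node cluster; every node participates, none is special.
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas_for(key))
```

Because every key lands on several distinct nodes, losing one server still leaves other copies available, which is the redundancy the last bullet refers to.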

Page 5: Apache Cassandra in the Cloud

[Diagram: three throughput figures: 100,000 ops/sec, 200,000 ops/sec, 400,000 ops/sec]

Page 6: Apache Cassandra in the Cloud

[Diagram: clients and nodes labelled 0, 4 and 28; image not preserved]

Page 7: Apache Cassandra in the Cloud

Hosting C*

• Traditional — servers, racks, data centres

• Cloud — unlimited* compute resources

• Hybrid/cloud bursting — overflow into the cloud

Page 8: Apache Cassandra in the Cloud

Traditional Model

• Buy or rent your own servers

• Manage hardware & software deployments and updates

• Slow time to market

• Inflexible

• Buy enough hardware to handle peak load upfront

• Requires good capacity planning

• Co-locate in a data centre, or build your own if you're big enough (or have certain requirements)

Page 9: Apache Cassandra in the Cloud

☁

• Pay for what you use

• Hardware is no longer a concern

• Almost instant-on compute resources (< 1 minute boot time)

• Flexible

• Scale up and down with load

• Respond quickly to changes in capacity requirements

Page 10: Apache Cassandra in the Cloud

☁

• Node replacement is easy

• Global deployments: C* nodes with a data replica + app instance close to your users.

• Redundancy: not all your eggs in one basket (availability zones, regions, providers)

• Split workloads: one DC for the app, one DC for analytics. This reduces the performance impact of data analytics on user-facing nodes.
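One way to picture the split-workload setup is a keyspace replicated to two Cassandra data centres. The sketch below only builds the CQL text for such a keyspace using `NetworkTopologyStrategy`; the helper `create_keyspace_cql`, the keyspace name `metrics`, and the data centre names `app_dc` and `analytics_dc` are hypothetical, and the replica counts are placeholders rather than recommendations.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that replicates to each named data centre."""
    if not replicas_per_dc:
        raise ValueError("at least one data centre is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        if replica_count < 1:
            raise ValueError("replica count must be at least 1 for " + dc_name)
        # Each data centre gets its own replica count.
        options.append("'" + dc_name + "': " + str(int(replica_count)))
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


if __name__ == "__main__":
    # Assumed names: one DC serves the app, the other runs analytics jobs.
    print(create_keyspace_cql("metrics", {"app_dc": 3, "analytics_dc": 2}))
```

With a layout like this, analytics jobs can be pointed at the analytics data centre while the app talks only to its own, which is how the load separation described above would typically be achieved.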

Page 11: Apache Cassandra in the Cloud

Hybrid/Cloud Bursting

• C* nodes in your own data centre and C* nodes in the cloud

• Both are C* replicated data centres

• Live backups & fail-over — faster data recovery

• On-demand extra capacity for planned peak loads

• Periodic computationally expensive analytics jobs
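To make the "extra capacity for planned peaks" idea concrete, here is a back-of-the-envelope sketch for sizing a cloud burst. It assumes throughput scales linearly with node count, which is an idealisation, and the per-node figure of 25,000 ops/sec is a made-up number for illustration, not a measurement. The function names are hypothetical.

```python
import math


def nodes_needed(target_ops_per_sec: float, ops_per_node: float) -> int:
    """Nodes required to serve a target load, assuming linear scaling."""
    if ops_per_node <= 0:
        raise ValueError("ops_per_node must be positive")
    return max(1, math.ceil(target_ops_per_sec / ops_per_node))


def extra_cloud_nodes(peak_ops_per_sec: float, on_prem_nodes: int, ops_per_node: float) -> int:
    """Cloud nodes to add on top of the existing data centre for a planned peak."""
    return max(0, nodes_needed(peak_ops_per_sec, ops_per_node) - on_prem_nodes)


if __name__ == "__main__":
    OPS_PER_NODE = 25_000  # assumed per-node throughput, illustration only
    ON_PREM_NODES = 4      # hypothetical existing cluster in your own data centre
    for peak in (100_000, 200_000, 400_000):
        extra = extra_cloud_nodes(peak, ON_PREM_NODES, OPS_PER_NODE)
        print(f"peak {peak} ops/sec: add {extra} cloud nodes")
```

In practice new nodes also need time to stream data before they take load, so a real plan would bring the cloud nodes up ahead of the peak.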

Page 12: Apache Cassandra in the Cloud

Amazon Web Services

• Largest cloud provider, well supported (documentation, community, value-add services)

• Multiple regions — Sydney, APAC, US, Europe

• Node sizes to fit all C* use cases

• SSD-backed nodes

• Virtual Private Clouds

• VPNs and VPC peering for isolated access

Page 13: Apache Cassandra in the Cloud

Gotchas

• Buggy APIs

• Noisy neighbours

• Data sovereignty & security

Page 14: Apache Cassandra in the Cloud

C* + ☁ + Instaclustr

• We run C* for metrics and log storage (also dog-fooding)

• The scale of this cluster, in both performance & storage, will grow as we manage more nodes: more nodes = more data + more ops/sec

• Improved dev & testing — Every developer can run their own copy of our app + the monitoring cluster, on-demand

Page 15: Apache Cassandra in the Cloud

Case Study

• Advertising company that records click metrics and serves targeted advertisements

• Requires < 10 ms response time from C*

• Originally managed it themselves

• Managing their app and the C* cluster was a burden on their engineering team

• Switched over to Instaclustr

• Instaclustr + Cloud + C* is flexible. Their performance requirements changed, and the cloud & C* allowed us to change the underlying virtual machines. In production. At runtime.