Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scalability and Short Response Time


Upload: planet-cassandra

Post on 15-Jan-2015

DESCRIPTION

In this session, Praveen will be presenting Equinix's big data platform and how Cassandra sits at the center of it.

TRANSCRIPT

Page 1: Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scalability and Short Response Time

CONFIDENTIAL 1

Praveen Kumar Emerging Software Platforms, Global Software Engineering

Mar 2014

Equinix Big Data Platform & Cassandra

Page 2

Confidential – © 2013 Equinix Inc. www.equinix.com 2

Big Data at Equinix

~2 million Alarms

~200k interconnections

~250k Electrical circuits

Sensors across 95+ IBXs

~40k Infrastructure objects

Page 3

Big Data at Equinix

Sensors across 95+ IBXs

Lead to / produce

Support for multiple protocols
Push as well as pull methods

Time-series data
Cross-sectional data
Not-so-clean data

High velocity

Clean data
Lots and lots of noise

Some useful intel

Page 4

Big Data at Equinix

What do we use (or plan to use) this data for?

• Customer Presentment
• Billing
• Operations
• New Products & Services

Page 5

Big Data at Equinix

Use-case analysis: 80-20 rule

~80% of use-cases analyzed act upon “Hot Data”

~80% of the data for most of the use-cases analyzed is time-series.

All “quick win” use-cases need data mediation, aggregation and roll-up for presentment.

Real-time to near real-time processing of events

Collection, processing and storage technologies suitable for time-series data.

Collection, mediation, cross-referencing and correlation of data from different sources; roll-up and aggregation.
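To make the roll-up requirement concrete, here is a minimal sketch of aggregating raw time-series samples into fixed-width buckets for presentment. It is an illustration only, not the platform's implementation: the function name `roll_up`, the 5-minute bucket width, and the sample readings are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds=300):
    """Aggregate (epoch_seconds, value) samples into fixed-width time buckets.

    Hypothetical helper: the 300-second default bucket is an assumption.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    # One summary row per bucket, ordered by bucket start time.
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "avg": mean(vals),
            "count": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Made-up power readings (epoch seconds, kW) from a single sensor.
readings = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 5.0), (540, 7.0)]
summary = roll_up(readings)
for start, stats in summary.items():
    print(start, stats)
```

Storing the rolled-up rows next to the raw samples is one common way to serve "hot data" dashboards without scanning every raw point.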

Page 6

Big Data at Equinix

Our Approach: Equinix Big Data Platform

• Common platform to be shared by all initial Big Data use cases (multi-tenancy)
• Built on inexpensive hardware using free or inexpensive software
• Seamless and massive scalability using scale-out
• High reliability: partial failover, graceful degradation, self-healing, self-balancing
• Data ingestion and processing capabilities for high volumes at high velocity
• Support for structured and semi-structured data
• Provides real-time processing abilities
• Provides parallel processing capabilities
• Support for low-latency queries, wide range scan queries and search
• Provides abstraction via connectors, frameworks and libraries
• Support for predictive analytics using machine learning

Immediate requirements

Long term goals

Big Data Platform - Logical Architecture (technology agnostic)

Page 7

Big Data at Equinix

Requirements & Technologies considered for Big Data Platform

Page 8

Big Data at Equinix

Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise

[Diagram: DataStax Enterprise nodes, each labeled Search, Analytics, or Storage]

[Diagram: Hadoop ecosystem components]
• Hadoop Distributed File System (Storage/Analytics): NameNode, Secondary NameNode, Data Nodes (Storage)
• HBase (Storage/Analytics): HBase Master, HBase Region Servers
• Search: Solr Nodes
• Management Services: Cloudera Manager
• Zookeeper

Hadoop Ecosystem
Pros: Scalability, Cloud readiness, Resource availability, Industry momentum, Product ecosystem maturity, Technical support
Cons: Infrastructure footprint, Operational complexity, Learning curve, Availability, Total cost of ownership

DataStax Enterprise
Pros: Infrastructure footprint, Operational ease, Scalability, Availability, Cloud readiness, Learning curve, Resource availability, Technical support, Total cost of ownership
Cons: Industry momentum, Product ecosystem maturity

Page 9

Big Data at Equinix

Cassandra vs. HBase

CAP theorem focus
    Cassandra: Availability, Partition-Tolerance
    HBase: Consistency, Availability

Data partitioning
    Cassandra: Supports ordered and random partitioning; random partitioning is recommended.
    HBase: Ordered partitioning. Load balancing achieved through resharding.

Distributed system
    Cassandra: P2P architecture (Amazon Dynamo)
    HBase: Master / Slave via HDFS, Zookeeper for coordination

Administration & maintenance
    Cassandra: Medium
    HBase: High

Single write master
    Cassandra: No (R + W > N to get strong consistency)
    HBase: Yes

Multi-tenancy
    Cassandra: Yes
    HBase: Yes

Secondary indexes
    Cassandra: Supports secondary indexes on column families where the column name is known.
    HBase: Does not natively support secondary indexes.

Consistency
    Cassandra: Tunable consistency
    HBase: Strict consistency (not ACID)

Hot spot problem
    Cassandra: No, distributes load across nodes using the random partition strategy.
    HBase: Yes, one node may handle most of the traffic due to ordered partitioning.

Multi-data center support and disaster recovery
    Cassandra: Asynchronous replication via WAN
    HBase: Asynchronous replication via WAN

Single point of failure
    Cassandra: Ring topology; there is no single point of failure.
    HBase: Although there is a master server, HBase itself does not depend on it heavily. An HBase cluster can keep serving data even if the master goes down. The Hadoop NameNode is a single point of failure.

Commercial vendors
    Cassandra: DataStax, Acunu
    HBase: Cloudera, Hortonworks
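The tunable-consistency entries rest on replica overlap: with N replicas of a row, a read that contacts R replicas and a write acknowledged by W replicas must share at least one replica whenever R + W > N. The sketch below checks that rule for a few settings. The function names are hypothetical and this is plain arithmetic, not a call into any Cassandra driver.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(n, r, w):
    """True when every read set must overlap every write set (R + W > N).

    n: replication factor, r: replicas read, w: replicas that ack a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


N = 3  # assumed replication factor, for illustration only
settings = {
    "ONE / ONE": (1, 1),
    "QUORUM / QUORUM": (quorum(N), quorum(N)),
    "ONE read / ALL write": (1, N),
}
for label, (r, w) in settings.items():
    print(label, is_strongly_consistent(N, r, w))
```

With a replication factor of 3, quorum reads plus quorum writes satisfy the rule, while reading and writing at a single replica does not; that trade-off between latency and consistency is what "tunable" refers to.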

Page 10

Big Data at Equinix

Why DSE Cassandra

• Support for analytics
• Integrated search using Solr
• Security features
• Cluster management capabilities
• Commercial support

DataStax would probably list many more reasons; these are the ones relevant to us.

Page 11

Big Data at Equinix

Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise

[Repeat of the page 8 comparison, with DataStax Enterprise marked as the choice]

✓ Sold

Page 12

Big Data at Equinix

How far are we on our Big Data journey?

✓ Pilot use case taken from PoC to production
✓ Moved network statistics use case from an RRD-based solution to DSE Cassandra
✓ Build in progress for
    • power monitoring use cases
    • data center monitoring
    • network monitoring

In plans
    • Recommendation engine on interconnection platform
    • Use-case analysis and technology selection for connected data sets
    • Building data science capabilities for use cases requiring predictive modeling

A few data points
    • Physical bare-metal boxes for DSE nodes
    • Densely packed data nodes with 4TB storage and 96GB RAM on each node
    • About 250 million records a day
    • Also used for log analysis for internal IT systems monitoring use cases
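For a sense of scale, the daily record count can be converted into an average ingest rate. This is back-of-the-envelope arithmetic only: it assumes records arrive evenly over 24 hours, which the slides do not state, so real peaks would be higher.

```python
RECORDS_PER_DAY = 250_000_000   # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average rate under the (assumed) even-arrival simplification.
avg_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
print(f"{avg_per_second:,.0f} records per second on average")
```

That works out to roughly 2,900 writes per second sustained across the cluster.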

Page 13

Big Data at Equinix

Experience so far

Lack of standards-based connectors / drivers
    • DataStax has developed a Java driver, but it doesn't support JDBC
    • No data visualization tools for low-latency access to Cassandra
    • No data access tools (Toad equivalent) available yet; DevCenter is not there yet

We used Astyanax and are evaluating the DataStax Java driver
    • Built libraries to abstract Astyanax for application engineering teams
    • Built REST services for data access by applications
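The idea of hiding the client library behind an in-house abstraction can be sketched as follows. This is a generic illustration, not Equinix's library: `MetricStore`, `InMemoryStore`, and `latest_value` are hypothetical names, and the in-memory backend stands in for a real driver-backed one.

```python
from abc import ABC, abstractmethod


class MetricStore(ABC):
    """Hypothetical storage interface that application teams code against."""

    @abstractmethod
    def write(self, series, ts, value):
        """Store one (ts, value) point for the named series."""

    @abstractmethod
    def read(self, series, start, end):
        """Return points with start <= ts < end, ordered by ts."""


class InMemoryStore(MetricStore):
    """Stand-in backend; a real one would wrap a Cassandra client."""

    def __init__(self):
        self._rows = {}

    def write(self, series, ts, value):
        self._rows.setdefault(series, {})[ts] = value

    def read(self, series, start, end):
        points = self._rows.get(series, {})
        return [(ts, points[ts]) for ts in sorted(points) if start <= ts < end]


def latest_value(store, series, start, end):
    """Application code depends only on the MetricStore interface."""
    points = store.read(series, start, end)
    return points[-1][1] if points else None


store = InMemoryStore()
store.write("ibx1.power", 100, 4.2)
store.write("ibx1.power", 160, 4.5)
print(latest_value(store, "ibx1.power", 0, 200))
```

Because callers see only the interface, swapping one client library for another means writing a new backend class rather than touching application code.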

Good reliability
    • Not many instances of nodes being down
    • Handled loads even when nodes were down

Page 14

Big Data at Equinix

Where do we go from here?

• Graph databases
• Batch processing (Hadoop, Spark, MapReduce?)
• Interactive queries
• Online data processing
• Data analytics
• Data science and machine learning
• Data visualization tools and applications
• Developer toolkits

We are hiring

• Big Data Architect
• Big Data Engineers
• Data Scientists

Send your resume to [email protected]

Page 15

Thank you!

• [email protected]
• [email protected]
• www.equinix.com

Page 16

EQUINIX?

Page 17

WHO IS EQUINIX?

Page 18

MOVING TOWARDS THE FUTURE | PLATFORM

Equinix: A Platform for Growth

GLOBAL DATA CENTERS
    • 95+ Data Centers
    • 9M+ Square Feet
    • 99.999% Uptime Record

INTERCONNECTION
    • 950+ Networks
    • 110,000+ Cross Connects

BUSINESS ECOSYSTEMS
    • Equinix Marketplace™
    • 4,000+ Businesses
    • Revenue Opportunities

Page 19

Solid. Powerful. Growing.

$1.8B IN ANNUALIZED REVENUE

MEMBER OF THE NASDAQ 100

$7B INVESTMENTS IN EXPANSION

Page 20

15 COUNTRIES

5 CONTINENTS

31 MARKETS

Page 21

HOW WE’RE DIFFERENT | GLOBAL FOOTPRINT

Where You Are. Where You Need To Be.

Page 22

OVER 90% OF INTERNET ROUTES PASS THROUGH EQUINIX DATA CENTERS

950+ NETWORK PROVIDERS

Page 23

450+ CLOUD & SaaS PROVIDERS

Page 24

Thank you!

• [email protected]
• [email protected]
• www.equinix.com