cassandra day sv 2014: apache cassandra at equinix for high performance, scalability and short...
DESCRIPTION
In this session, Praveen will be presenting Equinix's big data platform and how Cassandra sits at the center of it.
TRANSCRIPT
CONFIDENTIAL 1
Praveen Kumar
Emerging Software Platforms, Global Software Engineering
Mar 2014
Equinix Big Data Platform & Cassandra
Confidential – © 2013 Equinix Inc. www.equinix.com 2
Big Data at Equinix
~2 million Alarms
~200k interconnections
~250k Electrical circuits
Sensors across 95+ IBXs
~40k Infrastructure objects
Big Data at Equinix
Sensors across 95+ IBXs lead to / produce:
• Support for multiple protocols
• Push as well as pull methods
• Time series data
• Cross sectional data
• Not so clean data
• High velocity
• Clean data
• Lots and lots of noise
• Some useful intel
Big Data at Equinix
What do we use (or plan to use) this data for?
• Customer Presentment
• Billing
• Operations
• New Product & Services
Big Data at Equinix
Use-case analysis: 80-20 rule
~80% of the use-cases analyzed act upon “Hot Data”.
~80% of the data for most of the use-cases analyzed is time-series.
All “quick win” use-cases need data mediation, aggregation and roll-up for presentment.
Real-time to near real-time processing of events.
Collection, processing and storage technologies suitable for time-series data.
Collection, mediation, cross-referencing and correlation of data from different sources; roll-up and aggregation.
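As an illustration of the roll-up and aggregation step mentioned above, here is a minimal Python sketch that groups raw time-series samples into fixed-width buckets and summarizes each bucket. The function name `roll_up`, the sample layout and the readings are assumptions made for this example; the slides do not describe the actual implementation.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds):
    """Group (epoch_seconds, value) samples into fixed-width time buckets.

    Returns a dict mapping each bucket start time to a summary of the
    values that fell into that bucket.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align the timestamp down to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return {
        start: {
            "min": min(vals),
            "max": max(vals),
            "avg": mean(vals),
            "count": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Hypothetical power readings (kW) from one sensor.
readings = [(0, 4.0), (60, 6.0), (120, 5.0), (300, 8.0), (360, 10.0)]
five_minute = roll_up(readings, bucket_seconds=300)
```

Here `five_minute` holds one summary per 5-minute bucket, which is the shape of data a presentment layer would typically read instead of the raw samples.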
Big Data at Equinix
Our Approach: Equinix Big Data Platform
• Common platform to be shared by all initial Big Data use cases – multi tenancy
• Built on inexpensive hardware using free or inexpensive software
• Seamless & massive scalability using scale-out
• High reliability - partial failover, graceful degradation, self-healing, self-balancing
• Data ingestion and processing capabilities for high volumes at high velocity
• Support for structured and semi-structured data
• Provides real-time processing abilities
• Provides parallel processing capabilities
• Support for low latency queries, wide range scan queries and search
• Provides abstraction via connectors, frameworks and libraries
• Support for predictive analytics using machine learning
Immediate requirements
Long term goals
Big Data Platform - Logical Architecture (technology agnostic)
Big Data at Equinix
Requirements & Technologies considered for Big Data Platform
Big Data at Equinix
Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise
[Diagram: node roles shown are Search, Analytics and Storage. Hadoop ecosystem components shown are Hadoop Distributed File System (Storage/Analytics) with NameNode, Secondary NameNode and Data Nodes (Storage); HBase (Storage/Analytics) with HBase Master and HBase Region Servers; Search with Solr Nodes; Management Services with Cloudera Manager; Zookeeper.]
Hadoop Ecosystem
Pros • Scalability • Cloud readiness • Resource availability • Industry momentum • Product eco-system maturity • Technical support
Cons • Infrastructure footprint • Operational complexity • Learning curve • Availability • Total cost of ownership
DataStax Enterprise
Pros • Infrastructure footprint • Operational ease • Scalability • Availability • Cloud readiness • Learning curve • Resource availability • Technical support • Total cost of ownership
Cons • Industry momentum • Product eco-system maturity
Big Data at Equinix
Cassandra vs. HBase
Criteria | Cassandra | HBase
CAP Theorem Focus | Availability, Partition-Tolerance | Consistency, Availability
Data Partitioning | Supports ordered & random partitioning; random partitioning is recommended. | Ordered partitioning. Load balancing achieved through resharding.
Distributed System | P2P architecture (Amazon Dynamo) | Master / Slave via HDFS, Zookeeper for coordination
Administration & Maintenance | Medium | High
Single Write Master | No (R + W > N to get strong consistency) | Yes
Multi-tenancy | Yes | Yes
Secondary indexes | Supports secondary indexes on CF where column name is known. | Does not natively support secondary indexes.
Consistency | Tunable consistency | Strict consistency (not ACID)
Hot Spot Problem | No, distributes load across nodes using random partition strategy. | Yes, one node may handle most of the traffic due to ordered partition.
Multi-Data Center Support and Disaster Recovery | Asynchronous replication via WAN | Asynchronous replication via WAN
Single point of failure | Ring topology; there is no single point of failure. | Although there exists a concept of a master server, HBase itself does not depend on it heavily. An HBase cluster can keep serving data even if the master goes down. The Hadoop NameNode is a single point of failure.
Commercial vendors | DataStax, Acunu | Cloudera, Hortonworks
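The "Single Write Master" and "Consistency" rows rest on Cassandra's tunable consistency: with N replicas, a read touching R replicas and a write touching W replicas are guaranteed to overlap when R + W > N. A small Python sketch of that rule follows; the helper names are made up for this example, and N = 3 is an assumed replication factor rather than a figure from the talk.

```python
def is_strongly_consistent(replication_factor, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


N = 3          # assumed replication factor
q = quorum(N)  # majority of 3 replicas

# Read/write consistency level pairs and whether they overlap.
checks = {
    "ONE / ONE": is_strongly_consistent(N, 1, 1),
    "QUORUM / QUORUM": is_strongly_consistent(N, q, q),
    "ONE read / ALL write": is_strongly_consistent(N, 1, N),
}
```

Reading and writing at QUORUM keeps the overlap guarantee while still tolerating one replica being down, which is the usual reason it is chosen over ALL.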
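The hot spot difference between ordered and random partitioning in the Cassandra vs. HBase comparison can be seen with a toy model. The Python sketch below is not Cassandra's or HBase's actual placement logic: the four-node cluster, the key range and the use of MD5 as a stable hash are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

NODES = 4  # assumed cluster size


def ordered_node(key, key_space_max):
    """Ordered partitioning: each node owns a contiguous slice of the key range."""
    slice_width = key_space_max // NODES
    return min(key // slice_width, NODES - 1)


def random_node(key):
    """Random partitioning: the node is chosen from a hash of the key."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


# Hypothetical workload: 1,000 writes whose keys are consecutive timestamps,
# all falling in a narrow band of a much larger key space.
keys = range(500_000, 501_000)
ordered_load = Counter(ordered_node(k, key_space_max=1_000_000) for k in keys)
random_load = Counter(random_node(k) for k in keys)
```

With ordered partitioning all 1,000 writes land on the single node that owns that slice of the key range, while hashing spreads them over all four nodes. Time-series keys are typically sequential, so this matters for the workloads described in this deck.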
Big Data at Equinix
Why DSE Cassandra
• Support for Analytics
• Integrated search using Solr
• Security features
• Cluster management capabilities
• Commercial support
DataStax would probably list many more reasons; these are the ones relevant to us.
Big Data at Equinix
Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise
✓ Sold
Big Data at Equinix
How far are we on our Big Data journey?
✓ Pilot use-case from PoC to Production
✓ Moved network statistics use case from RRD based solution to DSE Cassandra
✓ Build in progress for
  • power monitoring use cases
  • data center monitoring
  • network monitoring
In plans
➢ Recommendation engine on interconnection platform
➢ Use case analysis and technology selection for connected data sets
➢ Building data science capabilities for use cases requiring predictive modeling
A few data points
• Physical bare metal boxes for DSE nodes
• Densely packed data nodes with 4TB storage on each node, 96GB RAM
• ~250 million records a day
• Also used for log analysis for internal IT systems monitoring use-cases
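To put the daily record count in perspective, the short Python calculation below converts it to an average ingest rate. It only restates the "~250 million records a day" figure. It assumes the load is spread evenly over the day, which real traffic rarely is, so peak rates would be higher.

```python
RECORDS_PER_DAY = 250_000_000  # "~250 million records a day" from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if writes were spread evenly across the day (an assumption).
avg_records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Yearly volume at the same daily rate, ignoring growth.
records_per_year = RECORDS_PER_DAY * 365
```

That works out to roughly 2,900 records per second on average and about 91 billion records per year.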
Big Data at Equinix
Experience so far
Lack of standards based connectors / drivers
• DataStax has developed a Java Driver, but it doesn’t support JDBC
• No data visualization tools to access from Cassandra for low-latency access
• No data access tools (Toad equivalent) available yet; DevCenter is not there yet
We used Astyanax and are evaluating the DataStax Java driver
• Built libraries to abstract Astyanax for application engineering teams
• Built REST services for data access by applications
Good reliability
• Not many instances of nodes being down
• Handled loads even when nodes were down
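The idea of a library that hides the Cassandra client from application teams can be sketched as a small interface with swappable backends. The Python below is only an illustration of that pattern: `TimeSeriesStore`, `InMemoryStore` and the series name are hypothetical, and the in-memory backend stands in for a real one that would wrap a client library such as Astyanax or the DataStax Java driver.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class TimeSeriesStore(ABC):
    """Hypothetical interface that application teams would code against."""

    @abstractmethod
    def write(self, series_id, timestamp, value):
        ...

    @abstractmethod
    def read_range(self, series_id, start, end):
        ...


class InMemoryStore(TimeSeriesStore):
    """Stand-in backend; a real one would wrap a Cassandra client library."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def write(self, series_id, timestamp, value):
        self._rows[series_id][timestamp] = value

    def read_range(self, series_id, start, end):
        # Return points with start <= timestamp < end, ordered by time.
        points = self._rows[series_id]
        return [(ts, points[ts]) for ts in sorted(points) if start <= ts < end]


store: TimeSeriesStore = InMemoryStore()
store.write("ibx-1/port-7/in_octets", 100, 42)
store.write("ibx-1/port-7/in_octets", 160, 57)
store.write("ibx-1/port-7/in_octets", 220, 63)
window = store.read_range("ibx-1/port-7/in_octets", 100, 200)
```

Because callers depend only on the interface, the driver underneath can be swapped (for example while evaluating a different client) without touching application code.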
Big Data at Equinix
Where do we go from here?
• Graph databases
• Batch processing (Hadoop, Spark, MapReduce?)
• Interactive queries
• Online data processing
• Data analytics
• Data science and machine learning
• Data visualization tools and applications
• Developer toolkits
We are hiring
• Big Data Architect
• Big Data Engineers
• Data Scientists
Send resume to [email protected]
WHO IS EQUINIX?
GLOBAL DATA CENTERS
• 95+ Data Centers
• 9M+ Square Feet
• 99.999% Uptime Record
INTERCONNECTION
• 950+ Networks
• 110,000+ Cross Connects
BUSINESS ECOSYSTEMS
• Equinix Marketplace™
• 4,000+ Businesses
• Revenue Opportunities
MOVING TOWARDS THE FUTURE | PLATFORM
Equinix: A Platform for Growth
Solid. Powerful. Growing.
$1.8B IN ANNUALIZED REVENUE
MEMBER OF THE NASDAQ 100
$7B INVESTMENTS IN EXPANSION
15 COUNTRIES
5 CONTINENTS
31 MARKETS
HOW WE’RE DIFFERENT | GLOBAL FOOTPRINT
Where You Are. Where You Need To Be.
OVER 90% OF INTERNET ROUTES PASS THROUGH EQUINIX DATA CENTERS
950+ NETWORK PROVIDERS
450+ CLOUD & SaaS PROVIDERS