mesosphere // data driven nyc // september 2014
DESCRIPTION
Mesosphere Co-Founder and CTO Tobi Knaup presented at September 2014's edition of Data Driven NYC. Mesosphere turns data centers into one big computer.
TRANSCRIPT
Introduction to the Mesosphere Stack
Tobi Knaup [email protected] @superguenter
Who am I?
Co-Founder & CTO at Mesosphere
Engineer 4 at Airbnb, tech lead, built infrastructure/team
Grad school work on sentiment detection with big(ger) data
Imagine if…
All your servers were pooled together
So they behave like one big computer
…and building new datacenter apps is as easy as building an app for one machine
PowerBook G4
Fantasy?
Datacenter Operating System
Why should I care?
Big data analysts today use a multitude of tools
Mesos makes it easy to run them all
Building big data tools means building distributed systems
Mesos is an SDK for building distributed systems
Tuning clusters for efficiency is a lot of work
Mesos automatically bin-packs apps to increase utilization
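To make the bin-packing idea concrete, here is a minimal sketch of a first-fit-decreasing placement over CPU and memory. It is an illustration only and not Mesos's actual allocator; the `Server` class, the app names, and the resource sizes are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical server with free CPU and memory (not a Mesos type)."""
    cpus: float
    mem_mb: int
    apps: list = field(default_factory=list)

    def fits(self, cpus, mem_mb):
        return cpus <= self.cpus and mem_mb <= self.mem_mb

    def place(self, name, cpus, mem_mb):
        self.cpus -= cpus
        self.mem_mb -= mem_mb
        self.apps.append(name)


def first_fit_decreasing(apps, servers):
    """Place the largest apps first, each on the first server with room.

    Returns the names of apps that could not be placed anywhere.
    """
    unplaced = []
    for name, cpus, mem_mb in sorted(apps, key=lambda a: (a[1], a[2]), reverse=True):
        for server in servers:
            if server.fits(cpus, mem_mb):
                server.place(name, cpus, mem_mb)
                break
        else:
            unplaced.append(name)
    return unplaced


# Illustrative workload: three identical servers, four apps.
servers = [Server(cpus=4, mem_mb=8192) for _ in range(3)]
apps = [("spark", 3, 4096), ("rails", 1, 1024), ("kafka", 2, 4096), ("etl", 2, 2048)]
unplaced = first_fit_decreasing(apps, servers)

for i, server in enumerate(servers):
    print(i, server.apps, "free cpus:", server.cpus)
```

In this toy run the four apps fit on two servers and the third stays empty, which is the utilization gain the slide refers to.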
Applications in the Cloud Era
Cloud Era: Big apps, small servers
Client-Server Era: Small apps, big servers
[Diagram: Virtualization: one Server running App, App, App, App, App]
[Diagram: Aggregation: Serv, Serv, Serv, Serv]
Enterprise Technologies pioneered by Google
Big Data: MapReduce (2004) ➞ Hadoop (2005) ➞ Cloudera (2008)
NoSQL: Bigtable (2006) ➞ Cassandra (2008) ➞ DataStax (2010)
Datacenter Operating System: The Datacenter as a Computer (2009) ➞ Mesos (2009) ➞ Mesosphere (2013)
The UNIX Operating System Stack
Applications: SSHd, MySQL, Apache, Memcached
Init System: Init, Upstart, Systemd
Kernel: Linux, BSD
The Mesosphere Operating System Stack
Applications: Memcached, Redis, Rails, Elasticsearch
Init System: Marathon
Kernel: Mesos
The Mesosphere Operating System Stack
Apps, Native: Spark, Storm, Hadoop, ElasticSearch, MPI
Apps, Long Running/Linux: Rails, Kafka, Play! (any that runs on Linux)
Apps, Batch/Linux: Recurring Jobs (ETL, Hadoop, Backups)
API, Native: Mesos SDK (Java, Python, C++, Go)
API, Long Running/Linux: Services REST API “Marathon” (~init)
API, Batch/Linux: Batch REST API “Chronos” (~cron)
Kernel: Mesos
Hardware: Server, Server, Server, Server, Server, Server, Server
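As a rough sketch of what scripting against the two REST APIs might look like, the snippet below only builds the HTTP requests and does not send them. The host names, app id, job name, commands, and owner address are hypothetical. The endpoint paths and field names follow the public Marathon (`/v2/apps`) and Chronos (`/scheduler/iso8601`) APIs as I recall them, so check them against the documentation for your version.

```python
import json
import urllib.request

# Hypothetical hosts; replace with your own Marathon and Chronos endpoints.
MARATHON_URL = "http://marathon.example.com:8080"
CHRONOS_URL = "http://chronos.example.com:4400"


def build_request(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A long-running service for Marathon (~init): keep 3 instances running.
service = {
    "id": "rails-web",  # hypothetical app id
    "cmd": "bundle exec rails server -p $PORT",
    "cpus": 0.5,
    "mem": 512,
    "instances": 3,
}

# A recurring batch job for Chronos (~cron): run daily, forever.
batch_job = {
    "name": "nightly-etl",  # hypothetical job name
    "command": "bash /opt/etl/run.sh",
    "schedule": "R/2014-09-01T02:00:00Z/P1D",
    "epsilon": "PT30M",
    "owner": "data-team@example.com",
}

marathon_req = build_request(MARATHON_URL + "/v2/apps", service)
chronos_req = build_request(CHRONOS_URL + "/scheduler/iso8601", batch_job)

# To actually submit: urllib.request.urlopen(marathon_req)
print(marathon_req.get_method(), marathon_req.full_url)
print(chronos_req.get_method(), chronos_req.full_url)
```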
Mesos as a Distributed Operating System Kernel
Cluster level resource scheduler
Launch tasks across the cluster (like threads)
Failure detection
Communication between tasks (like IPC)
Distributed state
APIs for building “native” applications (frameworks)
Custom frameworks are elastic from the start
Easy failover and HA
Core Technology
Scales to 10,000s of nodes, production grade
Mesos is a top-level Apache project
Mesosphere, Twitter and Airbnb are major users / contributors
Built-in containerization, incl. Docker
Mesos - Frameworks
Chronos
Data Infrastructure @ Airbnb pre Chronos
[Diagram components: Prod DB, Analytics DB, RedShift, Elastic MapReduce, S3, CRON; Pig, Hive, Cascading, Cascalog]
create_omg_table-fact_ad_stats create_omg_table-dim_ad_groups
create_omg_table-ad_stats
create_tmp_users_visits_history_table
create_omg_table-dim_dates
create_airbed_dump_table_reservation2s
create_omg_table-dim_keyword_templates
create_airbed_dump_table_users
bookings_history
hostings_summary_2_quality_score
daily_omg_jobs-ad_stats_ad_weekly
daily_gibson-import_omg_tables
daily_omg_jobs-ad_stats_kw_summary
daily_gibson-default_data_cooked_table_import
daily_omg_jobs-ad_stats_combined
create_omg_table-ltv_user_revenue_yr1_fc
hostings_summary
hostings_impressions_normalize_prepare
high_level_report_hs_history
hostings_summary_1_pre
hostings_impressions_normalize
update_users_visits_history_table
daily_omg_jobs-omg_adwords_report_next30
create_users_visits_history_table
users_summary
daily_omg_jobs-ad_stats_af_market
Problems
10,000 lines of CRON / BASH
“sleep 300”
AWS: unreliable network / timeouts
Cron - not HA, single node
DB Import / Export means heavy I/O (single node)
Debugging hard, 12 hrs / week!
Requirements for building Chronos
Existing tools too heavyweight
Express & visualize dependencies
Retries, fault tolerance, HA
Distributed, elastic
Raw BASH (for doing Hadoop, Rake, etc.)
REST endpoint (easy to script against)
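Two of the requirements above, dependencies and retries, can be sketched in a few lines. This is not how Chronos implements them; it is a single-process illustration using the standard library's `graphlib` (Python 3.9+). The job names come from the earlier slide, but the dependency edges between them and the simulated failure are invented for the example.

```python
from graphlib import TopologicalSorter

# job -> jobs that must finish first. These edges are hypothetical.
dependencies = {
    "create_omg_table-ad_stats": set(),
    "daily_omg_jobs-ad_stats_combined": {"create_omg_table-ad_stats"},
    "hostings_summary_1_pre": set(),
    "hostings_summary": {"hostings_summary_1_pre"},
    "high_level_report_hs_history": {
        "hostings_summary",
        "daily_omg_jobs-ad_stats_combined",
    },
}


def run_with_retries(job, run, max_retries=2):
    """Run a job, retrying on failure; returns the attempt that succeeded."""
    for attempt in range(1, max_retries + 2):
        try:
            run(job)
            return attempt
        except Exception:
            if attempt > max_retries:
                raise  # out of retries, surface the failure


def run_all(dependencies, run, max_retries=2):
    """Run every job after its dependencies, instead of guessing with sleeps."""
    order = []
    for job in TopologicalSorter(dependencies).static_order():
        run_with_retries(job, run, max_retries)
        order.append(job)
    return order


# Simulate one transient failure, e.g. a network timeout.
attempts = {}


def flaky_run(job):
    attempts[job] = attempts.get(job, 0) + 1
    if job == "hostings_summary" and attempts[job] == 1:
        raise TimeoutError("simulated network timeout")


order = run_all(dependencies, flaky_run)
print(order)
```

Ordering by declared dependencies replaces the “sleep 300” pattern: a job starts when its inputs are done, not after a fixed wait.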
Building Chronos
Mesos framework
CRON for the data center
3K lines of Scala
ISO8601 interval notation (R10/2013-02-28T14:00:00Z/P1D)
Elastic, HA
No network programming
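The schedule string on the slide, `R10/2013-02-28T14:00:00Z/P1D`, reads as: repeat 10 times, starting 2013-02-28 14:00 UTC, once per day (P1D). The sketch below parses that notation and lists the first run times. It is not Chronos's parser and handles only a subset of ISO 8601 durations (days, hours, minutes, seconds); the function names are my own.

```python
import re
from datetime import datetime, timedelta, timezone

# Subset of ISO 8601 durations: PnD, PTnH, PTnM, PTnS and combinations.
DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_schedule(text):
    """Split 'Rn/start/period' into (repeat count or None, start, step)."""
    repeat, start, period = text.split("/")
    if not repeat.startswith("R"):
        raise ValueError("schedule must start with R")
    count = int(repeat[1:]) if len(repeat) > 1 else None  # bare 'R' = unbounded
    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    match = DURATION.match(period)
    if not match or not any(match.groups()):
        raise ValueError("unsupported period: " + period)
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    step = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    return count, start_time, step


def run_times(text, limit=3):
    """Return up to `limit` scheduled run times."""
    count, start, step = parse_schedule(text)
    n = limit if count is None else min(count, limit)
    return [start + i * step for i in range(n)]


count, start, step = parse_schedule("R10/2013-02-28T14:00:00Z/P1D")
times = run_times("R10/2013-02-28T14:00:00Z/P1D")
for t in times:
    print(t.isoformat())
```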
Architecture with Chronos
[Diagram components: Chronos, Mesos Master, Mesos Slaves running jobs; Prod DB, Analytics DB, RedShift, Elastic MapReduce, S3; Pig, Hive, Cascading, Cascalog; labels: job, job notification]
Deployments