Mesos at OpenTable
Pablo Delgado, Senior Data Engineer, OpenTable, @pablete
MesosCon 2015, Seattle, WA
• Over 32,000 restaurants worldwide
• More than 760 million diners seated since 1998, representing more than $30 billion spent at partner restaurants
• Over 16 million diners seated every month
• OpenTable has seated over 190 million diners via a mobile device; almost 50% of our reservations are made on mobile
• OpenTable currently has a presence in the US, Canada, Mexico, the UK, Germany and Japan
• OpenTable has nearly 600 partners, including Facebook, Google, TripAdvisor, Urbanspoon, Yahoo and Zagat
OpenTable: the world’s leading provider of online restaurant reservations
At OpenTable we aim to power the best dining experiences!
Service-Oriented Architecture
From monolith to microservices
• Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. Paper: http://mesos.berkeley.edu/mesos_tech_report.pdf
• Omega: flexible, scalable schedulers for large compute clusters. Paper: http://research.google.com/pubs/pub41684.html
Apache Mesos
Apache Mesos
• Mesos slaves connect to masters and offer resources such as CPU, disk and memory.
• Masters pass those offers on to frameworks such as Singularity, deciding which framework receives which resources.
• Frameworks in turn choose which offers to use, and run tasks on the slaves.
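This offer cycle is Mesos’s two-level scheduling: the master decides who sees an offer, and the framework decides what to launch on it. The toy simulation below is a minimal sketch of that flow under simplifying assumptions (a single framework, first-fit packing); the classes are hypothetical, and real frameworks talk to the master through the Mesos scheduler API rather than in-process calls.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one slave, as offered by the master (simplified)."""
    slave: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class ToyFramework:
    """Hypothetical stand-in for a framework such as Singularity."""
    pending: list = field(default_factory=list)

    def resource_offers(self, offers):
        """Decide what to launch on each offer; unused resources are implicitly declined."""
        launches = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            still_pending = []
            for task in self.pending:
                if task.cpus <= cpus and task.mem_mb <= mem:
                    cpus -= task.cpus
                    mem -= task.mem_mb
                    launches.append((task, offer.slave))
                else:
                    still_pending.append(task)
            self.pending = still_pending
        return launches


class ToyMaster:
    """Hypothetical master: turns free slave resources into offers and books launches."""

    def __init__(self, slaves):
        # slave name -> [free cpus, free memory in MB]
        self.free = {name: list(res) for name, res in slaves.items()}

    def allocate(self, framework):
        # Real Mesos chooses which framework gets offers with a pluggable allocator
        # (DRF by default); this toy has a single framework.
        offers = [Offer(s, c, m) for s, (c, m) in self.free.items() if c > 0 and m > 0]
        launches = framework.resource_offers(offers)
        for task, slave in launches:
            self.free[slave][0] -= task.cpus
            self.free[slave][1] -= task.mem_mb
        return launches
```

For example, with slaves {"slave-2a": (4.0, 8192), "slave-2b": (2.0, 4096)} and three pending tasks, the framework packs two 2-CPU web tasks onto slave-2a and a memory-heavy worker onto slave-2b; the master then stops offering any slave whose CPU or memory is exhausted.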
Diagram: one Mesos master and two standby masters, each paired with a Zookeeper node managed by Netflix’s Exhibitor, plus Mesos slaves running Docker, spread across availability zones 2a, 2b and 2c.
Apache Mesos
HubSpot’s Singularity Scheduler
• Native Docker support
• JSON REST API and Java client
• Fully featured web application (replaces and improves on the Mesos master UI)
• Deployments, automatic rollbacks, and health checks
• Configurable email alerts to service owners
Singularity Features
HubSpot’s Singularity
Process types: web services, workers, scheduled (cron-type) jobs, on-demand processes
Slave placement: GREEDY, SEPARATE_BY_DEPLOY, SEPARATE_BY_REQUEST, OPTIMISTIC
Executors: Mesos executor, Singularity executor, Docker executor
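The slave-placement options control how many instances of a service may share one slave. The sketch below encodes our simplified reading of three of them: GREEDY imposes no constraint, SEPARATE_BY_DEPLOY allows at most one instance of the same deploy per slave, and SEPARATE_BY_REQUEST allows at most one instance of the same request (across all its deploys) per slave. These semantics are an assumption, OPTIMISTIC is left out, and the function is hypothetical rather than Singularity’s scheduler code.

```python
from collections import namedtuple

# A running task: which request (service) and which deploy of it, on which slave.
RunningTask = namedtuple("RunningTask", "request_id deploy_id slave")


def slave_allowed(placement, slave, request_id, deploy_id, running):
    """Return True if one more instance may go on `slave` (assumed, simplified semantics)."""
    on_slave = [t for t in running if t.slave == slave]
    if placement == "GREEDY":
        # No spreading constraint: pack instances wherever resources fit.
        return True
    if placement == "SEPARATE_BY_DEPLOY":
        # Only instances of the *same deploy* conflict.
        return not any(t.request_id == request_id and t.deploy_id == deploy_id for t in on_slave)
    if placement == "SEPARATE_BY_REQUEST":
        # Any instance of the same request conflicts, even from an older deploy.
        return not any(t.request_id == request_id for t in on_slave)
    raise ValueError(f"placement not modelled in this sketch: {placement}")
```

The difference shows up during a rollout: while deploy v1 of a service still runs on a slave, SEPARATE_BY_DEPLOY lets a v2 instance start beside it, whereas SEPARATE_BY_REQUEST sends v2 to a different slave.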
Linux Containers
Docker
• Immutability
• Portability
• Isolation
Service Discovery
Services no longer live at a well-known address/port, so we needed a registry, or some other dynamic way to find them. It also had to be Mesos-agnostic.
• Services announce their presence to the Discovery Server
• Services subscribe to announcement changes of their dependencies
• Services un-announce on termination, or time out after a crash
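The announce / subscribe / un-announce lifecycle can be sketched as an in-memory registry with leases: an announcement expires unless the service renews it, which is how a crashed instance eventually disappears. This is a single-process illustration under that lease assumption; the class and method names are hypothetical, and the production Discovery Server is not shown.

```python
import time
from collections import defaultdict


class ToyDiscoveryRegistry:
    """Hypothetical in-memory announce/subscribe registry with lease-based expiry."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._leases = {}                      # (service, url) -> lease expiry time
        self._subscribers = defaultdict(list)  # service -> callbacks

    def announce(self, service, url):
        """Register or renew an instance; live services re-announce before the TTL runs out."""
        key, now = (service, url), self.clock()
        was_live = self._leases.get(key, float("-inf")) > now
        self._leases[key] = now + self.ttl
        if not was_live:
            self._notify(service)  # only a new (or revived) instance changes the set

    def unannounce(self, service, url):
        """Clean shutdown: remove the instance immediately."""
        if self._leases.pop((service, url), None) is not None:
            self._notify(service)

    def subscribe(self, service, callback):
        """Call `callback(instances)` now and whenever the instance set changes."""
        self._subscribers[service].append(callback)
        callback(self.instances(service))

    def instances(self, service):
        now = self.clock()
        return sorted(url for (name, url), expiry in self._leases.items()
                      if name == service and expiry > now)

    def expire(self):
        """Drop lapsed leases, e.g. from a crashed instance that stopped renewing."""
        now = self.clock()
        dead = [key for key, expiry in self._leases.items() if expiry <= now]
        for key in dead:
            del self._leases[key]
        for service in {name for name, _ in dead}:
            self._notify(service)

    def _notify(self, service):
        for callback in self._subscribers[service]:
            callback(self.instances(service))
```

Injecting the clock keeps the sketch testable: with `clock=lambda: now[0]` and a 10-second TTL, an instance announced at t=0 and never renewed drops out of every subscriber’s view at the first `expire()` after t=10.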
Service Discovery
Diagram: Zookeeper nodes and Discovery Servers in availability zones 2a, 2b and 2c; instances of services A and B announce, discover and subscribe through the Discovery Servers.
Service Discovery API
FrontDoor
• Routes external traffic to internal services
• Simple discovery-aware proxy
• Dynamic configuration
• Developer-friendly configuration via a Git repo
REQUEST_URI=/api/timezone* passthru timezone
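We read the rule above as: requests whose URI matches /api/timezone* are passed through to an instance of the discovered service named timezone. A minimal sketch of a discovery-aware router under that reading follows; the rule grammar, the round-robin choice of instance and all names are assumptions, not FrontDoor’s actual implementation.

```python
import itertools
from fnmatch import fnmatchcase


def parse_rule(line):
    """Parse 'REQUEST_URI=<glob> passthru <service>' (assumed grammar)."""
    condition, action, service = line.split()
    key, _, pattern = condition.partition("=")
    if key != "REQUEST_URI" or action != "passthru":
        raise ValueError(f"unsupported rule: {line!r}")
    return pattern, service


class ToyFrontDoor:
    """Hypothetical proxy core: first matching rule wins, instances come from discovery."""

    def __init__(self, rules, lookup):
        self.rules = [parse_rule(r) for r in rules]
        self.lookup = lookup  # service name -> list of base URLs (e.g. from discovery)
        self._counters = {}

    def route(self, request_uri):
        """Return the upstream URL for a request, or None if nothing can serve it."""
        for pattern, service in self.rules:
            if fnmatchcase(request_uri, pattern):
                instances = self.lookup(service)
                if not instances:
                    return None
                # Round-robin across whatever instances are currently announced.
                counter = self._counters.setdefault(service, itertools.count())
                base = instances[next(counter) % len(instances)]
                return base + request_uri
        return None
```

Because `lookup` is consulted on every request, instances that announce or un-announce are picked up without reloading the rules, which is one way to get the "dynamic configuration" property listed above.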
Monitoring
https://github.com/opentable/mesos_stats
• Finds your service name by parsing the task names
• Includes a Grafana dashboard
• Runs inside Mesos
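Per-service metrics need a mapping from Mesos task names to service names. The naming convention assumed below (service name, then a deploy identifier and an instance number, joined by hyphens) is purely hypothetical, as are the functions; they only illustrate the kind of parse-then-aggregate step such a tool performs before emitting metrics.

```python
import re
from collections import defaultdict

# Hypothetical convention: "<service>-<deploy>-<instance>", where the service name
# may itself contain hyphens but the last two fields never do.
TASK_NAME = re.compile(r"^(?P<service>.+)-(?P<deploy>[^-]+)-(?P<instance>\d+)$")


def service_name(task_name):
    """Return the service part of a task name, or None if it doesn't follow the convention."""
    match = TASK_NAME.match(task_name)
    return match.group("service") if match else None


def cpu_by_service(tasks):
    """Sum per-task CPU usage into per-service totals; input is (task_name, cpus) pairs."""
    totals = defaultdict(float)
    for name, cpus in tasks:
        service = service_name(name)
        if service is not None:
            totals[service] += cpus
    return dict(totals)
```

Tasks that don’t match the convention are skipped rather than guessed at, so ad-hoc tasks don’t pollute per-service dashboards.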
All together
Diagram: GitHub, Continuous Integration, the Docker Registry, Singularity, three Discovery servers and FrontDoor around the Mesos cluster (three masters, each co-located with Zookeeper, and six slaves running Docker).
Overview
Diagram: GitHub, Continuous Integration, Singularity and the Docker Registry.
Developer’s Concerns
• Initialize projects with a continuous integration template
• Enable monitoring/logging of application-level errors
• Build the project as an immutable Docker image
• Deploy to Mesos through Singularity using a REST API
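A deploy through Singularity’s REST API is two calls: create (or update) a request, then post a deploy for it. The sketch below only builds the HTTP requests with the standard library and does not send them. The host is hypothetical, and the endpoint paths and JSON field names reflect our understanding of Singularity’s API at the time; treat them as assumptions and check them against the API documentation for your Singularity version.

```python
import json
import urllib.request

# Hypothetical base URL; the path prefix depends on how Singularity is deployed.
SINGULARITY = "http://singularity.example.internal/singularity/api"


def _post(path, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        SINGULARITY + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_payload(service, instances):
    # Assumed field names for a long-running web service request.
    return {"id": service, "requestType": "SERVICE", "instances": instances}


def deploy_payload(service, deploy_id, image, cpus, memory_mb):
    # Assumed shape: a Docker container plus the resources each instance needs.
    return {
        "deploy": {
            "requestId": service,
            "id": deploy_id,
            "containerInfo": {"type": "DOCKER", "docker": {"image": image}},
            "resources": {"cpus": cpus, "memoryMb": memory_mb, "numPorts": 1},
        }
    }


def deploy_requests(service, deploy_id, image, instances=2, cpus=0.5, memory_mb=512):
    """Return the two HTTP requests a CI job would send, in order."""
    return [
        _post("/requests", request_payload(service, instances)),
        _post("/deploys", deploy_payload(service, deploy_id, image, cpus, memory_mb)),
    ]
```

A CI job would pass each returned request to `urllib.request.urlopen` in order, using the immutable image tag it just pushed to the registry as the deploy’s image.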
Diagram: Singularity, three Discovery servers, FrontDoor, the Docker Registry and the Mesos cluster (three masters co-located with Zookeeper, six slaves running Docker).
Operational Concerns
• Provide Mesos with resources
• Monitor and maintain external traffic routing
• Monitor and replace failing resources
Stateless Simplicity
Diagram: a stateless Mesos cluster, with state kept in external systems:
• Datastores: MySQL, PostgreSQL, MongoDB
• Caches: Redis, Memcached
• Other: Zookeeper, Amazon S3
Diagram: US and EU data centers alongside AWS regions us-west-2 and eu-west-1, hosting PROD environments, a QA environment and a DATA PROCESSING environment.
Diagram: the same topology, with Kafka clusters added to the environments.
Data Processing
Distributed Multitenant Data Processing
Spark’s Approach
• Generalizes MapReduce to support new applications in the same engine
• General DAGs and data sharing
• Unification benefits both the engine, which becomes more efficient, and users, for whom it is simpler
• Handles batch, interactive and online processing
• APIs available for Java, Scala, Python, SQL and R
Spark RDDs
Resilient Distributed Datasets (RDDs) are fault-tolerant distributed collections.
They come in two forms:
• Parallelized collections
• External datasets: distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc.
Figure: RDD graph.
Dataset-level view: file = HadoopRDD(path = hdfs://...), errors = FilteredRDD(func = _.contains(…), shouldCache = true).
Partition-level view: each partition of the errors RDD is processed by its own task (Task 1, Task 2, Task 3, …, Task n).
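The figure’s two views can be mimicked in a few lines: a dataset-level object records how it was derived (its lineage), and each partition is computed independently, so a lost cached partition is recomputed from its parent rather than restored from a replica. The class below is a hypothetical pure-Python toy, not Spark’s API; in PySpark the real pipeline would be along the lines of `sc.textFile(path).filter(...).cache()`.

```python
class ToyRDD:
    """Toy stand-in for an RDD: a partition count, a compute function and lineage."""

    def __init__(self, num_partitions, compute, parent=None, name=""):
        self.num_partitions = num_partitions
        self._compute = compute  # partition index -> list of records
        self.parent = parent     # lineage: the dataset this one was derived from
        self.name = name
        self.should_cache = False
        self._cache = {}

    @staticmethod
    def parallelize(data, num_partitions):
        """Split an in-memory collection into contiguous partitions."""
        size = -(-len(data) // num_partitions)  # ceiling division
        chunks = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(num_partitions, lambda i: list(chunks[i]), name="parallelized")

    def filter(self, predicate):
        """Narrow transformation: child partition i needs only parent partition i."""
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.partition(i) if predicate(x)],
                      parent=self, name="filtered")

    def cache(self):
        self.should_cache = True
        return self

    def partition(self, i):
        """Compute one partition (the work of a single task), reusing the cache if enabled."""
        if i in self._cache:
            return self._cache[i]
        records = self._compute(i)
        if self.should_cache:
            self._cache[i] = records
        return records

    def lose_partition(self, i):
        """Simulate losing a cached partition, e.g. when an executor dies."""
        self._cache.pop(i, None)

    def collect(self):
        """Run one 'task' per partition and concatenate the results in order."""
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lineage(self):
        rdd, names = self, []
        while rdd is not None:
            names.append(rdd.name)
            rdd = rdd.parent
        return names
```

Mirroring the figure, `errors = ToyRDD.parallelize(lines, 2).filter(lambda line: "ERROR" in line).cache()` gives a two-partition errors dataset; after `errors.lose_partition(1)`, the next `collect()` silently rebuilds that partition from its parent.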
Scheduling Process: lifetime of a job
• RDD Objects: code such as rdd1.join(rdd2).groupBy(…).filter(…) builds the operator DAG, which is passed to the DAGScheduler.
• DAGScheduler: splits the graph into stages of tasks and submits each stage, as a TaskSet, once it is ready; it is agnostic to operators.
• TaskScheduler: launches tasks via the cluster manager and retries failed or straggling tasks; it doesn’t know about stages, and reports a failed stage back to the DAGScheduler.
• Worker: executes tasks in threads, and stores and serves blocks through its block manager.
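The DAGScheduler’s key step is cutting the operator graph into stages at shuffle boundaries: narrow dependencies (such as filter) are pipelined into one stage, while wide dependencies (such as join or groupBy on inputs that are not already partitioned the same way) start a new one. The sketch below applies only that rule to a hypothetical graph representation; Spark’s real DAGScheduler also reuses existing partitioning, tracks shuffle outputs and resubmits failed stages.

```python
from dataclasses import dataclass, field

NARROW, WIDE = "narrow", "wide"


@dataclass
class Node:
    name: str
    parents: list = field(default_factory=list)  # list of (Node, dependency type)


def build_stages(final):
    """Split the graph ending at `final` into stages, cutting at wide (shuffle) dependencies.

    Returns stages (each a sorted list of node names) in submission order:
    every stage appears after all the stages it depends on.
    """
    ordered, visited_roots = [], set()

    def visit_stage(root):
        if root.name in visited_roots:
            return
        visited_roots.add(root.name)
        members, parent_roots, stack = set(), [], [root]
        while stack:
            node = stack.pop()
            if node.name in members:
                continue
            members.add(node.name)
            for parent, dep in node.parents:
                if dep == NARROW:
                    stack.append(parent)         # pipelined into the same stage
                else:
                    parent_roots.append(parent)  # shuffle boundary: separate stage
        for parent_root in parent_roots:
            visit_stage(parent_root)             # parent stages are submitted first
        ordered.append(sorted(members))

    visit_stage(final)
    return ordered
```

For the slide’s rdd1.join(rdd2).groupBy(…).filter(…), assuming neither input is pre-partitioned, this yields four stages: one per input, one for the join output, and a final stage in which groupBy’s output and the filter are pipelined together.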
Alternating Least Squares (ALS) in MLlib
Running Spark
Diagram: a driver program holding the SparkContext connects to a cluster manager, which provides executors on worker nodes; each executor runs tasks and keeps a cache.
Mesos Coarse Grained
Diagram: the SparkContext registers with the Mesos master as a framework; on each worker node Mesos launches one long-running Spark executor, which runs many tasks and keeps its cache for the life of the application.
Mesos Fine Grained
Diagram: the SparkContext again registers as a Mesos framework, but every Spark task is launched as a separate Mesos task, so cores are shared with other frameworks at task granularity, at the cost of extra launch overhead per task.
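Which of the two modes Spark uses on Mesos is a configuration choice. In Spark 1.x the `spark.mesos.coarse` property selects coarse-grained mode (fine-grained was the default at the time), and `spark.cores.max` caps how many cores a coarse-grained application acquires. The helper below only assembles a `spark-submit` command line; the Zookeeper address, class name and jar in the usage note are hypothetical placeholders.

```python
def spark_submit_args(master, app_class, jar, coarse=True, max_cores=None):
    """Assemble a spark-submit command line for a Mesos master (not executed here)."""
    args = ["spark-submit", "--master", master, "--class", app_class,
            "--conf", f"spark.mesos.coarse={'true' if coarse else 'false'}"]
    if coarse and max_cores is not None:
        # Without a cap, coarse-grained mode may take every core Mesos offers it.
        args += ["--conf", f"spark.cores.max={max_cores}"]
    args.append(jar)
    return args
```

For example, `spark_submit_args("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos", "com.example.EtlJob", "etl-job.jar", max_cores=16)` could be handed to `subprocess.run(..., check=True)`. Coarse-grained mode trades elasticity for low per-task latency, which suits interactive and streaming jobs; fine-grained mode gives cores back between tasks.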
Pull requests (possibly merged)
[SPARK-7373] Add docker support for launching drivers in mesos cluster mode.
[SPARK-5338] Add cluster mode support for Mesos
[SPARK-5095] Support capping cores and launch multiple executors in coarse mode
[SPARK-6707] Mesos Scheduler should allow the user to specify constraints based on slave attributes
[SPARK-6287] Add dynamic allocation to the coarse-grained Mesos scheduler
Ideal data processing stack
• Memory-centric distributed storage system (cache)
• Distributed file system
• General engine for large-scale data processing
• Kernel for the datacenter
Other frameworks
• KAFKA on Mesos https://github.com/mesos/kafka
• SAMZA on Mesos https://github.com/banno/samza-mesos
• PHOENIX (Secor on Mesos) https://github.com/stealthly/phoenix
• CASSANDRA on Mesos https://github.com/mesosphere/cassandra-mesos
We are also using:
We are considering:
• CHRONOS https://github.com/mesos/chronos
• MARATHON https://github.com/mesosphere/marathon
Diagram: user activity flows into Kafka (with backups); a query/processing layer built on Spark SQL, Spark Streaming and Spark MLlib runs ETL over the JSON events and feeds data products.
{"userId":"xxxxxxxx","event":"personalizer_search","properties":{"query_longitude":-77.16816,"latitude":38.918159,"req_attribute_tag_ids":["pizza"],"req_geo_query":"Current Location","sort_by":"best","longitude":-77.168156,"query_latitude":38.91816,"req_forward_minutes":30,"req_party_size":2,"req_backward_minutes":30,"req_datetime":"2015-06-02T12:00","req_time":"12:00","res_num_results":784,"calculated_radius":5.466253405962307,"req_date":"2015-06-02"},"type":"track","messageId":"b4f2fafc-dd4a-45e3-99ed-4b83d1e42dcd","timestamp":"2015-06-02T10:02:34.323Z"}
ETL with Spark/Spark SQL
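A typical first ETL step on events like the one above is flattening nested JSON into columns. The standard-library sketch below shows that step on a trimmed, made-up event whose nesting is an assumption; the helper names and the dotted column naming are our own choice. In Spark SQL the equivalent work is usually done by loading the JSON events into a DataFrame and selecting nested fields.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots; lists stay as values."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=column + "."))
        else:
            flat[column] = value
    return flat


def to_row(line, columns):
    """Parse one JSON event line and project it onto fixed columns (missing -> None)."""
    flat = flatten(json.loads(line))
    return tuple(flat.get(c) for c in columns)


# Trimmed, hypothetical event in the same spirit as the sample above.
SAMPLE = ('{"userId": "xxxxxxxx", "event": "personalizer_search", "type": "track", '
          '"properties": {"req_party_size": 2, "req_attribute_tag_ids": ["pizza"], '
          '"res_num_results": 784}}')
COLUMNS = ["event", "properties.req_party_size",
           "properties.res_num_results", "properties.req_attribute_tag_ids"]
```

`to_row(SAMPLE, COLUMNS)` yields one tuple per event with a stable column order, ready to be written out as a table; fields absent from a given event simply come back as None.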
Matrix Factorization with Spark MLlib
• Collaborative Filtering
• Topic Modeling
• Restaurant Demand Analysis
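MLlib’s collaborative filtering factorizes a sparse ratings matrix (for example diner × restaurant) with Alternating Least Squares: fix the item factors and solve a small regularized least-squares problem for each user, then swap roles, and repeat. The pure-Python sketch below shows that alternation on a toy ratings dictionary. It is not MLlib’s implementation, which distributes the solves and also offers an implicit-feedback variant, and the data, rank and regularization values are made up.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed, ratings_by_row, rank, lam):
    """Recompute every row's factors by ridge regression against the fixed factors."""
    new = {}
    for row, observed in ratings_by_row.items():
        # Normal equations over observed entries only: (Y^T Y + lam I) x = Y^T r
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, r in observed.items():
            y = fixed[col]
            for i in range(rank):
                b[i] += y[i] * r
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        new[row] = solve(a, b)
    return new


def als(ratings, rank=2, lam=0.1, iterations=20, seed=0):
    """ratings: dict (user, item) -> rating. Returns (user_factors, item_factors)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # items fixed, solve for users
        items = update(users, by_item, rank, lam)  # users fixed, solve for items
    return users, items


def predict(users, items, u, i):
    return sum(x * y for x, y in zip(users[u], items[i]))
```

Each half-step is an independent small solve per user (or per item), which is why the method parallelizes well: Spark can run those solves across partitions and only exchange factor vectors between half-steps.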
nigiri sashimi gari maki roku rolls roll godzilla chirashi robata zushi omakase yellowtail unagi
samba toro gyoza aburi spider starburst nakazawa shabu sasa katana sake hapa maguro tsunami
raku kappo yasuda otoro seki tamari ra teppanyaki caterpillar japan shashimi hamasaku
Early explorations with Word2Vec: find synonyms for “Sushi”
We use Apache Spark’s implementation of Word2Vec (skip-gram model)
A restaurant like your favorite one, but in a different city: find the “synonyms” of the restaurant in question, then filter by location!
Figure: Akiko’s (San Francisco) alongside Sushi of Gari / Gari Columbus (New York), Masaki Sushi (Chicago) and Sansei Seafood Restaurant & Sushi Bar (Maui), captioned “Downtown upscale sushi experience with sushi bar”.
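The restaurant “synonym” lookup amounts to nearest-neighbour search in an embedding space followed by a location filter. The sketch assumes each restaurant already has a vector (for instance one learned with Word2Vec over text about it); the function is hypothetical, and the restaurant names, cities and 3-dimensional vectors in the usage note are invented for illustration, not OpenTable data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similar_elsewhere(query, restaurants, city, top_n=3):
    """Rank all restaurants by similarity to `query`, then keep only those in `city`.

    restaurants: dict name -> (city, vector); a hypothetical structure for this sketch.
    """
    _, query_vec = restaurants[query]
    ranked = sorted(((cosine(query_vec, vec), name, c)
                     for name, (c, vec) in restaurants.items() if name != query),
                    reverse=True)
    return [name for _, name, c in ranked if c == city][:top_n]


RESTAURANTS = {
    "Sushi Bar A": ("San Francisco", [0.9, 0.1, 0.0]),
    "Sushi Counter B": ("New York", [0.85, 0.15, 0.05]),
    "Pizzeria C": ("New York", [0.1, 0.9, 0.2]),
    "Izakaya D": ("New York", [0.6, 0.2, 0.4]),
    "Sushi Grill E": ("Chicago", [0.8, 0.1, 0.1]),
}
```

With these made-up vectors, `similar_elsewhere("Sushi Bar A", RESTAURANTS, "New York", top_n=2)` returns the sushi counter first and the izakaya second, while the pizzeria ranks last. At real scale the exhaustive ranking would be replaced by an approximate nearest-neighbour index.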
keep in touch @pablete