Building a Cloud Culture at Yelp (BDT305) | AWS re:Invent 2013
DESCRIPTION
Yelp is evolving from a purely hosted infrastructure environment to running many systems in AWS, paving the way for their growth to 108 million monthly visitors (source: Google Analytics). Embracing a cloud culture reduced reliability issues, sped up the pace of innovation, and helped them support dozens of data-intensive Yelp features, including search relevance, usage graphs, review highlights, spam filtering, and advertising optimizations. Today, Yelp runs 7+ TB hosted databases, 250+ GB compressed logs per day in Amazon S3, and hundreds of Amazon Elastic MapReduce jobs per day. In this session, Yelp engineers share the secrets of their success and show how they achieved big wins with Amazon EMR and open source libraries, policies around development, privacy, and testing.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Building a Cloud Culture at Yelp
Jim Blomo – Engineering Manager, Yelp
November 15, 2013
Why Cloud?
Yelp!
Yelp Data
Why Cloud?
specificity & optimization
generality & abstraction
Make the Trade-Off!
How Cloud?
Back to the Past
Logging – an aside
{user_id: 5,
request_id: "b1946ac92492d2347c6235b4d2611184",
search_query: "farm to table",
city: "Portland, OR",
results: [17289, 8230452, 825429, 184312,...],
timestamp: 1384469660
}
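To make the log format above concrete, here is a minimal sketch of reading such records, assuming each record is stored as one strict JSON object per line (the slide shows unquoted keys and an elided results list, so the sample lines below are hypothetical). The function name `count_searches_by_city` and every sample value other than the slide's field names are assumptions made for illustration, not Yelp's actual tooling.

```python
import json
from collections import Counter

# Hypothetical sample lines. Field names follow the slide; the second record,
# the malformed line, and the shortened results list are invented.
SAMPLE_LINES = [
    '{"user_id": 5, "request_id": "b1946ac92492d2347c6235b4d2611184", '
    '"search_query": "farm to table", "city": "Portland, OR", '
    '"results": [17289, 8230452, 825429, 184312], "timestamp": 1384469660}',
    '{"user_id": 9, "request_id": "hypothetical-id-2", '
    '"search_query": "coffee", "city": "Portland, OR", '
    '"results": [], "timestamp": 1384469700}',
    'this line is not valid JSON',
]


def count_searches_by_city(lines):
    """Return (searches per city, number of unparseable lines skipped)."""
    counts = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # tolerate bad lines instead of failing the whole run
            continue
        counts[record.get("city", "unknown")] += 1
    return counts, skipped


if __name__ == "__main__":
    counts, skipped = count_searches_by_city(SAMPLE_LINES)
    for city, n in counts.most_common():
        print(city, n)
    print("skipped lines:", skipped)
```

Keeping each record on its own line means the input can be split anywhere between lines, which is what lets a line-oriented batch job process it in parallel.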
Hadoop Trade-Offs
Riddle: Success?
• How do you know when your infrastructure is a success?
• Starts failing under heavy load
• "Too many" users overloading the system
Hadoop Issues
EMR Solutions
• Clusters up for limited amount of time
• Upgrades handled by Amazon
• Multiple clusters means no capacity coordination
S3
Trade-Offs
Standard configs
Resource consumption tracking
Testing
No Cargo Cults
Standard Configs
mrjob Configs
# standard
# memory intensive
# cpu intensive
Resource Tracking
python -m mrjob.tools.emr.terminate_idle_job_flows -c mrjob.conf
Testing
--runner emr
mr_canary.py
mrjob is Open Source
Adoption
Cloud Calculations
• 5 days with 10 machines = 1 day with 50 machines
• (On demand pricing simplification)
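The arithmetic behind this slide can be sketched as follows, under the same simplification: flat on-demand pricing and a job that scales perfectly linearly across machines. The hourly rate and the function name `on_demand_cost` are hypothetical placeholders, not actual AWS prices or Yelp code.

```python
# Hypothetical flat on-demand price per machine-hour (not a real AWS rate).
HOURLY_RATE = 0.50


def on_demand_cost(machines, days, hourly_rate=HOURLY_RATE):
    """Cost of running `machines` instances around the clock for `days` days."""
    machine_hours = machines * days * 24
    return machine_hours * hourly_rate


if __name__ == "__main__":
    slow = on_demand_cost(machines=10, days=5)
    fast = on_demand_cost(machines=50, days=1)
    # Both runs consume the same 1200 machine-hours, so the bill matches;
    # the larger cluster simply returns the answer four days sooner.
    print(slow, fast, slow == fast)
```

In practice, startup overhead, billing granularity, and imperfect parallelism would shift these numbers, which is presumably why the slide calls it a simplification.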
Cost
Cost Control
Data Availability
• s3mysqldump
Overview
S3
in: s3mysqldump
out: LOAD DATA or rsync
in: JSON logs
in: logs & DB dumps
out: CSV, JSON, MyISAM
Pitfalls
Leaky Abstractions
Closed Source
bootstrap-actions/configure-hadoop
mapred.reduce.tasks.speculative.execution=false
Data Explosion
Tron
Wanted
Next Up: Services
Cloud Strategy
Target
generality & abstraction
easiest way
Tandem
We’re Hiring
yelp.com/careers
BDT305 – Cloud @Yelp