(BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 12, 2014 | Las Vegas
Zach Musgrave, Yelp


DESCRIPTION

Code profiling gives a rich, detailed view of runtime performance. However, it's difficult to achieve in production: profiling even a small fraction of web requests raises huge challenges in scalability, access, and ease of use. Despite this, Yelp profiles a nontrivial fraction of its traffic by combining Amazon EC2, Amazon EMR, and Amazon S3. Developers can search, sort, filter, and combine interesting profiles; during a site slowdown or page failure, this allows fast diagnosis and speedy recovery. Some of our analyses run nightly, while others run in real time via Storm topologies. This session covers our use cases for code profiling, its benefits, and the implementation of its handlers and analysis flows. We include both performance results and implementation challenges of our MapReduce and Storm jobs, including code overviews. We also touch on issues such as concurrent logging, cross-data-center replication, job scheduling, and API definitions.

TRANSCRIPT

Page 1: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 12, 2014 | Las Vegas

Performance Profiling in Production Analyzing Web Requests at Scale Using MapReduce and Storm

Zach Musgrave, Yelp

Page 2

Roadmap

1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities

Page 3

In a magical world, far far away…

• Our apps never break
• Our apps never slow down
• Developers think about scalability
• All external services run in O(1)
• All bugs are known a priori

Page 4

In the world we live in…

• Accidents happen
• Developers are people
• Developers make mistakes
• Those mistakes can make it to… production!

Page 5

One Fateful Day…

Page 6

Page 7

Your code makes me sad :(

Crap crap crap

Holy crap

That’s like a 25% bump

Is this even our fault?

Who did this??? Are we timing out?

What’s the user impact? Holy crap

Page 8

Page 9

Roadmap

1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities

Page 10

Enter… the profiler…

• Generate deterministic statistics
  – How many times is a method called?
  – How long is that method’s runtime?
  – What’s that method’s name/module?
  – How much total runtime is devoted?

• It’s easy to use ad hoc:
  – python -m cProfile myscript.py
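The same statistics can also be collected programmatically, which is what every downstream job in this talk relies on. A minimal, self-contained sketch (the profiled function is made up for illustration):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # deliberately naive work, so the profiler has something to measure
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Same listing the `python -m pstats` browser shows, driven from code:
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by "cumulative" is the same `% sort cumulative` command shown in the interactive session on the next slide.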

Page 11

ztm@dev7-devb:~$ python -m pstats some-filename-goes-here.profile
Welcome to the profile statistics browser.
% sort cumulative
% callees 10
   Ordered by: cumulative time
   List reduced from 34239 to 10 due to restriction <10>

                                              ncalls  tottime  cumtime
wsgi/app.py:134(classic_yelp_routing)    ->     1271    0.043    0.208  web/common.py:126(handler_context)
                                                1271    0.102 1790.071  web/wsgi.py:365(execute_request)
web/gatekeeper/check.py:226(_handle)     ->     1320    0.037    0.277  visit_captcha.py:58(is_captcha_uri)
                                                1318    0.036 1804.759  emergency_captcha.py:121(__call__)
                                                1318    0.021    0.043  gatekeeper/check.py:223(_should_log_request_timing)
web/emergency_captcha.py:90(_handle)     ->     1318    0.016    0.183  visit_captcha.py:58(is_captcha_uri)
                                                1312    0.030 1804.016  web/accesscookies/app.py:118(__call__)
                                                1317    0.020    0.369  web/emergency_captcha.py:67(_should_display_captcha)
web/accesscookies/app.py:118(__call__)   ->     1311    0.020 1802.502  pagelet/app.py:37(app)
                                                1312    0.038    0.311  web/accesscookies/app.py:151(should_handle)
                                                1312    0.020    0.058  web/wsgi.py:55(__init__)
pagelet/app.py:37(app)                   ->     1312    0.030 1803.569  .../pyramid/router.py:242(__call__)
                                                  41    0.000    0.009  core/ips.py:48(is_internal_ip)
                                                1312    0.003    0.003  {method 'get' of 'dict' objects}

Raw Profile Output

Page 12

ztm@dev7-devb:~$ diff_pstats -s calls several_months_ago.profile recently.profile
SORTING BY DELTA IN calls
                                                      BEFORE   AFTER   DELTA
yelp/util/request_bucketer/bucketer.py:<lambda>          485    3284    2798
...site-packages/staticconf/proxy.py:method             1967    3524    1557
yelp/util/experiments.py:<genexpr>                       231    1620    1389
...site-packages/simplejson/encoder.py:iterencode          0    1189    1189
yelp/core/encapsulation.py:__new__                         0    1062    1062

ztm@dev7-devb:~$ diff_pstats -s cum several_months_ago.profile recently.profile
SORTING BY DELTA IN cum
                                                      BEFORE     AFTER     DELTA
yelp/wsgi/tweens.py:tween                           1.352045  5.487169  4.135124
yelp/web/gatekeeper/check.py:_handle                0.000000  1.378666  1.378666
yelp/web/emergency_captcha.py:_handle               0.000000  1.378226  1.378226
yelp/util/cheetah/filters.py:markup_filter          0.000000  0.101759  0.101759
yelp/logic/decorators.py:wrapper                    0.233577  0.321657  0.088080
yelp/logic/experiments.py:experiments_for_yuv       0.034188  0.120480  0.086293
yelp/util/request_bucketer/bucketer.py:get_bucket   0.049993  0.135661  0.085668

Diff Based on Call Count (n~1,000)

Diff Based on Cumulative Runtime (n~1,000)
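diff_pstats is a Yelp-internal tool, but the underlying idea needs nothing beyond the raw pstats.Stats.stats dict, whose values are (cc, ncalls, tottime, cumtime, callers) tuples keyed by (file, line, function). A hedged sketch of a call-count diff, with made-up workloads standing in for two days of traffic:

```python
import cProfile
import pstats

def busy(n):
    return sum(i * i for i in range(n))

def capture(calls):
    """Profile `calls` invocations of busy() and return a pstats.Stats."""
    prof = cProfile.Profile()
    prof.enable()
    for _ in range(calls):
        busy(1000)
    prof.disable()
    return pstats.Stats(prof)

def call_counts(stats):
    """Map (file, line, func) -> ncalls; entry[1] is ncalls in Stats.stats."""
    return {func: entry[1] for func, entry in stats.stats.items()}

before, after = capture(2), capture(10)
b, a = call_counts(before), call_counts(after)
deltas = {f: a.get(f, 0) - b.get(f, 0) for f in set(a) | set(b)}

# Biggest increase in call count first, like `diff_pstats -s calls`
for func, delta in sorted(deltas.items(), key=lambda kv: -kv[1])[:5]:
    print(func, delta)
```

Sorting by the cumtime field (entry[3]) instead gives the `-s cum` variant.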

Page 13

BUT HOW DOES THIS WORK IN PRODUCTION?!?!

Hi! I’m Daurius, the profiling hedgehog!

Page 14

Roadmap

1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities

Page 15

Get Your Data!

• Make a context manager
  – Wrap your app in it at a high level
  – Return a profiling context… sometimes

• Make a place to put your profiles
  – We use a distributed logging system, Scribe
  – You can also save them to a local disk
  – As long as they eventually go to the cloud!

• Add your logging stream to your profiles!
  – Then you can search for attributes

Page 16

System Diagram

End-user requests come in from the Internet to Yelp DCs (East Coast) and Yelp DCs (West Coast). A Scribe aggregator in each data center uploads Scribe logs to Amazon S3, so all your profiling and logging data ends up in one place. Real-time analysis and log tailing consume the same S3 data.

Page 17

Webapp Context

class CProfileScribeContext(object):
    """Context: on exit, save cProfile to Scribe log."""

    scribe_category = "cprofile"

    def __enter__(self):
        self.profiler = cProfile.Profile()
        self.profiler.enable()

    def __exit__(self, *args):
        self.profiler.create_stats()
        write_out = {
            "cprofile": encode_stats(Stats(self.profiler)),
            "ranger": ranger.request_info,
        }
        clog.log_line(
            self.scribe_category,
            write_ranger_line(write_out),
        )
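encode_stats and decode_stats are Yelp helpers that aren't shown in the talk. A plausible stand-in, assuming the goal is a newline-free string that can ride on a single Scribe log line, pairs marshal (the format pstats already uses on disk, and the same one the MRJob slides use) with base64:

```python
import base64
import cProfile
import marshal
import pstats

def encode_stats(stats):
    """Hypothetical stand-in for Yelp's helper: marshal the raw stats
    dict, then base64 it so it fits on one newline-free log line."""
    return base64.b64encode(marshal.dumps(stats.stats)).decode("ascii")

def decode_stats(encoded):
    """Inverse of encode_stats: rebuild a pstats.Stats from the string."""
    stats = pstats.Stats()
    stats.stats = marshal.loads(base64.b64decode(encoded))
    return stats

# Round-trip a real profile to check nothing is lost
prof = cProfile.Profile()
prof.enable()
sum(i * i for i in range(10000))
prof.disable()
original = pstats.Stats(prof)
roundtrip = decode_stats(encode_stats(original))
```

marshal is Python-version-specific, which is fine here because the producers and consumers are all the same codebase.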


Page 20

Webapp Context Manager

class CProfileContextManager(object):

    def should_profile(self, servlet):
        """Get the probability for a specific servlet."""
        cprof_prob = get_config(servlet, DEFAULT)
        return random.random() < cprof_prob

    def get_manager(self, request):
        """Return a context manager for the request."""
        # servlet is derived from the request (elided on the slide)
        if config.enabled and self.should_profile(servlet):
            return CProfileScribeContext()
        return CProfileNoOp()


Page 23

Per-Servlet Configuration

• Consider maintaining a config!
  – Default percentage of requests to profile
  – Override for specific servlets
• Useful for unusual/rarely loaded flows
• Reload dynamically with PyStaticConf

cprofile:
  enabled: True
  probability:
    default: 0.000X
    servlets:
      - home: 0.002X
      - biz_details: 0.001X

Page 24

Roadmap

1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities

Page 25

Usability is KEY!

• Having ~150,000 of anything per day is HARD!
• You need to be able to search, sort, and filter
• You need to do this quickly
  – Or it gets stale: less than one day latency
  – Stale data isn’t (usually) useful!

I’m a classy ‘hog… I wanna be FRESH!

Page 26

Enter… Amazon EMR!

Page 27

Why Amazon EMR?

• EMR lets you run MapReduce jobs in the cloud
  – How big a cluster? As big as you want!
• EMR spins up on demand, too
• It’s super easy to use with Python!
  – Yelp maintains MRJob

Mr. Job and I are best buddies!

Page 28

Save Discrete Profile, Logging Files

• Process lines of Scribe logs into correct formats
  – Perfectly parallel: each line is independent!

• Save into Amazon S3
  – One file for each request’s cProfile
  – One file for each request’s logging data

• Analyze logging data for searchable parameters
  – Each parameter can be computed in parallel!

Page 29

Parameters Yelp Cares About

• WHO: Is the user logged in?
• WHAT: Which page did the user access?
  – site (main, mobile, api, biz site)
  – servlet (home, biz_details, user_profile)
  – action (submit, first load, refresh)
• WHERE: Which data center?
• WHEN: 2014-10-01 T 13:10:53
• HOW: HTTP request (GET, POST, PUT)
• HOW LONG: over/under 1 second response
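make_all_matching_tags appears in the jobs on the following slides but is never defined in the talk; a hypothetical sketch of what it might do with the parameters above (the request field names are invented for illustration):

```python
def make_all_matching_tags(request):
    """Hypothetical sketch: turn one request's logging data into
    'key/value' tag strings that developers can later filter on."""
    return [
        "datacenter/%s" % request["datacenter"],
        "loggedin/%s" % request["logged_in"],
        "site/%s" % request["site"],
        "servlet/%s" % request["servlet"],
        "action/%s" % request["action"],
        "http_method/%s" % request["http_method"],
        # HOW LONG: bucket responses as over/under 1 second
        "over_1s/%s" % (request["response_time"] > 1.0),
    ]

request = {
    "datacenter": "sfo", "logged_in": True, "site": "main",
    "servlet": "home", "action": "first_load",
    "http_method": "GET", "response_time": 0.4,
}
tags = make_all_matching_tags(request)
```

Each tag doubles as an S3 key suffix, which is how the reducer's tag files (e.g. tags/datacenter/sfo) get their names.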

Page 30

Save Discrete Profile, Logging Files

class MRScribeTagCprofile(MRJob):

    def mapper(self, _, line):
        # convert text into dict; convert JSON to a pstats object
        request = process_ranger_line(line)
        pstats = decode_stats(request["cprofile"])

        # ex: logs/cprofile-discrete/2014/10/01/00:01:34-3fc2d016d8accaf4
        save_path = get_basekeyname(request)

        # save pstats and logging info to Amazon S3
        bucket.set_object_gz(save_path + ".profile.gz",
                             marshal.dumps(pstats.stats))
        bucket.set_object_gz(save_path + ".ranger.gz",
                             write_ranger_line(request["ranger"]))

        # key examples: datacenter/sfo ; loggedin/True ; servlet/home
        for key in make_all_matching_tags(request):
            yield key, save_path


Page 35

Save Discrete Profile, Logging Files

class MRScribeTagCprofile(MRJob):

    def reducer(self, tag_key, matching_paths):
        # ex: logs/cprofile-discrete/2014/10/01/tags/datacenter/sfo
        tag_path = tag_path_for(tag_key)
        # materialize the generator so it can be counted after use
        matching_paths = list(matching_paths)

        # get old list of matching values; add new values
        tag_list = bucket.get_object_gz(tag_path).split("\n")
        tag_list.extend(matching_paths)
        tag_contents = "\n".join(tag_list)

        # upload tag file w/ new matching web requests
        bucket.set_object_gz(tag_path, tag_contents)

        # output the number of paths we added per tag
        yield tag_key, len(matching_paths)


Page 40

Roadmap

1. Why profile your code?
2. Create and analyze profiles
3. Acquire profiles from your webapp
4. Search and sort profiles
5. Aggregate similar profiles together
6. Search, sort, aggregate in real time
7. Future work, extensions, and possibilities

Page 41

Aggregate into Multiple Requests

• Any single profile doesn’t tell the whole story
  – If you pick one at random…
  – There’s no guarantee it’ll show the badness

• Create aggregate profiles
  – Usually one per day, for each set of parameters
  – Compare daily aggregates to see the big picture

It’s hard to see the hedgehogs for the trees!
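Combining profiles is the easy part: pstats.Stats.add merges another profile's call counts and runtimes into an existing Stats object, which is exactly what the reducers later in the talk lean on. A self-contained sketch (page_render is a made-up stand-in for one request's work):

```python
import cProfile
import pstats

def page_render():
    # stand-in workload for "one web request"
    return sum(i * i for i in range(5000))

def profile_once(work):
    """Profile a single call of `work` and return its pstats.Stats."""
    prof = cProfile.Profile()
    prof.enable()
    work()
    prof.disable()
    return pstats.Stats(prof)

# Merge many single-request profiles into one daily-style aggregate:
# Stats.add sums call counts and runtimes function by function.
aggregate = profile_once(page_render)
for _ in range(9):
    aggregate.add(profile_once(page_render))
```

After the loop, the aggregate reads like one profile in which page_render was called ten times, so a rare slow path that any single profile might miss still shows up in the totals.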

Page 42

Aggregate into Multiple Requests

class MRCprofileCombine(MRJob):

    def mapper(self, _, pathname):
        # download the logging info and process it
        ranger_raw = bucket.get_object_gz(pathname + ".ranger.gz")
        ranger_data = process_ranger_line(ranger_raw)

        # download the cProfile info and process it
        profile_raw = bucket.get_object_gz(pathname + ".profile.gz")
        stats = pstats.Stats(marshal.loads(profile_raw))

        # key examples: datacenter/sfo ; loggedin/True ; servlet/home
        tags = make_all_matching_tags(ranger_data)

        # generate all 7-ary, ..., 1-ary, 0-ary matching paths
        # 3-ary example: http_method.GET,servlet.biz_details,site.main
        for path in batch_process_paths(tags):
            yield path, {
                "ranger": [ranger_data],
                "cprofile": encode_stats(stats),
            }


Page 47

Aggregate into Multiple Requests

• Generating all batch paths is messy
  – First version looked like this…
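The screenshot of that first version didn't survive in this transcript, but the cleaner formulation is a power-set walk over the sorted tag list. A sketch, with batch_process_paths as a hypothetical stand-in for the helper named on the slides:

```python
from itertools import combinations

def batch_process_paths(tags):
    """Yield every subset of tags as a canonical comma-joined path,
    from the full n-ary combination down to the 0-ary catch-all bucket.
    Hypothetical stand-in for the helper named on the slide."""
    tags = sorted(tags)  # canonical order so equal subsets always match
    for arity in range(len(tags), -1, -1):
        for subset in combinations(tags, arity):
            yield ",".join(subset)

paths = list(batch_process_paths(
    ["http_method.GET", "servlet.biz_details", "site.main"]
))
```

Three tags produce 2^3 = 8 paths, including the slide's 3-ary example "http_method.GET,servlet.biz_details,site.main" and the empty string for the overall aggregate; with the full seven parameters this fan-out is 128 keys per request, which is why the reducer needs to merge so many small groups.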


Page 49

Aggregate into Multiple Requests

class MRCprofileCombine(MRJob):

    def reducer(self, path_key, entries):
        combo_pstats = None
        combo_ranger = []

        # loop over every set of profiles (1 or >1) given
        for entry in entries:
            # add cProfile data together
            if combo_pstats:
                combo_pstats.add(decode_stats(entry["cprofile"]))
            else:
                combo_pstats = decode_stats(entry["cprofile"])
            # add logging data together
            combo_ranger.append(entry["ranger"])

        # see next slide


Page 53

Aggregate into Multiple Requests

class MRCprofileCombine(MRJob):

    def reducer(self, path_key, entries):
        # see previous slide

        # ex: data/cprofile-processed/batch/2014/10/01/
        #     date.2014-10-01,http_method.GET,servlet.biz_details,site.main
        pathname = batch_path(combo_ranger)

        # save combined cprofile and logging data
        bucket.set_object_gz(pathname + ".profile.gz",
                             marshal.dumps(combo_pstats.stats))
        bucket.set_object_gz(pathname + ".ranger.gz",
                             write_ranger_line(combo_ranger))

        yield pathname, len(combo_ranger)


Page 56: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

System Diagram Redux

[Diagram] Web request → "Profile me maybe?" → Scribe to S3
  → EMR, nightly MRJob: upload and tag → Amazon S3 (discrete profiles and logs, per-attribute tags)
  → EMR, nightly MRJob: aggregate records → Amazon S3 (combined profiles and logs, per-attribute tags)
  → EMR, ad-hoc MRJob: N-day aggregate → e-mail notify
  → Profilistic service ("Hi, Daurius!")

Page 57: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Aggregate into Multiple Requests

• We have, for every possible combination:
  • A combined set of profile statistics
  • A combined set of logging data

• Ex: examining {servlet: biz_details}
  • user logged in; long run
  • DC: east; user logged in; long run
  • DC: east; HTTP: POST; user logged in; long run
  • DC: east; HTTP: POST; site: main; logged in; long run
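Generating every attribute combination as a reducer key is a small itertools exercise; the attribute names below are illustrative, not Yelp's real seven dimensions:

```python
from itertools import combinations

# Hypothetical request attributes (the real deck uses seven dimensions)
attrs = {"dc": "east", "http_method": "POST", "site": "main", "logged_in": "yes"}

def subset_keys(attrs):
    # Yield one reducer key per attribute subset, 0-ary through len(attrs)-ary
    items = sorted(attrs.items())
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield combo

keys = list(subset_keys(attrs))  # 2**4 == 16 keys for four attributes
```

With seven attributes each request fans out to up to 2^7 = 128 aggregation keys, which is why the daily aggregate count dwarfs the raw profile count only mildly: most combinations recur across many requests.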

Page 58: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

ztm@dev7-devb:~$ diff_pstats -s calls several_months_ago.profile recently.profile
SORTING BY DELTA IN calls
                                                    BEFORE   AFTER   DELTA
yelp/util/request_bucketer/bucketer.py:<lambda>        485    3284    2798
...site-packages/staticconf/proxy.py:method           1967    3524    1557
yelp/util/experiments.py:<genexpr>                     231    1620    1389
...site-packages/simplejson/encoder.py:iterencode        0    1189    1189
yelp/core/encapsulation.py:__new__                       0    1062    1062

ztm@dev7-devb:~$ diff_pstats -s cum several_months_ago.profile recently.profile
SORTING BY DELTA IN cum
                                                      BEFORE      AFTER      DELTA
yelp/wsgi/tweens.py:tween                           1.352045   5.487169   4.135124
yelp/web/gatekeeper/check.py:_handle                0.000000   1.378666   1.378666
yelp/web/emergency_captcha.py:_handle               0.000000   1.378226   1.378226
yelp/util/cheetah/filters.py:markup_filter          0.000000   0.101759   0.101759
yelp/logic/decorators.py:wrapper                    0.233577   0.321657   0.088080
yelp/logic/experiments.py:experiments_for_yuv       0.034188   0.120480   0.086293
yelp/util/request_bucketer/bucketer.py:get_bucket   0.049993   0.135661   0.085668

Diff Based on Call Count (n~1,000)

Diff Based on Cumulative Runtime (n~1,000)
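diff_pstats is Yelp-internal, but its core is a dict diff over pstats entries. A rough sketch, assuming profiles were saved with Stats.dump_stats:

```python
import cProfile
import os
import pstats
import tempfile

def diff_by_calls(before_path, after_path, top=5):
    # Diff primitive call counts (field 0 of each pstats entry) per function
    before = pstats.Stats(before_path).stats
    after = pstats.Stats(after_path).stats
    deltas = []
    for key in set(before) | set(after):
        b = before.get(key, (0, 0, 0, 0, {}))[0]
        a = after.get(key, (0, 0, 0, 0, {}))[0]
        deltas.append((a - b, key))
    deltas.sort(reverse=True)
    return deltas[:top]

def work():
    return sum(range(100))

def dump_profile(n_calls, path):
    # Profile n_calls invocations of work() and save with dump_stats
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(n_calls):
        work()
    pr.disable()
    pstats.Stats(pr).dump_stats(path)

tmp = tempfile.mkdtemp()
before_path = os.path.join(tmp, "before.profile")
after_path = os.path.join(tmp, "after.profile")
dump_profile(1, before_path)
dump_profile(3, after_path)
regressions = diff_by_calls(before_path, after_path)
```

Sorting on cumulative time instead is the same diff over field 3 of each entry.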

Page 59: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Storage Considerations, Per Day

• 152,814 discrete profile/log records
• 40,537 aggregate combinations (0-ary to 7-ary)

• 386,707 total files created

• 62.25 GB storage space used (all gzipped)
  • 40.99 GB on aggregate profiles (without logs)
  • 21.28 GB on individual profiles/logs

Page 60: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Performance Considerations

Page 61: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Performance Considerations

• Amazon Elastic MapReduce works best when all units of work take equal time
• This is not the case for our aggregations!
  • 60% combine 10 or fewer profiles
  • 95% combine 1,000 or fewer profiles
  • 8 combine over 100,000 profiles

Page 62: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Ooh! It’s my time to shine!

Remember Ease of Use? …Remember Daurius?

Page 63: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Roadmap

1. Why profile your code? 2. Create and analyze profiles 3. Acquire profiles from your webapp 4. Search and sort profiles 5. Aggregate similar profiles together 6. Search, sort, aggregate in real time 7. Future work, extensions, and possibilities

Page 68: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Enter… the Storm!

• Apache Storm
  • Real-time distributed computation platform
  • Directed graph of processing steps, connected by streams of tuples
• Spouts: sources of data, like Scribe!
• Bolts: processors of data, like MRJob!
• Groupings: define how tuples move between components

Page 69: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Pyleus: A Python Framework for Storm Topologies

• Pyleus: Yelp’s super new Python Storm bindings • Now open sourced! http://pyleus.org

• Build topologies in Python • Declaratively describe structure in YAML

• Respects requirements.txt • Compose a topology from Python packaged components!

Page 70: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Sample Pyleus Topology

    name: profilistic
    workers: 3

    topology:

      - spout:
          name: cprofile-sfo
          module: yelp_pyleus_util.scribe_spout
          options:
            scribe_host: 10.10.10.10
            stream: cprofile

      - spout:
          name: cprofile-iad
          module: yelp_pyleus_util.scribe_spout
          options:
            scribe_host: 10.20.10.10
            stream: cprofile

Page 74: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Sample Pyleus Topology

      - bolt:  # equivalent of first mapper
          name: process-ranger
          module: profilistic.storm.process_ranger
          groupings:
            - shuffle_grouping: cprofile-sfo
            - shuffle_grouping: cprofile-iad

      - bolt:  # equiv. of first reducer, plus an S3 cache
          name: update-tag
          module: profilistic.storm.update_tag
          tasks: 6
          parallelism_hint: 3
          groupings:
            - fields_grouping:
                component: process-ranger
                fields:
                  - tag
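The fields_grouping on tag guarantees that all tuples carrying the same tag reach the same update-tag task, so each task can safely own its slice of the S3 cache. The routing idea, sketched with a stable crc32 hash (Storm's actual partitioning differs in detail):

```python
from zlib import crc32

def route(field_value, num_tasks):
    # All tuples sharing a field value land on the same task index;
    # crc32 is stable across processes, unlike Python's builtin hash()
    return crc32(field_value.encode("utf-8")) % num_tasks

# Every occurrence of this tag maps to one of 6 tasks, consistently
task_a = route("servlet.biz_details", 6)
task_b = route("servlet.biz_details", 6)
```

The shuffle_grouping on the first bolt, by contrast, just balances load round-robin since ordering doesn't matter there.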

Page 77: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Sample Pyleus Bolt

    class MyFirstBolt(pyleus.storm.SimpleBolt):

        def initialize(self):
            # Set up any persistent config resources
            staticconf.YamlConfiguration( ... )
            self.bucket = s3.get_bucket( ... )

        def process_tuple(self, tup):
            key, value = tup
            # do stuff here!
            new_tup = (new_key, new_value)
            self.emit(new_tup)

    if __name__ == '__main__':
        MyFirstBolt.run()
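One nice property of this contract is that a bolt can be exercised without a Storm cluster. A hypothetical stand-in that mimics the process_tuple/emit cycle:

```python
class FakeBolt:
    # Hypothetical stand-in for pyleus.storm.SimpleBolt's tuple contract
    def __init__(self):
        self.emitted = []

    def emit(self, tup):
        self.emitted.append(tup)

    def process_tuple(self, tup):
        key, value = tup
        # "do stuff here": transform the value, then re-emit
        self.emit((key, value.upper()))

bolt = FakeBolt()
bolt.process_tuple(("tag", "cprofile"))
```

Unit-testing bolts this way keeps the topology YAML as pure wiring.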

Page 81: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Profilistic in Pyleus

• Profiles used to be one day delayed
  • Or, in emergencies, an ad hoc midday batch run
• Now, ~10 minutes after bad performance…

• …you can investigate!

Page 82: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Roadmap

1. Why profile your code? 2. Create and analyze profiles 3. Acquire profiles from your webapp 4. Search and sort profiles 5. Aggregate similar profiles together 6. Search, sort, aggregate in real time 7. Future work, extensions, and possibilities

Page 83: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Future Work

1. Active monitoring
  • For every new aggregation created each day
  • Pull the same aggregation from 1 day and 1 week ago
  • DIFF them!
  • If the delta is too big, send an alert or an e-mail
  • Easy add-on to the end of the Pyleus topology
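The alerting rule could be as simple as an absolute-plus-relative threshold on cumulative time; the thresholds below are made-up placeholders:

```python
def should_alert(before_ct, after_ct, abs_floor=0.5, rel_jump=0.5):
    # Alert only when cumulative time grew by at least abs_floor seconds
    # AND by more than rel_jump of its previous value (made-up thresholds)
    delta = after_ct - before_ct
    if delta < abs_floor:
        return False
    return before_ct == 0 or (delta / before_ct) > rel_jump

# The tween regression shown earlier (1.35s -> 5.49s) would trip this
alert = should_alert(1.352045, 5.487169)
```

Requiring both conditions keeps noisy, tiny functions from paging anyone while still catching real regressions.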

Page 84: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Future Work

2. Visualization within the webapp
  • Already possible ad hoc: Graphviz files rendered to PDF
  • At the most recent Yelp hackathon (Fall 2014), someone built this!

Page 85: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

How do I… DIY?

1. Wrap the webapp in a context manager 2. Save profiles into the cloud 3. Tag profiles with attributes 4. Combine profiles based on attributes 5. Build quick-’n-dirty internal app to search/filter 6. Refactor it all into Storm? 7. Give the hedgehog a hug!

I believe in you! ♥

Page 86: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Yelp Dataset Challenge

Academic dataset from Phoenix, Las Vegas, Madison, Waterloo, and Edinburgh!

+ Your academic project, research and/or visualizations

submitted by December 31, 2014 =

$5,000 prize + $1,000 for publication + $500 for presenting*

yelp.com/dataset_challenge

*See full terms on website

● 1,125,458 reviews
● 42,153 businesses
  ○ 320,002 business attributes
● 403,210 tips
● 252,898 users
  ○ 955,999-edge social graph
● 31,617 check-in sets

Page 87: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Thanks for listening!

Don't be a stranger! [email protected]

Python MapReduce package: http://mrjob.org
Python Storm package: http://pyleus.org

Page 88: (BDT402) Performance Profiling in Production: Analyzing Web Requests at Scale Using Amazon Elastic MapReduce and Storm | AWS re:Invent 2014

Please give us your feedback on this presentation


Join the conversation on Twitter with #reinvent

BDT402