profiling ruby
Why Profiling?
Program analysis (often in space or time)
What is my code doing on this path/request? (and why so slow??)
What is the code doing in production?
And while we're here, where did all my memory go?
The World of MRI
Jealous of all the JVM goodness (e.g. VisualVM)
Bits and pieces (memprof, etc.)
2.x brings a host of improvements
Rblineprof Usage
require 'rblineprof'

profile = lineprof(/./) do
  5.times do |n|
    n.times { [].push Object.new }
    sleep n
  end
end
Rblineprof
          |   0ms|  0|  0| profile = lineprof(/./) do
10008.3ms | 0.4ms|  1| 20|   5.times do |n|
    0.3ms | 0.2ms| 35| 20|     n.times { [].push Object.new }
10007.9ms | 0.1ms|  5|  0|     sleep n
          |   0ms|  0|  0|   end
          |   0ms|  0|  0| end
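The gap between the first two columns is the interesting part of that output. Assuming they are wall-clock time and CPU time (the slide does not label them), `sleep` costs ten seconds of wall time and almost no CPU. A minimal stdlib sketch of the same distinction; `measure` is a hypothetical helper, not part of rblineprof:

```ruby
# Hypothetical helper: returns [wall_seconds, cpu_seconds] for a block.
def measure
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  yield
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start
  [wall, cpu]
end

# Sleeping: wall time passes, the CPU is mostly idle.
sleep_wall, sleep_cpu = measure { sleep 0.2 }

# Allocating: wall time and CPU time move together.
alloc_wall, alloc_cpu = measure { 200_000.times { [].push Object.new } }

puts format('sleep: wall %.1fms  cpu %.1fms', sleep_wall * 1000, sleep_cpu * 1000)
puts format('alloc: wall %.1fms  cpu %.1fms', alloc_wall * 1000, alloc_cpu * 1000)
```

A line that is slow in wall time but cheap in CPU time is usually waiting on something (network, disk, locks) rather than computing.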
Rbtrace Usage
require 'rbtrace'
rbtrace -p $PID --firehose
rbtrace -p $PID --slow=<N>
rbtrace -p $PID --gc
rbtrace -p $PID --methods
rbtrace -p $PID -d tracer.file
slow/method/gc/tracers options can be combined
Rails Demo
rbtrace -p $PID -f
rbtrace -p $PID -s 200
rbtrace -p $PID -m "ActiveRecord::Railties::ControllerRuntime#process_action(action,args)"
Stackprof
Call-stack sample profiler (using new rb_profile_frames() in 2.1)
Very low-overhead operation
Samples on wall time, cpu time, object allocation counts or YOUR_CUSTOM_PHASE_OF_THE_MOON
Standalone & Rack middleware
Off and on-able (accumulates between start/stop)
Defaults: cpu, 1000 microsecond intervals
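To make "call-stack sampling" concrete, here is a toy wall-clock sampler in plain Ruby. It is only a sketch of the idea and not how Stackprof works internally (Stackprof samples in C via rb_profile_frames()). `sample_top_frames`, `wait_on_io` and `crunch_numbers` are hypothetical names made up for this example:

```ruby
# Hypothetical helper, not Stackprof's API: a background thread records the
# top stack frame of the calling thread at a fixed interval.
def sample_top_frames(interval: 0.001)
  target  = Thread.current
  counts  = Hash.new(0)
  running = true

  sampler = Thread.new do
    while running
      frame = target.backtrace_locations&.first
      counts[frame.label] += 1 if frame
      sleep interval
    end
  end

  yield
  counts
ensure
  # Stop the sampler and wait for it, even if the block raised.
  running = false
  sampler&.join
end

def wait_on_io
  sleep 0.1
end

def crunch_numbers
  50_000.times { |i| Math.sqrt(i) }
end

counts = sample_top_frames do
  wait_on_io
  crunch_numbers
end

# Frames that show up in more samples were on top of the stack more often.
counts.sort_by { |_, n| -n }.each { |label, n| puts format('%5d  %s', n, label) }
```

Because the sampler only wakes up every so often, cost is bounded by the sampling interval rather than by how many methods the program calls, which is why this style of profiler stays cheap.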
Rack Middleware
config/environments/ENV.rb:
config.middleware.use StackProf::Middleware, enabled: true, mode: :cpu, interval: 1000, save_every: 5
Stackprof output
stackprof lobsters.stackprof
==================================
  Mode: cpu(1000)
  Samples: 97 (0.00% miss rate)
  GC: 18 (18.56%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
         9   (9.3%)           9   (9.3%)     Pathname#chop_basename
         4   (4.1%)           4   (4.1%)     Hike::Index#entries
         4   (4.1%)           3   (3.1%)     ActiveSupport::Subscriber#start
         3   (3.1%)           3   (3.1%)     block in ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter#execute
         4   (4.1%)           3   (3.1%)     Hike::Index#build_pattern_for
         6   (6.2%)           2   (2.1%)     Pathname#plus
         2   (2.1%)           2   (2.1%)     ActiveSupport::SafeBuffer#initialize
         2   (2.1%)           2   (2.1%)     Hike::Index#sort_matches
         3   (3.1%)           2   (2.1%)     ActiveSupport::Inflector#underscore
         2   (2.1%)           2   (2.1%)     block (2 levels) in <class:Numeric>
         5   (5.2%)           2   (2.1%)     ActiveSupport::Subscriber#finish
         2   (2.1%)           2   (2.1%)     Sprockets::Mime#mime_types
         2   (2.1%)           2   (2.1%)     ActiveSupport::PerThreadRegistry#instance
Zooming in
stackprof lobsters.dump --method 'Hike::Index#entries'
Hike::Index#entries (/home/vagrant/.rvm/gems/ruby-2.1.3/gems/hike-1.2.3/lib/hike/index.rb:78)
  samples: 637 self (14.6%) / 645 total (14.8%)
  callers:
    645 ( 100.0%) Hike::Index#match
  callees (8 total):
    8 ( 100.0%) block in Hike::Index#entries
  code:
                                |  78 | def entries(path)
     21 (0.5%) /  21 (0.5%)     |  79 |   @entries[path.to_s] ||= begin
      5 (0.1%) /   5 (0.1%)     |  80 |     pathname = Pathname.new(path)
    424 (9.7%) / 424 (9.7%)     |  81 |     if pathname.directory?
    195 (4.5%) / 187 (4.3%)     |  82 |       pathname.entries.reject { |entry| entry.to_s =~ /^\.|~$|^\#.*\#$/ }.sort
                                |  83 |     else
Flamegraphs
What are they?
Visualization technique for sample stack traces
Turning thousands of dense traces into a single image
Invented by Brendan Gregg (Joyent / Netflix)
Interpreting Flamegraphs
Y-Axis is stack depth
X-Axis is not time
Box width proportional to how often method (or children) profiled
Stackprof Flamegraphs
pass in raw: true (Rack middleware requires patching)
stackprof --flamegraph stack.dump > flame_output
stackprof --flamegraph-viewer flame_output (Safari / Chrome only)
stackprof --stackcollapse stack.dump (classical Flamegraph)
Rails Flamegraph
Default Stackprof flamegraphs show repeated calls to same methods
Can hide patterns
Gregg's flamegraph includes a 'collapse' preprocessing phase to combine repeated calls
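A rough sketch of what a 'collapse' phase does: identical stacks are folded into a single line with a count, in the semicolon-separated "folded" format used by Gregg's flamegraph tooling. The sample data and the `collapse` helper are made up for illustration and are not Stackprof's implementation:

```ruby
# Hypothetical samples: one array per sample, outermost frame first.
samples = [
  %w[main Controller#show View#render],
  %w[main Controller#show Model.find],
  %w[main Controller#show View#render],
  %w[main Controller#show View#render],
  %w[main Controller#show Model.find]
]

# Fold identical stacks into "frame;frame;frame count" lines.
def collapse(samples)
  folded = Hash.new(0)
  samples.each { |stack| folded[stack.join(';')] += 1 }
  folded.sort.map { |stack, count| "#{stack} #{count}" }
end

folded_lines = collapse(samples)
puts folded_lines
# main;Controller#show;Model.find 2
# main;Controller#show;View#render 3
```

After folding, the order in which samples were taken is gone, which is why the X-axis of a classical flamegraph is not time: repeated calls to the same path become one wide box instead of many thin ones.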
Another example
Working on a pure Ruby application
'Why is it running so slow?'
'Can we see any quick way of shaving off some execution time?'
Interpretation
Most of the execution time is spent in Excon and Fog methods
These are talking to network (OpenStack / Puppet)
Caching some results provided a quick win that shaved ~30s
Most of execution time still network-based
Medium / Long-term solution to move to pre-baked images and thus eliminate need for Puppet run
Result: Runtime of 8 minutes (!) down to 20s.
dump & dump_all
JSON representation of object (more info provided if allocation tracing is on)
GIVE ME THE ENTIRE HEAP! ObjectSpace.dump_all
Dump is multiple lines of JSON
(Obviously, can be large!)
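A small sketch of dumping a single object, using only the stdlib `objspace` and `json` libraries. With allocation tracing on, the JSON also carries the file and line where the object was allocated. The variable names are just for this example:

```ruby
require 'objspace'
require 'json'

info = nil

# Tracing is only active inside the block, so allocate and dump in there.
ObjectSpace.trace_object_allocations do
  greeting = String.new('profiling ruby')
  # ObjectSpace.dump returns one line of JSON as a String by default.
  info = JSON.parse(ObjectSpace.dump(greeting))
end

puts "type:      #{info['type']}"
puts "allocated: #{info['file']}:#{info['line']}"
puts "keys:      #{info.keys.join(', ')}"
```

The exact set of keys varies by object type and Ruby version, so it is worth printing them once before building tooling on top.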
Example - pry
Q. How many STRINGS are there in my pry session?
require 'objspace'
File.open('heap.dump', 'w') { |io| ObjectSpace.dump_all(output: io) } # block form closes the file before grep reads it
$> grep '"type":"STRING"' heap.dump | wc -l
A. ???
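The same question can be answered without leaving Ruby. This sketch dumps the heap to a temporary file and tallies every line by its "type" field; it assumes each line of the dump parses as JSON and simply skips any line that does not:

```ruby
require 'objspace'
require 'json'
require 'tempfile'

type_counts = Hash.new(0)

Tempfile.create('heap') do |file|
  ObjectSpace.dump_all(output: file)
  file.flush
  file.rewind

  file.each_line do |line|
    begin
      type_counts[JSON.parse(line)['type']] += 1
    rescue JSON::ParserError
      next # skip anything that is not valid JSON
    end
  end
end

puts "STRING objects: #{type_counts['STRING']}"
type_counts.sort_by { |_, n| -n }.first(5).each { |type, n| puts "#{type}\t#{n}" }
```

The number itself depends entirely on what is loaded in the process, so treat it as a baseline to compare against rather than a value to expect.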
Hunting for leaks with rbtrace
wabbit season
Idea - GC, dump, repeat, and compare
Remove objects from dump 2 that are in dump 1
(Remove missing objects in dump 3 from dump 2)
Not necessarily leaks but a great place to start looking
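The comparison can be sketched with a few set operations. The three "dumps" below are hand-written stand-ins for parsed heap dump lines, and `heap_object`, `addresses` and `leak_candidates` are hypothetical helpers, not the code in diff_heaps.rb:

```ruby
require 'set'

# Hypothetical stand-in for one parsed line of a heap dump.
def heap_object(address, type, file, line)
  { 'address' => address, 'type' => type, 'file' => file, 'line' => line }
end

dump1 = [heap_object('0x01', 'STRING', 'boot.rb', 1)]
dump2 = dump1 + [
  heap_object('0x02', 'STRING', 'cache.rb', 10),
  heap_object('0x03', 'STRING', 'cache.rb', 10),
  heap_object('0x04', 'ARRAY', 'request.rb', 5) # temporary, gone by dump 3
]
dump3 = dump2.reject { |o| o['address'] == '0x04' }

def addresses(dump)
  dump.map { |o| o['address'] }.to_set
end

# Objects that are new in dump 2 and still alive in dump 3.
def leak_candidates(first, second, third)
  before = addresses(first)
  after  = addresses(third)
  second.reject { |o| before.include?(o['address']) }
        .select { |o| after.include?(o['address']) }
end

report = leak_candidates(dump1, dump2, dump3)
         .group_by { |o| [o['type'], o['file'], o['line']] }
         .map { |(type, file, line), objs| "Leaked #{objs.size} #{type} objects at: #{file}:#{line}" }

puts report
# Leaked 2 STRING objects at: cache.rb:10
```

Grouping by allocation site is what turns thousands of surviving objects into a short list of places to go and read code.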
Rbtrace & Leaks
How to get the dumps from a live server?
rbtrace -e
e.g. rbtrace -p $PID -e 'Rails.root.to_s'
watch out for eval timeouts
Getting the heap dump
Thread.new {
  require "objspace";
  ObjectSpace.trace_object_allocations_start;
  GC.start();
  File.open("heap-1.dump", "w") { |io| ObjectSpace.dump_all(output: io) }
}.join
Diffing Heaps
diff_heaps.rb in Heroku/discussion repo
Leaked 37793 STRING objects at: /home/vagrant/.rvm/rubies/ruby-2.1.3/lib/ruby/2.1.0/psych.rb:370
Leaked 563 ARRAY objects at: /home/vagrant/.rvm/gems/ruby-2.1.3/gems/activesupport-4.1.8/lib/active_support/core_ext/hash/keys.rb:142
Leaked 483 STRING objects at: /home/vagrant/.rvm/gems/ruby-2.1.3/gems/activesupport-4.1.8/lib/active_support/dependencies.rb:443
...
Demo
Let's look at Sinatra
MemoryProfiler.report { require 'sinatra' }.pretty_print
Freeze your strings!
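Why freezing helps, as a stdlib-only sketch: a plain string literal allocates a new String each time it is evaluated, while a frozen literal is reused. `plain_header`, `frozen_header` and `allocations` are hypothetical names, and the counts assume the script is not run with frozen string literals enabled globally:

```ruby
# Evaluating a plain literal allocates a new String on every call.
def plain_header
  'Content-Type'
end

# A frozen literal is shared: every call returns the same object.
def frozen_header
  'Content-Type'.freeze
end

# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

plain_allocs  = allocations { 1_000.times { plain_header } }
frozen_allocs = allocations { 1_000.times { frozen_header } }

puts "plain literal:  #{plain_allocs} allocations"
puts "frozen literal: #{frozen_allocs} allocations"
puts "same object each call? #{frozen_header.equal?(frozen_header)}"
```

In a hot path such as header handling, those per-call strings are exactly the kind of allocation a memory_profiler report tends to surface.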
GC
GC is in a state of flux
1.9.x, 2.0, 2.1, 2.2 all have different GC strategies.
Mostly worked with 2.1 (2.2 is improvement on 2.1 strategy)
Tuning? Here be dragons…
gc_tracer
Uses new 2.1 hooks for GC profiling
Outputs TSV (GC.stat, minor/major GC runs, etc.)
Useful for ideas on GC tuning
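For a quick look at the same kind of numbers without installing anything, `GC.stat` can be sampled before and after a piece of work. This is a sketch, not gc_tracer's output format, and the key names used here have changed between Ruby versions, so check `GC.stat.keys` on the Ruby in use:

```ruby
stat_before = GC.stat

# Create short-lived garbage, then force a full (major) collection.
100_000.times { Object.new }
GC.start

stat_after = GC.stat

# Print a small tab-separated table: key, before, after, delta.
%i[count minor_gc_count major_gc_count total_allocated_objects].each do |key|
  delta = stat_after[key] - stat_before[key]
  puts [key, stat_before[key], stat_after[key], delta].join("\t")
end
```

Logging a handful of these counters per request or per job is a cheap way to see whether a tuning change actually moved minor and major GC counts.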
What to look for in GC Tuning
Initial slots (RUBY_GC_HEAP_INIT_SLOTS)
Limiting memory growth
(GC is probably another talk in itself)
Experiment, profile, update tuning, experiment, etc.
2.1.x is not…great for webapps (minor/major issue, :symbols bug)
All hail Rails 5 and Ruby 2.2!
Summing Up
Things are getting better!
Still a bunch of separate tools (with some overlap)
(more things abound - ruby-prof, rack-mini-profiler, etc)
Good idea to send some of this to logging / graphite / etc.
Lower level - SystemTap, DTrace, perf
Links
http://www.brendangregg.com/flamegraphs.html
https://github.com/tmm1/rblineprof
https://github.com/peek/peek-rblineprof
https://github.com/tmm1/stackprof
https://github.com/falloutdurham/stackprof (patched for raw Rack samples)
https://github.com/tmm1/rbtrace
https://github.com/heroku/discussion/blob/master/script/diff_heaps.rb
https://github.com/srawlins/allocation_stats
https://github.com/SamSaffron/memory_profiler
https://github.com/ko1/gc_tracer