Profiling Ruby
Ian Pointer (@carsondial)
March 10, 2015
Why Profiling?
Program analysis (often in space or time)
What is my code doing on this path/request? (And why is it so slow?)
What is the code doing in production?
And while we're here, where did all my memory go?
The World of MRI
Jealous of all the JVM goodness (e.g. VisualVM)
Bits and pieces (memprof, etc.)
2.x brings a host of improvements
Rblineprof
Line Profiler
Produces an array of [wall time, CPU time, calls, allocated objects] per line
RBlineprof Usage
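require 'rblineprof'

# /./ matches every file name, so all files are profiled while the block runs
profile = lineprof(/./) do
  5.times do |n|
    n.times { [].push Object.new }  # allocates objects
    sleep n                         # wall time, but almost no CPU time
  end
end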
Rblineprof
     wall |   cpu | calls | allocs |
          |   0ms |     0 |      0 | profile = lineprof(/./) do
10008.3ms | 0.4ms |     1 |     20 |   5.times do |n|
    0.3ms | 0.2ms |    35 |     20 |     n.times { [].push Object.new }
10007.9ms | 0.1ms |     5 |      0 |     sleep n
          |   0ms |     0 |      0 |   end
          |   0ms |     0 |      0 | end
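For intuition about what a line profiler records, here is a much cruder pure-Ruby sketch. It is not how rblineprof works (rblineprof hooks the VM in C and also measures wall time, CPU time and allocations); it assumes Ruby 2.0+ and only uses `TracePoint`'s `:line` event to count how many times each line of the current file runs. `line_hits` is a hypothetical helper name.

```ruby
# Crude per-line execution counter built on TracePoint (Ruby 2.0+).
# Unlike rblineprof it records only how often each line runs,
# not wall time, CPU time, or allocations.
def line_hits(file)
  hits = Hash.new(0)
  tracer = TracePoint.new(:line) do |tp|
    hits[tp.lineno] += 1 if tp.path == file
  end
  tracer.enable
  yield
  tracer.disable
  hits
end

hits = line_hits(__FILE__) do
  5.times do |n|
    n.times { [].push Object.new }
  end
end

# One row per line number: how many times that line started executing
hits.sort.each { |line, count| puts format('%4d | %5d hits', line, count) }
```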
Peek-rblineprof
Peek plugin for Rblineprof
Support for Pygments highlighting
Heavyweight approach
rbtrace
strace, but for Ruby
Works on Ruby 1.8 and up
Low production impact (mostly)
Usage
require 'rbtrace' (in the process to be traced), then:
rbtrace -p $PID --firehose
rbtrace -p $PID --slow=<N>
rbtrace -p $PID --gc
rbtrace -p $PID --methods
rbtrace -p $PID -d tracer.file
slow/method/gc/tracers options can be combined
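rbtrace attaches to another process from the outside. As a rough in-process illustration of what `--slow=<N>` reports, the sketch below uses `TracePoint` `:call`/`:return` events to time Ruby-level method calls and keep those slower than a threshold. It is a toy under stated assumptions (single thread, Ruby-defined methods only; C methods such as `sleep` fire `:c_call`/`:c_return` and are not timed on their own), not rbtrace's implementation. `report_slow_calls`, `quick` and `sluggish` are hypothetical names.

```ruby
# Toy analogue of `rbtrace --slow=N`: collect Ruby method calls slower than N ms.
def report_slow_calls(threshold_ms)
  starts = []
  slow = []
  tracer = TracePoint.new(:call, :return) do |tp|
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if tp.event == :call
      starts.push(now)
    else
      started = starts.pop
      next unless started # a return whose call happened before tracing began
      elapsed_ms = (now - started) * 1000
      slow << ["#{tp.defined_class}##{tp.method_id}", elapsed_ms.round(1)] if elapsed_ms >= threshold_ms
    end
  end
  tracer.enable
  yield
  tracer.disable
  slow
end

def quick
  1 + 1
end

def sluggish
  sleep 0.05 # about 50 ms
end

result = report_slow_calls(20) { quick; sluggish }
result.each { |name, ms| puts "#{name} took #{ms}ms" }
```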
Rails Demo
rbtrace -p $PID -f
rbtrace -p $PID -s 200
rbtrace -p $PID -m "ActiveRecord::Railties::ControllerRuntime#process_action(action, args)"
Stackprof
Call-stack sampling profiler (using the new rb_profile_frames() API in 2.1)
Very low overhead
Samples on wall time, CPU time, object allocation counts, or YOUR_CUSTOM_PHASE_OF_THE_MOON
Works standalone or as Rack middleware
Can be switched on and off (samples accumulate between start/stop)
Defaults: cpu, 1000 microsecond intervals
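To make "call-stack sampling" concrete, here is a toy wall-clock sampler in plain Ruby: a background thread periodically grabs the profiled thread's `backtrace_locations` and tallies the innermost frame. This is an assumption-laden sketch, not stackprof's design (stackprof samples from a timer signal via `rb_profile_frames()` and is far cheaper). Under MRI's GVL the sampler only runs when the main thread yields, so samples are coarse and the 1 ms interval is a request, not a guarantee. `sample_frames` and `busy_work` are hypothetical names.

```ruby
# Toy wall-clock sampling profiler: a background thread periodically
# records the innermost frame of the thread being profiled.
def sample_frames(interval: 0.001)
  target  = Thread.current
  counts  = Hash.new(0)
  running = true
  sampler = Thread.new do
    loop do
      frames = target.backtrace_locations
      counts[frames.first.label] += 1 if frames && !frames.empty?
      break unless running # always takes at least one sample
      sleep interval
    end
  end
  yield
  running = false
  sampler.join
  counts
end

def busy_work
  total = 0
  200_000.times { |i| total += i }
  total
end

counts = sample_frames { 20.times { busy_work } }
counts.sort_by { |_, n| -n }.each { |label, n| puts "#{n}\t#{label}" }
```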
Stackprof Usage
require 'stackprof'

StackProf.run(out: 'tmp/app.stackprof') do
  # ... code to profile ...
end
Rack Middleware
config/environments/ENV.rb:
  config.middleware.use StackProf::Middleware, enabled: true,
                                               mode: :cpu,
                                               interval: 1000,
                                               save_every: 5
Stackprof output
stackprof lobsters.stackprof
==================================
Mode: cpu(1000)
Samples: 97 (0.00% miss rate)
GC: 18 (18.56%)
==================================
TOTAL (pct) SAMPLES (pct) FRAME
9 (9.3%) 9 (9.3%) Pathname#chop_basename
4 (4.1%) 4 (4.1%) Hike::Index#entries
4 (4.1%) 3 (3.1%) ActiveSupport::Subscriber#start
3 (3.1%) 3 (3.1%) block in ActiveRecord::ConnectionAdapters::AbstractM
4 (4.1%) 3 (3.1%) Hike::Index#build_pattern_for
6 (6.2%) 2 (2.1%) Pathname#plus
2 (2.1%) 2 (2.1%) ActiveSupport::SafeBuffer#initialize
2 (2.1%) 2 (2.1%) Hike::Index#sort_matches
3 (3.1%) 2 (2.1%) ActiveSupport::Inflector#underscore
2 (2.1%) 2 (2.1%) block (2 levels) in <class:Numeric>
5 (5.2%) 2 (2.1%) ActiveSupport::Subscriber#finish
2 (2.1%) 2 (2.1%) Sprockets::Mime#mime_types
2 (2.1%) 2 (2.1%) ActiveSupport::PerThreadRegistry#instance
Zooming in
stackprof lobsters.dump --method 'Hike::Index#entries'
Hike::Index#entries (/home/vagrant/.rvm/gems/ruby-2.1.3/gems/hike-1.2.3/lib/hike/index.rb:78)
samples: 637 self (14.6%) / 645 total (14.8%)
callers:
645 ( 100.0%) Hike::Index#match
callees (8 total):
8 ( 100.0%) block in Hike::Index#entries
code:
| 78 | def entries(path)
21 (0.5%) / 21 (0.5%) | 79 | @entries[path.to_s] ||= begin
5 (0.1%) / 5 (0.1%) | 80 | pathname = Pathname.new(path)
424 (9.7%) / 424 (9.7%) | 81 | if pathname.directory?
195 (4.5%) / 187 (4.3%) | 82 | pathname.entries.reject { |entry| entry.to
| 83 | else
Flamegraphs!
Flamegraphs!
FLAMEGRAPHS!
Flamegraphs
What are they?
Visualization technique for sampled stack traces
Turning thousands of dense traces into a single image
Invented by Brendan Gregg (Joyent / Netflix)
A Flamegraph
Rails new
Interpreting Flamegraphs
Y-axis is stack depth
X-axis is not time
Box width is proportional to how often the method (or its children) appeared in the samples
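The width rule fits in a few lines of Ruby. Assuming each sample is an array of frame names ordered root first (a made-up input format), a box for a given stack path is as wide as the share of samples that pass through that path, and its depth on the Y-axis is the path length. `box_widths` and the frame names are hypothetical; `transform_values` needs Ruby 2.4+.

```ruby
# Each sample is one call stack, root frame first (made-up input format).
samples = [
  %w[main handle_request render],
  %w[main handle_request render],
  %w[main handle_request query],
  %w[main gc]
]

# A box's width is the share of samples whose stack passes through its
# path (the frame plus all its ancestors); its depth is the path length.
def box_widths(samples)
  widths = Hash.new(0)
  samples.each do |stack|
    stack.each_index { |depth| widths[stack[0..depth]] += 1 }
  end
  widths.transform_values { |count| count.fdiv(samples.size) }
end

box_widths(samples).each do |path, share|
  puts "#{'  ' * (path.size - 1)}#{path.last}: #{(share * 100).round}%"
end
```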
Stackprof Flamegraphs
Pass raw: true when profiling (the Rack middleware requires patching)
stackprof --flamegraph stack.dump > flame_output
stackprof --flamegraph-viewer flame_output (Safari / Chrome only)
stackprof --stackcollapse stack.dump (for a classic Gregg-style flamegraph)
Rails Flamegraph
Default Stackprof flamegraphs show repeated calls to the same methods separately
This can hide patterns
Gregg's flamegraph includes a 'collapse' preprocessing phase that combines repeated calls
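A minimal sketch of that collapse step, assuming a made-up input of one array of frame names per sample, root first: identical stacks are merged into one `frame;frame;frame count` line, the "folded" text format that Gregg's `flamegraph.pl` reads. Classic flamegraphs also sort stacks alphabetically, which is why their X-axis is not time. `collapse` is a hypothetical helper.

```ruby
# One call stack per sample, root frame first (made-up input format).
samples = [
  %w[main handle_request render],
  %w[main handle_request render],
  %w[main handle_request query],
  %w[main gc]
]

# Merge identical stacks into "frame;frame;frame count" lines (folded format),
# sorted alphabetically by stack.
def collapse(samples)
  folded = Hash.new(0)
  samples.each { |stack| folded[stack.join(';')] += 1 }
  folded.sort.map { |stack, count| "#{stack} #{count}" }
end

puts collapse(samples)
```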
Another example
Working on a pure Ruby application
'Why is it running so slow?'
'Can we see any quick way of shaving off some execution time?'
Flamegraph
(wall time, 1000-microsecond sample interval, collapsed graph)
Interpretation
Most of the execution time is spent in Excon and Fog methods
These talk to the network (OpenStack / Puppet)
Caching some results provided a quick win that shaved off ~30s
Most of the execution time is still network-bound
Medium/long-term solution: move to pre-baked images, eliminating the need for a Puppet run
Result: Runtime of 8 minutes (!) down to 20s.
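The quick win came from caching repeated network lookups. The original code isn't shown, so the following is only a hypothetical illustration of the pattern: `ImageCatalog`, `find` and `fetch_remote` are invented names, and `sleep` stands in for a slow Fog/Excon call. The first lookup pays the network cost; later ones are served from an in-memory Hash.

```ruby
# Hypothetical catalog that memoizes a slow remote lookup per name.
class ImageCatalog
  attr_reader :remote_calls

  def initialize
    @cache = {}
    @remote_calls = 0
  end

  # Return the cached value, or fetch it once and remember it.
  def find(name)
    @cache.fetch(name) { @cache[name] = fetch_remote(name) }
  end

  private

  # Stand-in for a network call (e.g. an API request via Fog/Excon).
  def fetch_remote(name)
    @remote_calls += 1
    sleep 0.01 # simulated latency
    "image-for-#{name}"
  end
end

catalog = ImageCatalog.new
10.times { catalog.find('ubuntu') }
puts "remote calls: #{catalog.remote_calls}"
```

A plain Hash never expires entries, which suits a short-lived script like this one but not a long-running server.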
Memory
Where did it all go?
ObjectSpace
require 'objspace'
ObjectSpace.trace_object_allocations_start
ObjectSpace.dump/dump_all
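A minimal sketch of allocation tracing with the standard `objspace` library: objects allocated inside `ObjectSpace.trace_object_allocations { }` carry their allocation site, which shows up as `file` and `line` keys in the JSON string returned by `ObjectSpace.dump`. The variable names are illustrative.

```ruby
require 'objspace'
require 'json'

dump = nil
alloc_line = nil

ObjectSpace.trace_object_allocations do
  greeting = String.new('hello'); alloc_line = __LINE__ # allocation site
  dump = ObjectSpace.dump(greeting)                      # JSON as a String
end

info = JSON.parse(dump)
puts info['type']                      # STRING
puts "#{info['file']}:#{info['line']}" # where greeting was allocated
```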
dump & dump_all
JSON representation of an object (more info is provided if allocation tracing is on)
GIVE ME THE ENTIRE HEAP! ObjectSpace.dump_all
The dump is one JSON object per line
(Obviously, it can be large!)
Example - pry
Q. How many STRINGS are there in my pry session?
require 'objspace'
File.open('heap.dump', 'w') { |f| ObjectSpace.dump_all(output: f) }
$> grep '"type":"STRING"' heap.dump | wc -l
A. ???
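The same count can be made without leaving Ruby. This sketch assumes, like the grep, that each line of the dump contains a `"type":"..."` field, and tallies every object type in one pass. It reads the file in binary mode because string contents in the dump are not guaranteed to be valid UTF-8, and it closes the file after `dump_all` so the data is flushed before being read.

```ruby
require 'objspace'

# Write the whole heap, closing the file so everything is flushed to disk.
File.open('heap.dump', 'w') { |f| ObjectSpace.dump_all(output: f) }

# Tally objects by their "type" field, one JSON object per line.
counts = Hash.new(0)
File.foreach('heap.dump', mode: 'rb') do |line|
  type = line[/"type":"(\w+)"/, 1]
  counts[type] += 1 if type
end

puts "STRING objects: #{counts['STRING']}"
```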
Hunting for leaks with rbtrace
wabbit season
Idea: GC, dump, repeat, and compare
Remove objects from dump 2 that are also in dump 1
(Then remove objects from dump 2 that are no longer in dump 3)
Not necessarily leaks, but a great place to start looking
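The three-dump comparison can be sketched in Ruby. This is not the code of diff_heaps.rb; it is a hypothetical reimplementation of the idea that keys objects by their `address` field and groups survivors by type and allocation site (`file`/`line`, present only if allocation tracing was on). Addresses can be reused once an object is freed, one more reason the results are suspects rather than proven leaks. `load_heap` and `suspected_leaks` are invented names.

```ruby
require 'json'
require 'set'

# Hypothetical helper: read a dump_all file (one JSON object per line).
# scrub replaces invalid UTF-8 bytes that string values may contain.
def load_heap(path)
  File.foreach(path).map { |line| JSON.parse(line.scrub) }
end

# Objects new in heap2 (absent from heap1) that are still alive in heap3,
# counted per [type, allocation site].
def suspected_leaks(heap1, heap2, heap3)
  before = heap1.map { |obj| obj['address'] }.compact.to_set
  after  = heap3.map { |obj| obj['address'] }.compact.to_set

  heap2
    .select { |obj| (addr = obj['address']) && !before.include?(addr) && after.include?(addr) }
    .group_by { |obj| [obj['type'], "#{obj['file']}:#{obj['line']}"] }
    .transform_values(&:size)
end

# Tiny in-memory example instead of real dumps:
h1 = [{ 'address' => '0x1', 'type' => 'STRING' }]
h2 = h1 + [
  { 'address' => '0x2', 'type' => 'STRING', 'file' => 'app.rb', 'line' => 10 },
  { 'address' => '0x3', 'type' => 'ARRAY',  'file' => 'app.rb', 'line' => 12 }
]
h3 = h2.reject { |obj| obj['address'] == '0x3' } # 0x3 was collected later

leaks = suspected_leaks(h1, h2, h3)
leaks.each { |(type, site), count| puts "Leaked #{count} #{type} objects at: #{site}" }
```

With real dumps, the three arrays would come from `load_heap('heap-1.dump')` and so on.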
Rbtrace & Leaks
How to get the dumps from a live server?
rbtrace -e
e.g. rbtrace -p $PID -e 'Rails.root.to_s'
watch out for eval timeouts
Getting the heap dump
Thread.new {
  require "objspace"
  # Only objects allocated after this call carry allocation info
  ObjectSpace.trace_object_allocations_start
  GC.start
  File.open("heap-1.dump", "w") { |f| ObjectSpace.dump_all(output: f) }
}.join
Diffing Heaps
diff_heaps.rb in the heroku/discussion repo
Leaked 37793 STRING objects at: /home/vagrant/.rvm/rubies/ruby-2.1.3/lib/ruby/2.1.0/psych.rb:370
Leaked 563 ARRAY objects at: /home/vagrant/.rvm/gems/ruby-2.1.3/gems/activesupport-4.1.8/lib/acti
Leaked 483 STRING objects at: /home/vagrant/.rvm/gems/ruby-2.1.3/gems/activesupport-4.1.8/lib/act
...
MemoryProfiler
Uses new 2.1+ hooks
Shows allocated / retained memory
Can be slow
Demo
Let's look at Sinatra
require 'memory_profiler'
MemoryProfiler.report { require 'sinatra' }.pretty_print
Freeze your strings!
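Why freezing helps can be checked with `GC.stat`. Since Ruby 2.1, a string literal followed directly by `.freeze` is compiled to return one shared frozen string instead of allocating a new object on every evaluation (Ruby 2.3+ can do this for a whole file with the `# frozen_string_literal: true` magic comment). The sketch assumes Ruby 2.2+ for the `:total_allocated_objects` key; `allocations` is a hypothetical helper.

```ruby
# Count objects allocated while the block runs (Ruby 2.2+ GC.stat key).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

unfrozen = allocations { 1_000.times { 'content-type' } }        # new String each time
frozen   = allocations { 1_000.times { 'content-type'.freeze } } # one shared String

puts "unfrozen: #{unfrozen} objects, frozen: #{frozen} objects"
```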
GC
GC is in a state of flux
1.9.x, 2.0, 2.1, 2.2 all have different GC strategies.
I've mostly worked with 2.1 (2.2 improves on the 2.1 strategy)
Tuning? Here be dragons…
gc_tracer
Uses new 2.1 hooks for GC profiling
Outputs TSV (GC.stat, minor/major GC runs, etc.)
Useful for ideas on GC tuning
Using gc_tracer
require 'gc_tracer'

filename = 'gc_log.tsv'  # example path for the TSV log
GC::Tracer.start_logging(filename) do
  # ... code to trace ...
end
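If the gem isn't available, a rough approximation of the numbers gc_tracer logs can be taken with `GC.stat` alone. This sketch assumes Ruby 2.2+ key names (2.1 spells some of them differently) and records how many GCs, minor and major, ran during a block and how many objects were allocated, printing the result as two tab-separated lines. It is far coarser than gc_tracer, which logs at each GC event; `gc_delta` and `GC_KEYS` are hypothetical names.

```ruby
# GC.stat counters to compare before and after a block (Ruby 2.2+ names).
GC_KEYS = %i[count minor_gc_count major_gc_count total_allocated_objects].freeze

def gc_delta
  before = GC_KEYS.map { |key| GC.stat(key) }
  yield
  after = GC_KEYS.map { |key| GC.stat(key) }
  GC_KEYS.zip(after.zip(before).map { |a, b| a - b }).to_h
end

delta = gc_delta { 200_000.times.map { |i| "item #{i}" } }

# Tab-separated output, loosely in the spirit of gc_tracer's TSV log.
puts GC_KEYS.join("\t")
puts GC_KEYS.map { |key| delta[key] }.join("\t")
```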
What to look for in GC
Tuning
Initial slots (RUBY_GC_HEAP_INIT_SLOTS)
Limiting memory growth
(GC is probably another talk in itself)
Experiment, profile, update tuning, experiment, etc.
2.1.x is not…great for webapps (minor/major issue, :symbols bug)
All hail Rails 5 and Ruby 2.2!
Summing Up
Things are getting better!
Still a bunch of separate tools (with some overlap)
(and there are more: ruby-prof, rack-mini-profiler, etc.)
It's a good idea to send some of this to logging / Graphite / etc.
Lower level: SystemTap, DTrace, perf
Links
http://www.brendangregg.com/flamegraphs.html
https://github.com/tmm1/rblineprof
https://github.com/peek/peek-rblineprof
https://github.com/tmm1/stackprof
https://github.com/falloutdurham/stackprof (patched for raw Rack samples)
https://github.com/tmm1/rbtrace
https://github.com/heroku/discussion/blob/master/script/diff_heaps.rb
https://github.com/srawlins/allocation_stats
https://github.com/SamSaffron/memory_profiler
https://github.com/ko1/gc_tracer
Questions?
