AWS re:Invent 2014: Performance Tuning EC2 Instances

Talk for AWS re:Invent 2014 by Brendan Gregg, Netflix.

Video: https://www.youtube.com/watch?v=7Cyd22kOqWc

Description: "Netflix tunes Amazon EC2 instances for maximum performance. In this session, you learn how Netflix configures the fastest possible EC2 instances, while reducing latency outliers. This session explores the various Xen modes (e.g., HVM, PV, etc.) and how they are optimized for different workloads. Hear how Netflix chooses Linux kernel versions based on desired performance characteristics and receive a firsthand look at how they set kernel tunables, including hugepages. You also hear about Netflix’s use of SR-IOV to enable enhanced networking and their approach to observability, which can exonerate EC2 issues and direct attention back to application performance."

PDF: AWSreInvent2014_perf_tuning_EC2_nobkg.pdf

Keywords (from pdftotext):

slide 1:
    PFC306
    Brendan Gregg, Performance Engineering, Netflix
    November 12, 2014 | Las Vegas, NV
    
slide 2:

slide 3:

slide 4:

slide 5:

slide 6:

slide 7:

slide 8:

slide 9:
    EC2
    ELB
    Cassandra
    Applications (Services)
    Elasticsearch
    EVCache
    SES
    SQS

slide 10:

slide 11:

slide 12:

slide 13:
    Start
    Find best balance
    Select memory to cache working set

slide 14:
    ASG Cluster
    prod1
    ELB
    Canary
    ASG-v010
    ASG-v011
    Instance
    Instance
    Instance
    Instance
    Instance
    Instance

slide 15:

slide 16:

slide 17:
    Select instance families
    From any desired resource, see types & cost
    Select resources

slide 18:
    eg, 8 vCPU:

slide 19:

slide 20:

slide 21:
    Acceptable
    Headroom
    Unacceptable

slide 22:

slide 23:

slide 24:

slide 25:

slide 26:
    Cost per hour
    Services

slide 27:

slide 28:

slide 29:

slide 30:

slide 31:

slide 32:

slide 33:

slide 34:

slide 35:
                                                                        
slide 36:
    # schedtool -B PID

slide 37:
    vm.swappiness = 0    # from 60

slide 38:
    # echo never > /sys/kernel/mm/transparent_hugepage/enabled    # from madvise

slide 39:
    vm.dirty_ratio = 80                  # from 40
    vm.dirty_background_ratio = 5        # from 10
    vm.dirty_expire_centisecs = 12000    # from 3000
    mount -o defaults,noatime,discard,nobarrier …
                                                                                
slide 40:
    /sys/block/*/queue/rq_affinity
    /sys/block/*/queue/scheduler        noop
    /sys/block/*/queue/nr_requests
    /sys/block/*/queue/read_ahead_kb
    mdadm --chunk=64 ...
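
The queue settings on slide 40 are plain-text files under sysfs. As a minimal sketch (not taken from the talk), the following prints the active I/O scheduler for each block device. It relies on the kernel convention of listing the available schedulers with the active one in square brackets, for example `noop deadline [cfq]`. The function name `active_scheduler` is hypothetical.

```shell
#!/bin/sh
# Extract the active scheduler from the contents of a queue/scheduler file.
# The kernel marks the active entry with square brackets: "noop deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # glob did not match, or file unreadable
    dev=${f#/sys/block/}             # strip the leading directory
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

Writing a scheduler name into the same file as root (for example `echo noop > /sys/block/xvda/queue/scheduler`, where `xvda` is an assumed device name) changes the setting until the next reboot.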
                                                                                  
slide 41:
    net.core.somaxconn = 1000
    net.core.netdev_max_backlog = 5000
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_wmem = 4096 12582912 16777216
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    net.ipv4.tcp_max_syn_backlog = 8096
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 10240 65535
    net.ipv4.tcp_abort_on_overflow = 1    # maybe
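
Before changing any of these tunables it helps to know what the running kernel currently uses. The sketch below is an illustration rather than anything shown in the talk: it maps each dotted sysctl key to its file under `/proc/sys`, reads the current value, and compares it with the value from the slide. The helper names (`sysctl_path`, `normalize`, `compare_values`) are hypothetical, and only three of the settings are checked to keep the example short.

```shell
#!/bin/sh
# Map a dotted sysctl key to its file under /proc/sys (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Collapse runs of blanks so tab-separated kernel output compares equal
# to the space-separated values written on the slide.
normalize() {
    set -f                 # no globbing while the value is word-split
    set -- $1
    set +f
    printf '%s\n' "$*"
}

# Print "ok" when the current value matches the wanted one, else "differs".
compare_values() {
    if [ "$(normalize "$1")" = "$(normalize "$2")" ]; then
        echo ok
    else
        echo differs
    fi
}

# Check a few of the settings listed above against the running kernel.
while read -r key eq want; do
    path=$(sysctl_path "$key")
    [ -r "$path" ] || { printf '%s: unavailable\n' "$key"; continue; }
    printf '%s: %s\n' "$key" "$(compare_values "$(cat "$path")" "$want")"
done <<'EOF'
net.core.somaxconn = 1000
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_slow_start_after_idle = 0
EOF
```

The script only reads values. Settings like these are normally applied with `sysctl -w key=value` or persisted in `/etc/sysctl.conf`.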
                                                                                    
slide 42:
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
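
Writing `tsc` to `current_clocksource` only works when the kernel offers that clocksource. As a hedged sketch (the talk does not show this step), the script below reads the sibling file `available_clocksource` in the same sysfs directory and reports whether `tsc` is listed. The function name `has_clocksource` is hypothetical.

```shell
#!/bin/sh
CS_DIR=/sys/devices/system/clocksource/clocksource0

# Succeed if the clocksource named in $2 appears in the space-separated list $1.
has_clocksource() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Read-only report; nothing is changed on the system.
if [ -r "$CS_DIR/available_clocksource" ]; then
    available=$(cat "$CS_DIR/available_clocksource")
    echo "current:   $(cat "$CS_DIR/current_clocksource")"
    echo "available: $available"
    if has_clocksource "$available" tsc; then
        echo "tsc is available"
    else
        echo "tsc is not offered on this system"
    fi
fi
```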
                                                                                      
slide 43:

slide 44:

slide 45:

slide 46:

slide 47:
    Resource Utilization (%)

slide 48:

slide 49:

slide 50:

slide 51:
    Application
    System Libraries
    System Calls
    Kernel
    Devices

slide 52:

slide 53:
                                                                                                            
slide 54:
    $ sar -n TCP,ETCP,DEV 1
    Linux 3.2.55 (test-e4f1a80b)    08/18/2014    _x86_64_ (8 CPU)

    09:10:43 PM  IFACE  rxpck/s  txpck/s   rxkB/s    txkB/s  rxcmp/s  txcmp/s  rxmcst/s
    09:10:44 PM  […]
    09:10:44 PM  eth0      […]      […]  4537.46  28513.24  […]

    09:10:43 PM  active/s  passive/s  iseg/s  oseg/s
    09:10:44 PM  […]

    09:10:43 PM  atmptf/s  estres/s  retrans/s  isegerr/s  orsts/s
    09:10:44 PM  […]
    […]
                                                                                                              
slide 55:

slide 56:

slide 57:

slide 58:

slide 59:
    Stack frame
    Ancestry
    Mouse-over frames to quantify

slide 60:
    # git clone https://github.com/brendangregg/FlameGraph
    # cd FlameGraph
    # perf record -F 99 -ag -- sleep 60
    # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
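
The middle stage of that pipeline, `stackcollapse-perf.pl`, emits one line per unique stack in "folded" form: frames joined by semicolons, followed by a sample count. As a rough sketch of how that intermediate format can be used without rendering an SVG, the function below sums samples per leaf frame, which approximates "which functions were on-CPU most often". The function name `leaf_totals` and the inline sample data are made up for illustration.

```shell
#!/bin/sh
# Sum sample counts per leaf frame from folded stacks read on stdin.
# Each folded line is "frame1;frame2;...;frameN count".
leaf_totals() {
    awk '{
        count = $NF                   # last field is the sample count
        sub(/ [0-9]+$/, "")           # strip the count, leaving the stack
        n = split($0, frames, ";")
        total[frames[n]] += count
    }
    END { for (f in total) printf "%d %s\n", total[f], f }' | sort -k1,1nr -k2
}

# Small inline sample instead of real perf output.
printf '%s\n' 'java;a;b;c 3' 'java;a;b;d 2' 'java;a;c 4' | leaf_totals
# prints:
# 7 c
# 2 d
```

With real data the function would sit in place of `flamegraph.pl`, for example `perf script | ./stackcollapse-perf.pl | leaf_totals | head`.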
                                                                                                                          
slide 61:

slide 62:
    Kernel
    TCP/IP
    Broken Java stacks (missing frame pointer)
    Locks
    epoll
    Time
    Idle thread

slide 63:

slide 64:
                                                                                                                                  
slide 65:
    # ./iosnoop -ts
    Tracing block I/O. Ctrl-C to end.
    STARTs          ENDs            COMM       PID  TYPE DEV    BLOCK  BYTES LATms
    5982800.302061  5982800.302679  supervise  […]  […]  202,1  […]    […]   […]
    5982800.302423  5982800.302842  supervise  […]  […]  202,1  […]    […]   […]
    5982800.304962  5982800.305446  supervise  […]  […]  202,1  […]    […]   […]
    5982800.305250  5982800.305676  supervise  […]  […]  202,1  […]    […]   […]
    […]
    # ./iosnoop -h
    USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
        -d device    # device string (eg, "202,1)
        -i iotype    # match type (eg, '*R*' for all reads)
        -n name      # process name to match on I/O issue
        -p PID       # PID to match on I/O issue
        -Q           # include queueing time in LATms
        -s           # include start time of I/O (s)
        -t           # include completion time of I/O (s)
    […]
                                                                                                                                    
                                                                                                                                    slide 66:
                                                                                                                                      
                                                                                                                                      slide 67:
                                                                                                                                        # perf record –e skb:consume_skb –ag -- sleep 10
                                                                                                                                        # perf report
                                                                                                                                        [...]
                                                                                                                                        74.42% swapper [kernel.kallsyms] [k] consume_skb
                                                                                                                                        --- consume_skb
                                                                                                                                        arp_process
arp_rcv
__netif_receive_skb_core
__netif_receive_skb
netif_receive_skb
virtnet_poll
net_rx_action
__do_softirq
irq_exit
do_IRQ
ret_from_intr
[…]

Summarizing stack traces for a tracepoint
perf_events can do many things; it is hard to pick just one example
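Summarizing stack traces comes down to counting how often each identical stack occurs. A minimal sketch of that idea, assuming an input format of one frame per line with stacks separated by blank lines (an assumed format for illustration, not the actual perf output format); `summarize_stacks` is a hypothetical helper, and the sample stacks reuse frame names from the trace above with made-up counts:

```python
import re
from collections import Counter


def summarize_stacks(text):
    """Count identical stacks in blank-line-separated text.

    Returns a list of (frames_tuple, count), most frequent first.
    """
    counts = Counter()
    # Split on blank lines (tolerating stray whitespace) to get one block per stack.
    for block in re.split(r"\n\s*\n", text.strip()):
        frames = tuple(line.strip() for line in block.splitlines() if line.strip())
        if frames:
            counts[frames] += 1
    return counts.most_common()


# Illustrative input only: three sampled stacks, two of them identical.
sample = """
arp_rcv
__netif_receive_skb_core

__netif_receive_skb_core
__netif_receive_skb

arp_rcv
__netif_receive_skb_core
"""

for frames, count in summarize_stacks(sample):
    print(count, " <- ".join(frames))
```

Keeping each stack as a tuple makes it hashable, so the whole stack (not just the top frame) is the aggregation key.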
                                                                                                                                        
                                                                                                                                        slide 68:
                                                                                                                                          
                                                                                                                                          slide 69:
ec2-guest# ./showboost
CPU MHz     : 2500
Turbo MHz   : 2900 (10 active)
Turbo Ratio : 116% (10 active)
CPU 0 summary every 5 seconds...

TIME       C0_MCYC   C0_ACYC   UTIL   RATIO   MHz
06:11:35                       51%    116%
06:11:40                       50%    115%
06:11:45                       49%    115%
[...]

Real CPU MHz
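The header values are related by simple arithmetic: 2500 MHz × 116% = 2900 MHz. A minimal sketch of how the per-interval columns could be derived, assuming C0_MCYC and C0_ACYC are the deltas of the MPERF and APERF cycle counters over one interval, with MPERF counting at the base clock rate while the CPU is in C0 (these are assumptions, not stated on the slide). `boost_stats` is a hypothetical name, and the counter values in the usage line are invented for illustration, not taken from the slide:

```python
def boost_stats(base_mhz, mcyc_delta, acyc_delta, interval_s):
    """Return (util_pct, ratio_pct, real_mhz) for one sampling interval.

    base_mhz    -- nominal (non-turbo) clock rate in MHz
    mcyc_delta  -- assumed MPERF delta: base-rate cycles while not idle
    acyc_delta  -- assumed APERF delta: actual cycles while not idle
    interval_s  -- length of the sampling interval in seconds
    """
    if mcyc_delta <= 0 or interval_s <= 0 or base_mhz <= 0:
        raise ValueError("base_mhz, mcyc_delta and interval_s must be positive")
    # Ratio of actual to base-rate cycles: above 1.0 means turbo boost was active.
    ratio = acyc_delta / mcyc_delta
    # Fraction of the interval's base-rate cycles spent non-idle.
    util = mcyc_delta / (base_mhz * 1_000_000 * interval_s)
    return round(util * 100), round(ratio * 100), round(base_mhz * ratio)


# Hypothetical 5-second sample on a 2500 MHz part.
util, ratio, mhz = boost_stats(2500, 6_250_000_000, 7_250_000_000, 5)
print(f"UTIL {util}%  RATIO {ratio}%  MHz {mhz}")
```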
                                                                                                                                            
                                                                                                                                            slide 70:
                                                                                                                                              
                                                                                                                                              slide 71:
                                                                                                                                                
                                                                                                                                                slide 72:
Screenshot labels: Region; Breakdowns; App; Interactive Graph; Metrics; Options; Summary Statistics
                                                                                                                                                  
                                                                                                                                                  slide 73:
                                                                                                                                                    
                                                                                                                                                    slide 74:
                                                                                                                                                      
                                                                                                                                                      slide 75:
Screenshot labels: Utilization; Per device; Breakdowns; Saturation; Errors
                                                                                                                                                        
                                                                                                                                                        slide 76:
                                                                                                                                                          
                                                                                                                                                          slide 77:
                                                                                                                                                            
                                                                                                                                                            slide 78:
                                                                                                                                                              http://aws.amazon.com/ec2/instance-types/
                                                                                                                                                              http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html
                                                                                                                                                              http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
                                                                                                                                                              http://www.slideshare.net/cpwatson/cpn302-yourlinuxamioptimizationandperformance
                                                                                                                                                              http://www.brendangregg.com/blog/2014-09-27/from-clouds-to-roots.html
                                                                                                                                                              http://www.brendangregg.com/blog/2014-05-07/what-color-is-your-xen.html
                                                                                                                                                              http://www.brendangregg.com/linuxperf.html
                                                                                                                                                              http://www.slideshare.net/brendangregg/linux-performance-tools-2014
                                                                                                                                                              http://www.brendangregg.com/USEmethod/use-linux.html
                                                                                                                                                              http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
https://github.com/brendangregg/FlameGraph
https://github.com/brendangregg/perf-tools
                                                                                                                                                              
                                                                                                                                                              slide 79:
                                                                                                                                                                
                                                                                                                                                                slide 80:
Talk     Time               Title
PFC-305  Wednesday, 1:15pm  Embracing Failure: Fault Injection and Service Reliability
BDT-403  Wednesday, 2:15pm  Next Generation Big Data Platform at Netflix
PFC-306  Wednesday, 3:30pm  Performance Tuning EC2
DEV-309  Wednesday, 3:30pm  From Asgard to Zuul, How Netflix’s proven Open Source Tools can accelerate and scale your services
ARC-317  Wednesday, 4:30pm  Maintaining a Resilient Front-Door at Massive Scale
PFC-304  Wednesday, 4:30pm  Effective Inter-process Communications in the Cloud: The Pros and Cons of Micro Services Architectures
ENT-209  Wednesday, 4:30pm  Cloud Migration, Dev-Ops and Distributed Systems
APP-310  Friday, 9:00am     Scheduling using Apache Mesos in the Cloud
                                                                                                                                                                  
                                                                                                                                                                  slide 81: