Systems Performance 2nd Ed.



BPF Performance Tools book

Recent posts:
Blog index
About
RSS

EuroBSDcon 2017: System Performance Analysis Methodologies

Video: https://www.youtube.com/watch?v=ay41Uq1DrvM

Keynote by Brendan Gregg.

Description: "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.

There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.

This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."

next
prev
1/65
next
prev
2/65
next
prev
3/65
next
prev
4/65
next
prev
5/65
next
prev
6/65
next
prev
7/65
next
prev
8/65
next
prev
9/65
next
prev
10/65
next
prev
11/65
next
prev
12/65
next
prev
13/65
next
prev
14/65
next
prev
15/65
next
prev
16/65
next
prev
17/65
next
prev
18/65
next
prev
19/65
next
prev
20/65
next
prev
21/65
next
prev
22/65
next
prev
23/65
next
prev
24/65
next
prev
25/65
next
prev
26/65
next
prev
27/65
next
prev
28/65
next
prev
29/65
next
prev
30/65
next
prev
31/65
next
prev
32/65
next
prev
33/65
next
prev
34/65
next
prev
35/65
next
prev
36/65
next
prev
37/65
next
prev
38/65
next
prev
39/65
next
prev
40/65
next
prev
41/65
next
prev
42/65
next
prev
43/65
next
prev
44/65
next
prev
45/65
next
prev
46/65
next
prev
47/65
next
prev
48/65
next
prev
49/65
next
prev
50/65
next
prev
51/65
next
prev
52/65
next
prev
53/65
next
prev
54/65
next
prev
55/65
next
prev
56/65
next
prev
57/65
next
prev
58/65
next
prev
59/65
next
prev
60/65
next
prev
61/65
next
prev
62/65
next
prev
63/65
next
prev
64/65
next
prev
65/65

PDF: EuroBSDcon2017_SystemMethodology.pdf

Keywords (from pdftotext):

slide 1:
    EuroBSDcon 2017
    System Performance
    Analysis Methodologies
    Brendan Gregg
    Senior Performance Architect
    
    slide 2:
      
      slide 3:
        Apollo Lunar Module Guidance Computer
        performance analysis
        CORE SET
        AREA
        VAC SETS
        ERASABLE
        MEMORY
        FIXED
        MEMORY
        
        slide 4:
          
          slide 5:
            Background
            
            slide 6:
              History
              • System Performance Analysis up to the '90s:
              – Closed source UNIXes and applicaNons
              – Vendor-created metrics and performance tools
              – Users interpret given metrics
              • Problems
              – Vendors may not provide the best metrics
              – ORen had to infer, rather than measure
              – Given metrics, what do we do with them?
              $ ps -auxw
              USER
              PID %CPU %MEM
              root
              11 99.9 0.0
              root
              0 0.0 0.0
              root
              1 0.0 0.2
              […]
              VSZ RSS TT
              16 0 176 5408 1040 -
              STAT STARTED
              TIME COMMAND
              22:10
              22:27.05 [idle]
              DLs 22:10
              0:00.47 [kernel]
              ILs 22:10
              0:00.01 /sbin/init --
              
              slide 7:
                Today
                1. Open source
                OperaNng systems: Linux, BSD, etc.
                ApplicaNons: source online (Github)
                2. Custom metrics
                Can patch the open source, or,
                Use dynamic tracing (open source helps)
                3. Methodologies
                Start with the quesNons, then make metrics to answer them
                Methodologies can pose the quesNons
                Biggest problem with dynamic tracing has been what to do with it.
                Methodologies guide your usage.
                
                slide 8:
                  Crystal Ball Thinking
                  
                  slide 9:
                    An2-Methodologies
                    
                    slide 10:
                      Street Light An2-Method
                      1. Pick observability tools that are
                      – Familiar
                      – Found on the Internet
                      – Found at random
                      2. Run tools
                      3. Look for obvious issues
                      
                      slide 11:
                        Drunk Man An2-Method
                        • Drink Tune things at random unNl the problem goes away
                        
                        slide 12:
                          Blame Someone Else An2-Method
                          1. Find a system or environment component you are not
                          responsible for
                          2. Hypothesize that the issue is with that component
                          3. Redirect the issue to the responsible team
                          4. When proven wrong, go to 1
                          
                          slide 13:
                            Traffic Light An2-Method
                            1. Turn all metrics into traffic lights
                            2. Open dashboard
                            3. Everything green? No worries, mate.
                            • Type I errors: red instead of green
                            – team wastes Nme
                            • Type II errors: green instead of red
                            – performance issues undiagnosed
                            – team wastes more Nme looking elsewhere
                            Traffic lights are suitable for objec2ve metrics (eg, errors), not
                            subjec2ve metrics (eg, IOPS, latency).
                            
                            slide 14:
                              Methodologies
                              
                              slide 15:
                                Performance Methodologies
                                • For system engineers:
                                System Methodologies:
                                – ways to analyze unfamiliar systems and
                                applicaNons
                                • For app developers:
                                – guidance for metric and dashboard design
                                Collect your
                                own toolbox of
                                methodologies
                                Problem statement method
                                FuncNonal diagram method
                                Workload analysis
                                Workload characterizaNon
                                Resource analysis
                                USE method
                                Thread State Analysis
                                On-CPU analysis
                                CPU flame graph analysis
                                Off-CPU analysis
                                Latency correlaNons
                                Checklists
                                StaNc performance tuning
                                Tools-based methods
                                
                                slide 16:
                                  Problem Statement Method
                                  1. What makes you think there is a performance problem?
                                  2. Has this system ever performed well?
                                  3. What has changed recently?
                                  soRware? hardware? load?
                                  4. Can the problem be described in terms of latency?
                                  or run Nme. not IOPS or throughput.
                                  5. Does the problem affect other people or apps?
                                  6. What is the environment?
                                  soRware, hardware, instance types? versions? config?
                                  
                                  slide 17:
                                    FuncNonal Diagram Method
                                    1. Draw the funcNonal diagram
                                    2. Trace all components in the data path
                                    3. For each component, check performance
                                    Breaks up a bigger problem into
                                    smaller, relevant parts
                                    Eg, imagine throughput between the UCSB 360 and the
                                    UTAH PDP10 was slow…
                                    ARPA Network 1969
                                    
                                    slide 18:
                                      Workload Analysis
                                      • Begin with applicaNon metrics & context
                                      • A drill-down methodology
                                      • Pros:
                                      – ProporNonal, accurate metrics
                                      – App context
                                      Workload
                                      ApplicaNon
                                      System Libraries
                                      • Cons:
                                      System Calls
                                      – Difficult to dig from app to resource
                                      – App specific
                                      Kernel
                                      Hardware
                                      Analysis
                                      
                                      slide 19:
                                        Workload CharacterizaNon
                                        • Check the workload, not resulNng performance
                                        Workload
                                        • Eg, for CPUs:
                                        Who: which PIDs, programs, users
                                        Why: code paths, context
                                        What: CPU instrucNons, cycles
                                        How: changing over Nme
                                        Target
                                        
                                        slide 20:
                                          Workload CharacterizaNon: CPUs
                                          Who
                                          Why
                                          top
                                          CPU profile
                                          CPU flame graphs
                                          How
                                          What
                                          monitoring
                                          PMCs
                                          CPI flame graph
                                          
                                          slide 21:
                                            Most companies and monitoring products today
                                            Who
                                            Why
                                            top
                                            CPU profile
                                            CPU flame graphs
                                            How
                                            What
                                            monitoring
                                            PMCs
                                            CPI flame graph
                                            We can do
                                            bejer
                                            
                                            slide 22:
                                              Resource Analysis
                                              • Typical approach for system performance analysis:
                                              begin with system tools & metrics
                                              Workload
                                              • Pros:
                                              – Generic
                                              – Aids resource perf tuning
                                              ApplicaNon
                                              • Cons:
                                              System Libraries
                                              – Uneven coverage
                                              – False posiNves
                                              System Calls
                                              Kernel
                                              Hardware
                                              Analysis
                                              
                                              slide 23:
                                                The USE Method
                                                • For every resource, check:
                                                Utilization: busy time
                                                Saturation: queue length or time
                                                Errors: easy to interpret (objective)
                                                Starts with the questions, then finds the tools
                                                Eg, for hardware, check every resource incl. busses:
                                                
                                                slide 24:
                                                  http://www.brendangregg.com/USEmethod/use-rosetta.html
                                                  
                                                  slide 25:
                                                    http://www.brendangregg.com/USEmethod/use-freebsd.html
                                                    
                                                    slide 26:
                                                      
                                                      slide 27:
                                                        Apollo Lunar Module Guidance Computer
                                                        performance analysis
                                                        CORE SET
                                                        AREA
                                                        VAC SETS
                                                        ERASABLE
                                                        MEMORY
                                                        FIXED
                                                        MEMORY
                                                        
                                                        slide 28:
                                                          USE Method: SoRware
                                                          • USE method can also work for soRware resources
                                                          – kernel or app internals, cloud environments
                                                          – small scale (eg, locks) to large scale (apps). Eg:
                                                          • Mutex locks:
                                                          – uNlizaNon à lock hold Nme
                                                          – saturaNon à lock contenNon
                                                          – errors à any errors
                                                          • EnNre applicaNon:
                                                          – uNlizaNon à percentage of worker threads busy
                                                          – saturaNon à length of queued work
                                                          – errors à request errors
                                                          Resource
                                                          UNlizaNon
                                                          (%)
                                                          
                                                          slide 29:
                                                            RED Method
                                                            • For every service, check these are within SLO/A:
                                                            Request rate
                                                            Error rate
                                                            Dura=on (distribuNon)
                                                            Metrics
                                                            Database
                                                            User
                                                            Database
                                                            Another exercise in posing quesNons from
                                                            funcNonal diagrams
                                                            Load
                                                            Balancer
                                                            Web
                                                            Proxy
                                                            Payments
                                                            Server
                                                            Web Server
                                                            By Tom Wilkie: hjp://www.slideshare.net/weaveworks/monitoring-microservices
                                                            Asset
                                                            Server
                                                            
                                                            slide 30:
                                                              Thread State Analysis
                                                              State transiNon diagram
                                                              IdenNfy & quanNfy Nme in states
                                                              Narrows further analysis to state
                                                              Thread states are applicable to all apps
                                                              
                                                              slide 31:
                                                                TSA: eg, OS X
                                                                Instruments: Thread States
                                                                
                                                                slide 32:
                                                                  TSA: eg, RSTS/E
                                                                  RSTS: DEC OS from the 1970's
                                                                  TENEX (1969-72) also had Control-T
                                                                  for job states
                                                                  
                                                                  slide 33:
                                                                    TSA: Finding FreeBSD Thread States
                                                                    # dtrace -ln sched:::
                                                                    PROVIDER
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    sched
                                                                    […]
                                                                    MODULE
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    kernel
                                                                    probes
                                                                    FUNCTION NAME
                                                                    none preempt
                                                                    none dequeue
                                                                    none enqueue
                                                                    none off-cpu
                                                                    none on-cpu
                                                                    none remain-cpu
                                                                    none surrender
                                                                    none sleep
                                                                    none wakeup
                                                                    struct thread {
                                                                    […]
                                                                    enum {
                                                                    TDS_INACTIVE = 0x0,
                                                                    TDS_INHIBITED,
                                                                    TDS_CAN_RUN,
                                                                    TDS_RUNQ,
                                                                    TDS_RUNNING
                                                                    } td_state;
                                                                    thread flags
                                                                    […]
                                                                    #define KTDSTATE(td)
                                                                    (((td)->gt;td_inhibitors & TDI_SLEEPING) != 0 ? "sleep" :
                                                                    ((td)->gt;td_inhibitors & TDI_SUSPENDED) != 0 ? "suspended" :
                                                                    ((td)->gt;td_inhibitors & TDI_SWAPPED) != 0 ? "swapped" :
                                                                    ((td)->gt;td_inhibitors & TDI_LOCK) != 0 ? "blocked" :
                                                                    ((td)->gt;td_inhibitors & TDI_IWAIT) != 0 ? "iwait" : "yielding")
                                                                    
                                                                    slide 34:
                                                                      TSA: FreeBSD
                                                                      # ./tstates.d
                                                                      Tracing scheduler events... Ctrl-C to end.
                                                                      Time (ms) per state:
                                                                      COMM
                                                                      PID
                                                                      CPU
                                                                      RUNQ
                                                                      SLP
                                                                      irq14: ata0
                                                                      irq15: ata1
                                                                      swi4: clock (0) 12
                                                                      usbus0
                                                                      [...]
                                                                      sshd
                                                                      0 10011
                                                                      devd
                                                                      dtrace
                                                                      4 10006
                                                                      rand_harvestq
                                                                      kernel
                                                                      sshd
                                                                      intr
                                                                      cksum
                                                                      cksum
                                                                      cksum
                                                                      idle
                                                                      DTrace proof of concept
                                                                      SUS
                                                                      SWP
                                                                      LCK
                                                                      IWT
                                                                      YLD
                                                                      hjps://github.com/brendangregg/DTrace-tools/blob/master/sched/tstates.d
                                                                      
                                                                      slide 35:
                                                                        On-CPU Analysis
                                                                        1. Split into user/kernel states
                                                                        – /proc, vmstat(1)
                                                                        2. Check CPU balance
                                                                        – mpstat(1), CPU uNlizaNon heat map
                                                                        3. Profile soRware
                                                                        – User & kernel stack sampling (as a CPU flame graph)
                                                                        4. Profile cycles, caches, busses
                                                                        – PMCs, CPI flame graph
                                                                        CPU UNlizaNon
                                                                        Heat Map
                                                                        
                                                                        slide 36:
                                                                          CPU Flame Graph Analysis
                                                                          1. Take a CPU profile
                                                                          2. Render it as a flame graph
                                                                          3. Study largest "towers" first
                                                                          Discovers issues by their CPU usage
                                                                          Directly: CPU consumers
                                                                          Indirectly: iniNalizaNon of I/O, locks, Nmes, ...
                                                                          Narrows target of study
                                                                          Flame Graph
                                                                          
                                                                          slide 37:
                                                                            CPU Flame Graphs: FreeBSD
                                                                            • Use either DTrace or pmcstat. Eg, kernel CPU with DTrace:
                                                                            git clone https://github.com/brendangregg/FlameGraph; cd FlameGraph
                                                                            dtrace -n 'profile-99 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }' >gt; stacks01
                                                                            stackcollapse.pl gt; stacks01.svg
                                                                            • Both user & kernel CPU:
                                                                            dtrace -x ustackframes=100 -x stackframes=100 -n '
                                                                            profile-99 { @[stack(), ustack(), execname] = sum(1); }
                                                                            tick-30s,END { printa("%k-%k%s\n%@d\n", @); trunc(@); exit(0); }' >gt; stacks02
                                                                            hjp://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#DTrace
                                                                            
                                                                            slide 38:
                                                                              Java Mixed-Mode CPU Flame Graph
                                                                              Kernel
                                                                              (C)
                                                                              User
                                                                              (C)
                                                                              By sampling stack traces with:
                                                                              • -XX:+PreserveFramePointer
                                                                              • Java perf-map-agent
                                                                              Java
                                                                              JVM
                                                                              (C++)
                                                                              
                                                                              slide 39:
                                                                                CPI Flame Graph: BSD
                                                                                hjp://www.brendangregg.com/blog/2014-10-31/cpi-flame-graphs.html
                                                                                A CPU flame graph (cycles) colored using instructions/stall profile data
                                                                                eg, using FreeBSD pmcstat:
                                                                                red == instrucNons
                                                                                blue == stalls
                                                                                
                                                                                slide 40:
                                                                                  Off-CPU Analysis
                                                                                  Analyze off-CPU Nme via blocking code
                                                                                  path: Off-CPU flame graph
                                                                                  ORen need wakeup code paths as well…
                                                                                  
                                                                                  slide 41:
                                                                                    Off-CPU Time Flame Graph: FreeBSD tar … >gt; /dev/null
                                                                                    seek
                                                                                    readahead
                                                                                    file read
                                                                                    file read
                                                                                    readahead
                                                                                    directory read
                                                                                    missing symbols (stripped)
                                                                                    Off-CPU Nme
                                                                                    Stack depth
                                                                                    
                                                                                    slide 42:
                                                                                      Off-CPU Profiling: FreeBSD
                                                                                      #!/usr/sbin/dtrace -s
                                                                                      #pragma D option ustackframes=100
                                                                                      #pragma D option dynvarsize=32m
                                                                                      offcpu.d
                                                                                      Uses DTrace
                                                                                      sched:::off-cpu /execname == "bsdtar"/ { self->gt;ts = timestamp; }
                                                                                      Change/remove as desired
                                                                                      sched:::on-cpu
                                                                                      /self->gt;ts/
                                                                                      eg, add /curthread->gt;td_state gt;ts);
                                                                                      self->gt;ts = 0;
                                                                                      dtrace:::END
                                                                                      normalize(@, 1000000);
                                                                                      printa("%k-%k%s\n%@d\n", @);
                                                                                      Warning: can have significant overhead
                                                                                      (scheduler events can be frequent)
                                                                                      # ./offcpu.d >gt; out.stacks
                                                                                      # git clone https://github.com/brendangregg/FlameGraph; cd FlameGraph
                                                                                      # stackcollapse.pl gt; out.svg
                                                                                      
                                                                                      slide 43:
                                                                                        Off-CPU Time Flame Graph: FreeBSD
                                                                                        file read
                                                                                        readahead
                                                                                        pipe write
                                                                                        tar … | gzip
                                                                                        
                                                                                        slide 44:
                                                                                          Wakeup Time Flame Graph: FreeBSD
                                                                                          Who did the wakeup:
                                                                                          kernel-stack
                                                                                          wakee
                                                                                          user-stack
                                                                                          waker
                                                                                          
                                                                                          slide 45:
                                                                                            Wakeup Profiling: FreeBSD
                                                                                            #!/usr/sbin/dtrace -s
                                                                                            wakeup.d
                                                                                            Uses DTrace
                                                                                            #pragma D option quiet
                                                                                            #pragma D option ustackframes=100
                                                                                            #pragma D option dynvarsize=32m
                                                                                            sched:::sleep /execname == "bsdtar"/ { ts[curlwpsinfo->gt;pr_addr] = timestamp; }
                                                                                            sched:::wakeup
                                                                                            Change/remove as desired
                                                                                            /ts[arg0]/
                                                                                            this->gt;delta = timestamp - ts[arg0];
                                                                                            @[args[1]->gt;p_comm, stack(), ustack(), execname] = sum(this->gt;delta);
                                                                                            ts[arg0] = 0;
                                                                                            dtrace:::END
                                                                                            normalize(@, 1000000);
                                                                                            printa("\n%s%k-%k%s\n%@d\n", @);
                                                                                            Warning: can have significant overhead
                                                                                            (scheduler events can be frequent)
                                                                                            
                                                                                            slide 46:
                                                                                              Merging Stacks with eBPF: Linux
                                                                                              Using enhanced
                                                                                              Berkeley Packet Filter
                                                                                              (eBPF) to merge stacks
                                                                                              in kernel context
                                                                                              Not available on BSD
                                                                                              (yet)
                                                                                              Stack
                                                                                              DirecNon
                                                                                              Waker task
                                                                                              Waker stack
                                                                                              Wokeup
                                                                                              Blocked stack
                                                                                              Blocked task
                                                                                              
                                                                                              slide 47:
                                                                                                Ye Olde BPF
                                                                                                Berkeley Packet Filter
                                                                                                # tcpdump host 127.0.0.1 and port 22 -d
                                                                                                OpNmizes packet filter
                                                                                                (000) ldh
                                                                                                [12]
                                                                                                performance
                                                                                                (001) jeq
                                                                                                #0x800
                                                                                                jt 2
                                                                                                jf 18
                                                                                                (002) ld
                                                                                                [26]
                                                                                                (003) jeq
                                                                                                #0x7f000001
                                                                                                jt 6
                                                                                                jf 4
                                                                                                (004) ld
                                                                                                [30]
                                                                                                2 x 32-bit registers
                                                                                                (005) jeq
                                                                                                #0x7f000001
                                                                                                jt 6
                                                                                                jf 18
                                                                                                (006) ldb
                                                                                                [23]
                                                                                                & scratch memory
                                                                                                (007) jeq
                                                                                                #0x84
                                                                                                jt 10
                                                                                                jf 8
                                                                                                (008) jeq
                                                                                                #0x6
                                                                                                jt 10
                                                                                                jf 9
                                                                                                (009) jeq
                                                                                                #0x11
                                                                                                jt 10
                                                                                                jf 18
                                                                                                User-defined bytecode
                                                                                                (010) ldh
                                                                                                [20]
                                                                                                executed by an in-kernel
                                                                                                (011) jset
                                                                                                #0x1fff
                                                                                                jt 18
                                                                                                jf 12
                                                                                                sandboxed virtual machine
                                                                                                (012) ldxb
                                                                                                4*([14]&0xf)
                                                                                                (013) ldh
                                                                                                [x + 14]
                                                                                                Steven McCanne and Van Jacobson, 1993
                                                                                                [...]
                                                                                                
                                                                                                slide 48:
                                                                                                  Enhanced BPF
                                                                                                  aka eBPF or just "BPF"
                                                                                                  10 x 64-bit registers
                                                                                                  maps (hashes)
                                                                                                  stack traces
                                                                                                  ac=ons
                                                                                                  Alexei Starovoitov, 2014+
                                                                                                  
                                                                                                  slide 49:
                                                                                                    bcc/BPF front-end (C & Python)
                                                                                                    bcc examples/tracing/bitehist.py
                                                                                                    
                                                                                                    slide 50:
                                                                                                      Latency CorrelaNons
                                                                                                      1. Measure latency histograms at
                                                                                                      different stack layers
                                                                                                      2. Compare histograms to find
                                                                                                      latency origin
                                                                                                      Even bejer, use latency heat maps
                                                                                                      Match outliers based on both latency and Nme
                                                                                                      
                                                                                                      slide 51:
                                                                                                        Checklists: eg, BSD Perf Analysis in 60s
                                                                                                        uptime
                                                                                                        dmesg -a | tail
                                                                                                        vmstat 1
                                                                                                        vmstat -P
                                                                                                        ps -auxw
                                                                                                        iostat -xz 1
                                                                                                        systat -ifstat
                                                                                                        systat -netstat
                                                                                                        top
                                                                                                        systat -vmstat
                                                                                                        load averages
                                                                                                        kernel errors
                                                                                                        overall stats by Nme
                                                                                                        CPU balance
                                                                                                        process usage
                                                                                                        disk I/O
                                                                                                        network I/O
                                                                                                        TCP stats
                                                                                                        process overview
                                                                                                        system overview
                                                                                                        adapted from hjp://techblog.neylix.com/2015/11/linux-performance-analysis-in-60s.html
                                                                                                        
                                                                                                        slide 52:
                                                                                                          Checklists: eg, Neylix perfvitals Dashboard
                                                                                                          1. RPS, CPU
                                                                                                          2. Volume
                                                                                                          3. Instances
                                                                                                          4. Scaling
                                                                                                          5. CPU/RPS
                                                                                                          6. Load Avg
                                                                                                          7. Java Heap
                                                                                                          8. ParNew
                                                                                                          9. Latency
                                                                                                          10. 99th Nle
                                                                                                          
                                                                                                          slide 53:
                                                                                                            StaNc Performance Tuning: FreeBSD
                                                                                                            
                                                                                                            slide 54:
                                                                                                              Tools-Based Method: FreeBSD
                                                                                                              Try all the tools!
                                                                                                              May be an anN-pajern
                                                                                                              
                                                                                                              slide 55:
                                                                                                                Tools-Based Method: DTrace FreeBSD
                                                                                                                Just my new BSD tools
                                                                                                                
                                                                                                                slide 56:
                                                                                                                  Other Methodologies
                                                                                                                  ScienNfic method
                                                                                                                  5 Why's
                                                                                                                  Process of eliminaNon
                                                                                                                  Intel's Top-Down Methodology
                                                                                                                  Method R
                                                                                                                  
                                                                                                                  slide 57:
                                                                                                                    What You Can Do
                                                                                                                    
                                                                                                                    slide 58:
                                                                                                                      What you can do
                                                                                                                      1. Know what's now possible on modern systems
                                                                                                                      – Dynamic tracing: efficiently instrument any soRware
                                                                                                                      – CPU faciliNes: PMCs, MSRs (model specific registers)
                                                                                                                      – VisualizaNons: flame graphs, latency heat maps, …
                                                                                                                      2. Ask quesNons first: use methodologies to ask them
                                                                                                                      3. Then find/build the metrics
                                                                                                                      4. Build or buy dashboards to support methodologies
                                                                                                                      
                                                                                                                      slide 59:
                                                                                                                        Dynamic Tracing: Efficient Metrics
                                                                                                                        Eg, tracing TCP retransmits
                                                                                                                        Kernel
                                                                                                                        Old way: packet capture
                                                                                                                        tcpdump
                                                                                                                        Analyzer
                                                                                                                        1. read
                                                                                                                        2. dump
                                                                                                                        buffer
                                                                                                                        1. read
                                                                                                                        2. process
                                                                                                                        3. print
                                                                                                                        file system
                                                                                                                        send
                                                                                                                        receive
                                                                                                                        disks
                                                                                                                        New way: dynamic tracing
                                                                                                                        Tracer
                                                                                                                        1. configure
                                                                                                                        2. read
                                                                                                                        tcp_retransmit_skb()
                                                                                                                        
                                                                                                                        slide 60:
                                                                                                                          Dynamic Tracing: Instrument Most SoRware
                                                                                                                          My Solaris/DTrace tools (many already work on BSD/DTrace):
                                                                                                                          
                                                                                                                          slide 61:
                                                                                                                            Performance Monitoring Counters
                                                                                                                            Eg, BSD PMC groups for Intel Sandy Bridge:
                                                                                                                            
                                                                                                                            slide 62:
                                                                                                                              VisualizaNons
                                                                                                                              Eg, Disk I/O latency as a heat map, quanNzed in kernel:
                                                                                                                              Post processing the output of my iosnoop tool: www.brendangregg.com/HeatMaps/latency.html
                                                                                                                              
                                                                                                                              slide 63:
                                                                                                                                Summary
                                                                                                                                • It is the crystal ball age of performance observability
                                                                                                                                • What majers is the quesNons you want answered
                                                                                                                                • Methodologies are a great way to pose quesNons
                                                                                                                                Who
                                                                                                                                Why
                                                                                                                                How
                                                                                                                                What
                                                                                                                                
                                                                                                                                slide 64:
                                                                                                                                  References & Resources
                                                                                                                                  FreeBSD @ Neylix:
                                                                                                                                  hjps://openconnect.itp.neylix.com/
                                                                                                                                  hjp://people.freebsd.org/~scojl/Neylix-BSDCan-20130515.pdf
                                                                                                                                  hjp://www.youtube.com/watch?v=FL5U4wr86L4
                                                                                                                                  USE Method
                                                                                                                                  hjp://queue.acm.org/detail.cfm?id=2413037
                                                                                                                                  hjp://www.brendangregg.com/usemethod.html
                                                                                                                                  TSA Method
                                                                                                                                  hjp://www.brendangregg.com/tsamethod.html
                                                                                                                                  Off-CPU Analysis
                                                                                                                                  hjp://www.brendangregg.com/offcpuanalysis.html
                                                                                                                                  hjp://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
                                                                                                                                  hjp://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
                                                                                                                                  StaNc Performance Tuning, Richard Elling, Sun blueprint, May 2000
                                                                                                                                  RED Method: hjp://www.slideshare.net/weaveworks/monitoring-microservices
                                                                                                                                  Other system methodologies
                                                                                                                                  Systems Performance: Enterprise and the Cloud, PrenNce Hall 2013
                                                                                                                                  hjp://www.brendangregg.com/methodology.html
                                                                                                                                  The Art of Computer Systems Performance Analysis, Jain, R., 1991
                                                                                                                                  Flame Graphs
                                                                                                                                  hjp://queue.acm.org/detail.cfm?id=2927301
                                                                                                                                  hjp://www.brendangregg.com/flamegraphs.html
                                                                                                                                  hjp://techblog.neylix.com/2015/07/java-in-flames.html
                                                                                                                                  Latency Heat Maps
                                                                                                                                  hjp://queue.acm.org/detail.cfm?id=1809426
                                                                                                                                  hjp://www.brendangregg.com/HeatMaps/latency.html
                                                                                                                                  ARPA Network: hjp://www.computerhistory.org/internethistory/1960s
                                                                                                                                  RSTS/E System User's Guide, 1985, page 4-5
                                                                                                                                  DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD, PrenNce Hall 2011
                                                                                                                                  Apollo: hjp://www.hq.nasa.gov/office/pao/History/alsj/a11 hjp://www.hq.nasa.gov/alsj/alsj-LMdocs.html
                                                                                                                                  
                                                                                                                                  slide 65:
                                                                                                                                    EuroBSDcon 2017
                                                                                                                                    Thank You
                                                                                                                                    hjp://slideshare.net/brendangregg
                                                                                                                                    hjp://www.brendangregg.com
                                                                                                                                    [email protected]
                                                                                                                                    @brendangregg