perf

From Gentoo Wiki
perf is a tool for profiling Linux with performance counters. It can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is also capable of lightweight profiling.

Installation

USE flags

USE flags for dev-util/perf (Userland tools for Linux Performance Counters)

+doc Build documentation and man pages. With this USE flag disabled, the --help parameter for perf and its sub-tools will not be available. This is optional because it depends on a few documentation handling tools that are not always welcome on user systems.
+libtraceevent Enable dev-libs/libtraceevent support
+libtracefs Enable dev-libs/libtracefs support
+python Add optional support/bindings for the Python language
+slang Add support for the slang text display library (it's like ncurses, but different)
audit Enable support for Linux audit subsystem using sys-process/audit
babeltrace Enable dev-util/babeltrace support
big-endian Big-endian toolchain support
bpf Enable support for eBPF features with dev-libs/libbpf
caps Use Linux capabilities library to control privilege
capstone Use dev-libs/capstone for disassembly support
crypt Add support for encryption -- using mcrypt or gpg where applicable
debug Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
gtk Add support for x11-libs/gtk+ (The GIMP Toolkit)
java Add support for Java
libpfm Enable dev-libs/libpfm support
lzma Support for LZMA compression algorithm
numa Enable NUMA support using sys-process/numactl (NUMA kernel support is also required)
perl Add support for Perl as a scripting language for perf tools.
systemtap Add support to define SDT event in perf tools.
tcmalloc Use the dev-util/google-perftools libraries to replace the malloc() implementation with a possibly faster one
unwind Use sys-libs/libunwind for frame unwinding support.
zstd Enable support for ZSTD compression

Emerge

root #emerge --ask dev-util/perf
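
The USE flags listed above can be set per package before emerging. The following is a minimal sketch, not a recommended configuration: the flag selection (bpf and unwind enabled, gtk disabled) is purely illustrative, the file name "perf" is arbitrary, and it assumes /etc/portage/package.use is a directory rather than a single file.

```shell
#!/bin/sh
# Hypothetical flag selection for illustration: enable bpf and unwind, disable gtk.
use_line="dev-util/perf bpf unwind -gtk"

# Show the package.use entry that would be written.
echo "$use_line"

# As root, the entry would then be appended and the package (re)built:
#   echo "$use_line" >> /etc/portage/package.use/perf
#   emerge --ask dev-util/perf
```

The root-only steps are left as comments so the sketch can be run safely as a normal user.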

Usage

System wide profile

To profile the entire system, run perf stat without a command. The following example profiles the system for 5 seconds.

Note
The -s 2 argument tells timeout to send SIGINT (signal 2, the same signal that Ctrl+C sends) instead of its default, SIGTERM. Without it, perf stat exits without displaying any output.
user $timeout -s 2 5s perf stat
 Performance counter stats for 'system wide':

        119,873.70 msec cpu-clock                        #   23.996 CPUs utilized
           608,052      context-switches                 #    5.072 K/sec
            52,088      cpu-migrations                   #  434.524 /sec
            54,966      page-faults                      #  458.533 /sec
   149,286,631,432      instructions                     #    1.63  insn per cycle
                                                         #    0.08  stalled cycles per insn
    91,639,578,301      cycles                           #    0.764 GHz
    12,251,888,547      stalled-cycles-frontend          #   13.37% frontend cycles idle
    19,086,150,804      branches                         #  159.219 M/sec
       493,116,174      branch-misses                    #    2.58% of all branches

       4.995528985 seconds time elapsed
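
The values in the right-hand column are derived from the raw counts. As a rough check, the sketch below recomputes three of them with awk from the numbers in the sample output above. The variable names are illustrative only and are not part of perf.

```shell
#!/bin/sh
# Raw counts copied from the system-wide sample output above (commas removed).
instructions=149286631432
cycles=91639578301
branches=19086150804
branch_misses=493116174
cpu_clock_msec=119873.70
elapsed_sec=4.995528985

# Instructions per cycle: instructions / cycles
ipc=$(awk -v i="$instructions" -v c="$cycles" 'BEGIN { printf "%.2f", i / c }')

# Branch miss rate as a percentage of all branches
miss_pct=$(awk -v m="$branch_misses" -v b="$branches" 'BEGIN { printf "%.2f", 100 * m / b }')

# CPUs utilized: total CPU time divided by wall-clock time (both in msec)
cpus=$(awk -v t="$cpu_clock_msec" -v e="$elapsed_sec" 'BEGIN { printf "%.3f", t / (e * 1000) }')

echo "insn per cycle: $ipc"
echo "branch-misses:  $miss_pct%"
echo "CPUs utilized:  $cpus"
```

The results should agree with the 1.63, 2.58% and 23.996 reported by perf stat in the sample.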

Profiling a command

To see the profile statistics for a specific command, append the command to perf stat. In this example, the command emerge -ep @world is used.

user $perf stat emerge -ep @world
 Performance counter stats for 'emerge -ep @world':

         16,518.88 msec task-clock                       #    0.970 CPUs utilized
            18,177      context-switches                 #    1.100 K/sec
            15,651      cpu-migrations                   #  947.462 /sec
           324,994      page-faults                      #   19.674 K/sec
    94,669,490,050      instructions                     #    1.37  insn per cycle
                                                         #    0.14  stalled cycles per insn
    68,918,358,065      cycles                           #    4.172 GHz
    13,046,547,573      stalled-cycles-frontend          #   18.93% frontend cycles idle
    18,809,738,679      branches                         #    1.139 G/sec
       544,326,478      branch-misses                    #    2.89% of all branches

      17.034427384 seconds time elapsed

      14.761382000 seconds user
       1.708188000 seconds sys
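
For scripting, individual counters can be pulled out of this text report. The helper below is a minimal sketch: counter_value is a hypothetical function name, and it only handles lines whose second field is the event name, so it does not cover lines with a unit column such as task-clock. The sample lines are copied from the output above.

```shell
#!/bin/sh
# counter_value EVENT: read perf stat text output on stdin and print the
# raw count for EVENT with the thousands separators removed.
counter_value() {
    awk -v ev="$1" '$2 == ev { gsub(",", "", $1); print $1; exit }'
}

# Two lines copied from the sample output above.
sample='    94,669,490,050      instructions                     #    1.37  insn per cycle
    68,918,358,065      cycles                           #    4.172 GHz'

instructions=$(printf '%s\n' "$sample" | counter_value instructions)
cycles=$(printf '%s\n' "$sample" | counter_value cycles)

echo "instructions=$instructions cycles=$cycles"
```

Because perf stat writes its report to standard error, a real run could be captured with something like perf stat emerge -ep @world 2> stat.txt and then queried with counter_value cycles < stat.txt. The -x option of perf stat, which produces separator-delimited output, is likely a more robust basis for scripts than parsing the human-readable report.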