DTrace Topics Intro

From Siwiki

Jump to: navigation, search

Contents

[edit] DTrace Topics: Introduction

This section is an introduction to DTrace, and is part of the DTrace Topics collection.

Completion Image:trafficlight_green02.png
Difficulty Image:coffeemug01.png
Audience Everyone

[edit] What is DTrace

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. DTrace is many things, in particular,

  • A tool
  • A programming language interpreter
  • An instrumentation framework

DTrace provides observability across the entire software stack from one tool. This allows you to examine software execution like never before.

In this section we will provide examples soon and often, as they provide something real to visualise as we learn the background behind DTrace.

[edit] Example #1

Tracing new processes system-wide can be achieved with the following DTrace one-liner,

# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  76044                     exece:return   man
  0  76044                     exece:return   sh
  0  76044                     exece:return   neqn
  0  76044                     exece:return   tbl
  0  76044                     exece:return   nroff
  0  76044                     exece:return   col
  0  76044                     exece:return   sh
  0  76044                     exece:return   mv
  0  76044                     exece:return   sh
  0  76044                     exece:return   more

The above command printed a line of output as each new process was created, along with the name of the executable. Here we captured the inner working of the man command by tracing the system call layer.

Just this one-liner is very useful; previous techniques to examine process execution had certain drawbacks, and none behaved like a one-off troubleshooting tool. The syntax will be discussed later in this section.

Tracing system calls is the tip of a very large iceberg. DTrace can examine the entire software stack.

[edit] The Software Stack

Image:softwarestack02.png

Consider how the following layers could be examined before DTrace. Examples for what the layers mean have been listed:

The Software Stack
Layer Layer Examples Previous Analysis DTrace Visibility
Dynamic Languages Java, JavaScript, ... debuggers Yes, with providers
User Executable /usr/bin/*, compiled C code truss -ua.out Yes
Libaries /usr/lib/*, compiled C code apptrace, sotruss, truss Yes
Syscall Interface man -s2, read(), write(), fork(), ... truss Yes
Kernel VFS, DNLC, ZFS, TCP, IP, ..., sd, st, hme, ... prex; tnf*, lockstat, mdb, adb Yes
Hardware disk data controller kstat, cpustat, cputrack, busstat, guesswork No. Indirectly, yes

It was already possible to measure most of these layers, but it was at times difficult to use or even find these tools. The truss command is in common usage, however not many people found "truss -u" from the man page. Some of these tools were rarely found by customers at all, such as prex, tnfdump/tnfxtract and busstat. Even if one were to learn all these tools well, gathering their output into one place for cross-analysis adds further challenges.

DTrace is all seeing; from one tool we can analyse the entire software stack.

[edit] What DTrace is Like

DTrace has the combined capabilities of numerous previous tools and more:

DTrace Simalarities
Tool Capability
truss -ua.out tracing user functions
apptrace tracing library calls
truss tracing system calls
prex; tnf* tracing some kernel functions
lockstat profiling the kernel
mdb -k accessing kernel VM
mdb -p accessing process VM

DTrace also provides a programming language similar to C and awk.

[edit] Syscall Example: truss

The following is an example of using truss to examine the syscalls performed by the date command:

$ truss date
execve("/usr/bin/date", 0x08047C9C, 0x08047CA4)  argc = 1
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/date", "/usr/bin/date", 1023) = 13
xstat(2, "/usr/bin/date", 0x08047A58)           = 0
open("/var/ld/ld.config", O_RDONLY)             = 3
fxstat(2, 3, 0x08047988)                        = 0
mmap(0x00000000, 152, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFB0000
close(3)                                        = 0
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1
sysconfig(_CONFIG_PAGESIZE)                     = 4096
[...]

truss has been an invaluable tool for providing syscall tracing, which is often representive of process behaviour (but not always). However there were a few drawbacks with truss, in particular:

  • truss only examines 1 process at a time
  • Output is limited to the options provided by truss
  • truss slows down the target

There are many options to truss that provide some excellent forms of output, and more can be added. However there was no way around single process tracing and slowing down the target.

[edit] Syscall Example: DTrace

The following example shows syscall tracing using DTrace:

# dtrace -n 'syscall:::entry { printf("%16s %x %x", execname, arg0, arg1); }'
dtrace: description 'syscall:::entry ' matched 233 probes
CPU     ID                    FUNCTION:NAME
  1  75943                       read:entry             Xorg f 8047130
  1  76211                  setitimer:entry             Xorg 0 8047610
  1  76143                     writev:entry             Xorg 22 80477f8
  1  76255                    pollsys:entry             Xorg 8046da0 1a
  1  75943                       read:entry             Xorg 22 85121b0
  1  76035                      ioctl:entry      soffice.bin 6 5301
  1  76035                      ioctl:entry      soffice.bin 6 5301
  1  76255                    pollsys:entry      soffice.bin 8047530 2
[...]

Some advangates with DTrace:

  • Every process, or specific processes, can be traced simultaneously.
  • The output is completely customizable.
  • Minimum performance cost.

[edit] What is DTrace for

  • Troubleshooting software bugs
    • Proving what the problem is, and isn't.
    • Measuring the magnitude of the problem.
  • Detailed observability
    • Observing devices, such as disk or network activity.
    • Observing applications, whether they are from Solaris, 3rd party, or in-house.
  • Capturing profiling data for performance analysis
    • If there is latency somewhere, DTrace can find it

[edit] What isn't DTrace

  • DTrace isn't a replacement for kstat or SMNP
    • kstat already provides inexpensive long term monitoring.
  • DTrace isn't sentient, it needs to borrow your brain to do the thinking
  • DTrace isn't “dTrace”

[edit] Who is DTrace for

[edit] Application Developers

  • Fetch in-flight profiling data without restarting the apps, even on customer production servers.
  • Detailed visibility of all the functions that they wrote, and the rest of the software stack.
  • Add static probes as a stable debug interface.

[edit] Application Support

  • Provides a comprehensive insight into application behavior.
  • Analyze faults and root-cause performance issues.
  • Prove where issues are, and measure their magnitude.

[edit] System Administrators

  • Troubleshoot, analyze, investigate where never before.
  • See more of your system - fills in many observability gaps.

[edit] Database Administrators

  • Analyze throughput performance issues across all system components.

[edit] Security Administrators

  • Customized short-term auditing
  • Malware deciphering

[edit] Kernel Engineers

  • Fetch kernel trace data from almost every function.
  • Function arguments are auto-casted providing access to all struct members.
  • Fetch nanosecond timestamps for function execution.
  • Troubleshoot device drivers, including during boot.
  • Add statically defined trace points for debugging.

[edit] How to use DTrace

DTrace can be used by either,

  1. Running prewritten one-liners and scripts
  2. Writing your own one-liners and scripts
    • Encouraged - the possibilities are endless
    • It helps to know C
    • It can help to know operating system fundamentals

[edit] DTrace Wins

  • Finding unnecessary work
    Having deep visibility often finds work being performed that isn't needed. Eliminating these can produce the biggest DTrace wins – 2x, 20x, etc.
  • Solving performance issues
    Being able to measure where the latencies are, and show what their costs are. These can produce typical performance wins – 5%, 10%, etc.
  • Finding bugs
    Many bugs are found though static debug frameworks; DTrace is a dynamic framework that allows custom and comprehensive debug info to be fetched when needed.
  • Proving performance issues
    Many valuable DTrace wins have no immediate percent improvement, they are about gathering evidence to prove the existence and magnitude of issues.

[edit] Example Scenario

It can be valuable to illustrate scenarios where DTrace can be of use in the real world.

The following examples are fictional and any resembelence to real people or companies is purely accidental!

[edit] Measuring Fault Magnitude

In this example, a customer has a performance issue on their system,

Image:dtrace_scenario02.png

The customer asks their software vendors for help.

[edit] Without DTrace

The vendors suggest numerous different theories, everything from patch levels to firmware bugs.

No one can identify the issue, and the vendors appear at times to be blaming one another,

Image:dtrace_scenario3.png

The customer is asked to perform numerous expirements to attempt to identify the problem. This results in either,

  • finding the problem - a happy ending.
  • finding a problem - this may be the problem, or a symptom of the problem, or another issue altogether.
  • finding nothing conclusive.

For example, the customer uses several performance analysis tools and finds,

  • the CPUs are usually 100% busy
  • the network interfaces are pushing around 50,000 packets per second.

The customer suspects that the performance issue is caused by the CPUs being busy, and that the CPUs being busy is caused by the the CPU cost of processing so many packets. They snoop the network packets and find that most are repeated hostname lookups, and guess that these were caused by a bug in the application. They then ask the application vendor to fix the issue,

Image:dtrace_scenario6.png

The application vendor isn't convinced that the customer has found the issue, and can't cost-justify the time to fix this without proof. The situation is a stalemate.

[edit] With DTrace

Performance issues can be complex due to the numerous interactions between layers of the software stack. Previous debugging tools could only observe portions of that stack - making it difficult or sometimes impossible to analyse software interaction. DTrace can examine the entire software stack, and can also measure the nanosecond time cost of software events. So rather than suggesting where latencies may be due to statistic counters, DTrace can prove where latencies are using nanosecond timestamps, no matter where they are.

For our example scenario, the customer uses DTrace to measure:

  • Application latency caused by threads waiting on CPU run queues (saturation), and waiting for system calls to service (utilization).
    They can now prove that CPU load is indeed the problem.
  • Time spent servicing network interrupts.
    They then can prove that the CPU load is caused by network packets.
  • Application stack traces when network I/O occurs.
    They prove that the network packets causing the issue are caused by the application.

After this analysis, the customer can prove that they found the issue, and with DTrace they are able to measure its magnitude in terms of nanoseconds or ratio of transaction time wasted,

Image:dtrace_scenario07.png

The application vendor is happy that the customer has measured the problem, as it helps them cost-justify the programmer's time required to fix the issue.

Continuing with this scenario... The application vendor can't simulate the problem - it only happens on the customer's site with a production load. So they then use DTrace on the customer's application (without restarting it) to quickly identify why they have caused so many hostname lookups, and how they are being called.

They find that their application calls the standard OS network libraries in the standard way, which were supposed to cache hostnames. A test program proves that there is a bug is in the OS library,

Image:dtrace_scenario08.png

The OS vendor has also never seen this issue in testing, and uses DTrace on live production environments to see it for themselves.

The bug is fixed, and everyone is happy.

[edit] Answers to initial questions

  • DTrace is not available for Solaris 9.
  • You need to be root, or have the correct privileges, to run /usr/sbin/dtrace.
  • There is a GUI called chime.
  • DTrace is safe for production use, provided you don't deliberately try to cause harm.
  • DTrace has low impact when in use, and zero impact when not.

[edit] Terminology

Learning terminology can be the easy part. For DTrace there aren't too many terms to learn, and stepping through them will cover many important DTrace topics.

# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  76044                     exece:return   man  
  0  76044                     exece:return   sh   
  0  76044                     exece:return   neqn 
[...]

The above output contained,

  • consumer: "dtrace"
    The /usr/sbin/dtrace command is a consumer of the DTrace framework (via libdtrace(3LIB)).
  • probe: "syscall::exece:return"
    This describes what points in software to instrument and trace.
  • action: "{ trace(execname); }"
    The action is what happens when an instrumentation point is hit. This action prints output.

[edit] Consumer

libdtrace(3LIB) is the library that is the interface between userland tools such as /usr/sbin/dtrace and the kernel DTrace framework. The following consumers of libdtrace(3LIB) all live under /usr/sbin,

  • dtrace
    command line and scripting interface
  • lockstat
    kernel lock statistics
  • plockstat
    user-level lock statistics
  • intrstat
    run-time interrupt statistics

libdtrace is currently a private interface and not to be used directly (nor is there any great reason to); the supported interface is dtrace(1M). NOTE: You are still encouraged to use libkstat(3LIB) and proc(4) directly, rather than wrapping /usr/bin consumers.

[edit] Privileges

Only the root user and users who have been granted DTrace privileges can use DTrace. Solaris provides different DTrace privileges for different tasks, which can be assigned to developers and users as appropriate. These privileges are from the Solaris 10 "Least Privilege" feature. See the Guide for more info.

[edit] Probes

Data is generated from instrumentation points called “probes”. DTrace provides thousands of probes. Probe examples:

Probe Examples
Probe Name Description
syscall::read:entry A read() syscall began
proc:::exec-success A process created successfully
io:::start An I/O was issued (disk/vol/NFS)
io:::done An I/O completed

The component names will be discussed in the following sections.

[edit] Listing Probes

dtrace -l lists all currently available probes that you have privilege to see, with one probe per line:

# dtrace -l | more
   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
    4      sched                FX                          fx_yield schedctl-yield
    5      sched                FX                fx_change_priority change-pri
    6      sched                FX                          fx_yield change-pri
    7      sched                FX                         fx_wakeup change-pri
    8      sched                FX                        fx_preempt change-pri
    9      sched                FX                           fx_tick schedctl-nopreempt
   10      sched                FX                        fx_preempt schedctl-nopreempt
   11      sched                FX                        fx_preempt schedctl-preempt
   12        fbt                FX                           fx_init entry
[...]
# dtrace -l | wc -l
   69880

Here the root user sees 69,879 available probes. The probe count changes – it is dynamic (DTrace).

[edit] Probe Names

Probe names are a four-tuple: provider:module:function:name. Eg, fbt:zfs:arc_read:entry.

  • Provider
    A library of related probes.
  • Module
    The module the function belongs to, either a kernel module or user segment.
  • Function
    The function name that contains the probe.
  • Name
    The name of the probe.

[edit] Tracing Probes

dtrace -n takes a probe name and enables tracing:

# dtrace -n syscall::exece:return
dtrace: description 'syscall::exece:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  76044                     exece:return 
  0  76044                     exece:return 
  0  76044                     exece:return 
  0  76044                     exece:return 
^C

The default output contains,

  • CPU: CPU id that event occured on (if this changes, the output may be shuffled)
  • ID: DTrace probe id
  • FUNCTION:NAME: Part of the probe name

[edit] Actions

When a probe fires, an action executes. The default action is to print the output seen privously using dtrace -n. Actions are written in the D programming language.

Actions can,

  • print output
  • save data to variables, and perform calculations
  • walk kernel or process memory

With destructive actions allowed (using dtrace -w), actions can,

  • raise signals on processes
  • execute shell commands
  • write to some areas of memory

[edit] Command Line Usage

The syntax for using actions at the command line is,

# dtrace -n 'probename { action }'

Multiple probes with actions can be used,

# dtrace -n 'probename0 { action0 } probename1 { action1 } probename2 { action2 }'

Although this starts to become clumsy at the command line, and is better written as a DTrace script.

[edit] trace() example

# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  76044                     exece:return   man  
  0  76044                     exece:return   sh   
  0  76044                     exece:return   neqn 
  0  76044                     exece:return   tbl  
  0  76044                     exece:return   nroff
  0  76044                     exece:return   col   
[...]

The trace() action accepts one argument and prints it when the probe fired.

[edit] printf() example

# dtrace -n 'syscall::exece:return { printf("%6d %s\n", pid, execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  74415                     exece:return   4301 sh
  0  74415                     exece:return   4304 neqn
  0  74415                     exece:return   4305 nroff
  0  74415                     exece:return   4306 sh
  0  74415                     exece:return   4308 sh
[...]

DTrace ships with a powerful <tt>printf(), to print formatted output.

[edit] Variables

[edit] Default Variables

Numerous predefined variables are provided for use, including,

pid Process ID
tid Thread ID
timestamp Nanosecond timestamp since boot
execname Process name
arg0, ... Function arguments and return value
errno Last syscall failure error code
probefunc Probe function name (3rd field)
probename Probe name (4th field)
curpsinfo Struct contating current process info, eg, curpsinfo->pr_psargs – process + args

Pointers and structs! DTrace can walk memory using C syntax, and has kernel types predefined.

[edit] curthread

curthread Pointer to current kthread_t

From here you can walk kernel memory and answer endless questions about OS internals.

Eg, the current process user_t is curthread->t_procp->p_user. Now, you might not ever need to use curthread, but it is good to know that you can. (curthread is like the down staircase in nethack, angband, moria, ... you can play for years and still not reach level 50).

[edit] Variable Types

DTrace supports the following variable types for use in one-liners and scripts,

  • Integers
  • Structs
  • Pointers
  • Strings
  • Associative arrays
  • Aggregates

Including types from /usr/include/sys, eg uint32_t.

[edit] Aggregations

A great feature of DTrace is to process data as it is captured, such as using aggregations.

For example, the following one-liner frequency counts syscalls using an aggregation,

# dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 233 probes
^C

 close                                                             1
 fcntl                                                             1
[...]
 lwp_sigmask                                                      32
 setcontext                                                       32
 setitimer                                                        60
 p_online                                                         64
 portfs                                                          132
 writev                                                          170
 write                                                           257
 read                                                            896
 pollsys                                                         959
 ioctl                                                          1253

@num is the aggregation variable, probefunc is the key, and count() is the aggregating function.

[edit] Aggregating Functions

These include,

count() count events, useful for frequency counts
sum(value) sum the value
avg(value) average the value
min(value) find the value minimum
max(value) find the value maximum
quantize(value) generate power-2 distribution plots of the value

[edit] Quantize

Very cool function, here we quantize write sizes:

# dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'
dtrace: description 'sysinfo:::writech ' matched 4 probes
^C
[...]
  ls                                                
           value  ------------- Distribution ------------- count    
               4 |                                         0        
               8 |                                         2        
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@                      118      
              64 |@@@@@@@@@@@@@@@@@@@@@                    127      
             128 |                                         0      
[...]

Here we see that ls processes usually write between 32 and 127 bytes. Makes sense?

# ls -l /etc
total 793
lrwxrwxrwx   1 root     root          12 Mar 21 03:28 TIMEZONE -> default/init
drwxr-xr-x   4 root     sys            6 Apr 16 06:59 X11
drwxr-xr-x   2 adm      adm            3 Mar 20 09:25 acct 
drwxr-xr-x   3 root     root           3 Apr 16 23:11 ak
lrwxrwxrwx   1 root     root          12 Mar 21 03:28 aliases -> mail/aliases
drwxr-xr-x   5 root     sys            5 Feb 20 23:29 amd64
drwxr-xr-x   7 root     bin           18 Mar 20 09:20 apache
drwxr-xr-x   4 root     bin            7 Feb 20 23:12 apache2
drwxr-xr-x   2 root     sys            5 Feb 20 23:27 apoc
-rw-r--r--   1 root     bin         1012 Mar 20 09:33 auto_home
-rw-r--r--   1 root     bin         1066 Mar 20 09:33 auto_master
lrwxrwxrwx   1 root     root          16 Mar 21 03:28 autopush -> ../sbin/autopu
[...]

ls writes one line at a time, each around 80 chars long. The distribution plot was spot on.

[edit] Predicates

DTrace predicates are used to filter probes, so that the action fires when a conditional is true.

probename /predicate/ { action }

Eg, syscalls for processes called “bash”,

# dtrace -n 'syscall:::entry /execname == "bash"/ { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 233 probes
^C

  exece                                                             2
  forksys                                                           2
  lwp_self                                                          2
  schedctl                                                          2
  waitsys                                                           2
  getpid                                                            4
  gtime                                                             4
  setpgrp                                                          12
  ioctl                                                            28
  read                                                             29
  write                                                            31
  lwp_sigmask                                                      42
  sigaction                                                        62

[edit] Scripting

If your DTrace one-liners get a bit too long, it may be easier to write them as scripts. A common form of DTrace script (or D script) is one that begins with the line "#!/usr/sbin/dtrace -s" and is an executable file.

The following script measures syscalls for processes called "bash",

bash-syscalls.d

#!/usr/sbin/dtrace -s

syscall:::entry
/execname == "bash"/
{
        @num[probefunc] = count();
}

Make the file executable,

# chmod 755 bash-syscalls.d

And it can be run like a regular command,

# ./bash-syscalls.d
dtrace: script './bash-syscalls.d' matched 233 probes
^C

  rexit                                                             2
  exece                                                             3
  forksys                                                           4
  lwp_self                                                          4
  schedctl                                                          4
  waitsys                                                           4
  getpid                                                            8
  gtime                                                             9
  read                                                             13
  stat64                                                           18
  write                                                            19
  setpgrp                                                          24
  ioctl                                                            56
  lwp_sigmask                                                      84
  sigaction                                                       124

[edit] Usage Features

The following are some of the usage features that make DTrace invaluable.

[edit] Measuring Time

Access to high resolution timestamps is of particular use for performance analysis.

  • timestamp - time since boot in nanoseconds
  • vtimestamp - thread on-CPU timestamp

Measuring these for application and operating system function calls will answer,

  • timestamp - where is the latency?
  • vtimestamp - why are the CPUs busy?

[edit] Printing Stacks

Printing user and kernel stack traces explains both why and the how something happened.

For example, why is bash calling read()? Using ustack() action,

# dtrace -n 'syscall::read:entry /execname == "bash"/ { ustack(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  74314                       read:entry 
              libc.so.1`_read+0x7
              bash`rl_getc+0x22
              bash`rl_read_key+0xad
              bash`readline_internal_char+0x5f
              bash`0x80b1171
              bash`0x80b118c
              bash`readline+0x3a
              bash`0x806775b
              bash`0x80676c4
              bash`0x8067f40
              bash`0x8068927
              bash`0x8068492
              bash`yyparse+0x155
              bash`parse_command+0x59
              bash`read_command+0xd8
              bash`reader_loop+0x10f
              bash`main+0x647
              bash`_start+0x7a

[...]

[edit] Sampling

DTrace isn't just about tracing events, DTrace can also sample at customized rates.

Eg, sampling 5-level user stack traces from Xorg,

# dtrace -n 'profile-1001 /execname == "Xorg"/ { @[ustack(5)] = count(); }'
dtrace: description 'profile-1001 ' matched 1 probe
^C 

              Xorg`Xalloc+0x14
              Xorg`AllocatePixmap+0x35
              libfb.so`fbCreatePixmap+0xb8
              nvidia_drv.so`_nv000398X+0x8f
                1
 
[...]

              libfb.so`fbSolid+0x2c6
              libfb.so`fbFill+0xb8
              libfb.so`fbPolyFillRect+0x1d5
              nvidia_drv.so`0xfe09e87b
              Xorg`miColorRects+0x124
               41

              nvidia_drv.so`_nv000592X+0x3d
              0x1016c00
               87

Looks like the nvidia driver was on-CPU 87 times as we sampled at 1001 Hertz.

[edit] See Also

Solaris Internals
Personal tools
The Books
The Ads