DTrace Topics Strategy

From Siwiki

Jump to: navigation, search

Contents

[edit] DTrace Topics: Strategy

This article is about DTrace strategies, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.

Completion Image:trafficlight_yellow02.png
Difficulty Image:coffeemug01.png Image:coffeemug01.png Image:coffeemug01.png
Audience All DTrace users

[edit] Strategies

This page documents various DTrace strategies and techniques for approaching problems, which may be especially helpful if you are stuck on a problem and don't know where to start.

[edit] Start Simple

If you have a complex application and don't know where to start DTracing, it may help to write simple versions of the application and practise DTracing them first.

For example, imagine you have a complex application that appears to experience latency when writing to its log files. It may help to write a dummy log file writer - a simple program that writes to a file at a particular rate, and practise DTracing that first. Once you become comfortable DTracing known and simple programs (you know what to look for and how to do it), you can then apply those skills to DTracing the complex application.

Also consider writing simple programs with faults, and see how easy it is to identify these known faults using DTrace. The DTrace Topics: Exercises page has sample programs for you to practise DTracing, and questions to answer.

"But I already have a faulty program to DTrace - the application!"

I know, but your application may have complexities that make DTracing it very challenging. If you have spent a few days with DTrace and don't seem to be getting anywhere, try the simple program approach.

There is another reason to write these too - imagine you DTrace your application and find a bug in a library, the kernel, or a device driver; if you raise this with the appropriate vendor, they may blame other complexities in your application or environment. If instead you send them a 20 line C program that exhibits the same bug, not only does it eliminate other factors, but it lets them compile the program and see the bug for themselves. Immediately.

[edit] Known Counts

If you are trying to find probes of interest from the thousands that are available, it can help to create a known number of events and then frequency count the available probes to see what appears related.

For example, lets say I am interested in tracing ZFS directory creation from within the kernel. I'll use the fbt provider, but even narrowing this to just the ZFS module leaves many to pick from,

# dtrace -ln 'fbt:zfs::entry' | wc -l
    1300

How do I know which to use from the 1299 available probes? (1300 included the header line).

Assuming that no one else is creating directories on this system, I ran the following one-liner to frequency count all ZFS probe events while creating 23 directories in another window (by running mkdir 23 times). 23 is used as it is a reasonably small prime number, and less likely to clash with other regular events. I study the output for events that occured 23 times, or a multiple of 23.

# dtrace -n 'fbt:zfs::entry { @[probefunc] = count(); }'
dtrace: description 'fbt:zfs::entry ' matched 1299 probes
^C

  arc_read_done                                                     1
  dnode_buf_pageout                                                 1
  vdev_cache_allocate                                               1
  vdev_cache_evict                                                  1
  vdev_cache_fill                                                   1
  vdev_cache_hit                                                    1
  vdev_dtl_contains                                                 1
  vdev_mirror_child_select                                          1
  zap_cursor_init                                                   1
  zio_checksum_error                                                1
[...output truncated...]
  space_map_vacate                                                 20
  vdev_space_update                                                20
  mzap_create_impl                                                 23
  zap_create                                                       23
  zfs_mkdir                                                        23
  vdev_label_offset                                                24
  vdev_label_write                                                 24
[...output truncated...]
  zfs_ioc_pool_configs                                             42
  zfs_secpolicy_none                                               42
  bplist_enqueue                                                   46
  bplist_enqueue_deferred                                          46
  dbuf_create_bonus                                                47
  dmu_object_alloc                                                 47
[...output truncated...]
  zfs_getattr                                                    4737
  dmu_object_size_from_db                                        4785
  zfs_ace_access                                                 5122
  vdev_cache_offset_compare                                      5307
  vdev_lookup_top                                                6134
  zio_next_stage                                                 8135
  space_map_seg_compare                                         28385

The output was many pages long and has been truncated to fit here. From the 1299 available probes, only 3 occured 23 times - mzap_create_impl, zap_create and zfs_mkdir. zfs_mkdir sounds like the probe I was looking for.

[edit] Elapsed Times

Elapsed time is the time from one event to another. This may include time that a thread spent off-CPU due to I/O or scheduling, and is a useful metric for understanding response latency. Use the timestamp variable to measure elapsed times.

The following script, bash.d, measures the elapsed time for the read() syscall of any bash process running on the system,

#!/usr/sbin/dtrace -s 

syscall::read:entry
/execname == "bash"/
{
        self->start = timestamp;
}

syscall::read:return
/self->start/
{
        this->elapsed = timestamp - self->start;
        printf("%d ms", this->elapsed / 1000000);
        self->start = 0;
}

Running this script prints inter-keystroke times as I type in another window on the system,

# ./bash.d
dtrace: script './bash.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  74315                      read:return 49 ms
  0  74315                      read:return 90 ms
  0  74315                      read:return 77 ms
  0  74315                      read:return 139 ms
  0  74315                      read:return 126 ms
^C

Great stuff.

[edit] On-CPU Times

On CPU time is the time a thread spent on-CPU from one event to another. This excludes time the thread was off-CPU, such as due to I/O or scheduling. This is a useful metric for understanding CPU load, and identifying what is consuming CPU. Use the vtimestamp variable for measuring on-CPU time.

The following script, cpu.d, measures on-CPU times for all system calls on the system, and prints their average in nanoseconds,

#!/usr/sbin/dtrace -s

syscall:::entry
{
        self->vstart = vtimestamp;
}

syscall:::return
/self->vstart/
{
        this->oncpu = vtimestamp - self->vstart;
        @[probefunc] = avg(this->oncpu);
        self->vstart = 0;
}

Running this script,

# ./cpu.d
dtrace: script './cpu.d' matched 466 probes
^C 

  getuid                                                          113
  getgid                                                          117
  gtime                                                           117
  p_online                                                        132
  getpid                                                          158
  lwp_sigmask                                                     180
  lwp_self                                                        199
 [...output truncated...]
  getdents64                                                    18985
  stat64                                                        20867
  chmod                                                         23138
  open64                                                        33360
  getcwd                                                        42588
  creat64                                                       43347
  so_socket                                                     48093
  unlink                                                        69454
  forksys                                                       99463
  exece                                                        286992

System calls such as getpid() consume, on average, much less than a microsecond of CPU time. The syscall consuming the most CPU time was exece(), which may come as no suprise considering the code that needs to execute for this syscall.

File system related syscalls such as getdents64(), stat64(), open64() and creat64() are probably consuming this much CPU due to the process of seaching the file system caches (which can be confirmed with more DTracing).

[edit] See Also

Solaris Internals
Personal tools
The Books