DTrace Topics Strategy
From Siwiki
Contents |
[edit] DTrace Topics: Strategy
This article is about DTrace strategies, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.
DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.
Completion
Difficulty
Audience All DTrace users
[edit] Strategies
This page documents various DTrace strategies and techniques for approaching problems, which may be especially helpful if you are stuck on a problem and don't know where to start.
[edit] Start Simple
If you have a complex application and don't know where to start DTracing, it may help to write simple versions of the application and practise DTracing them first.
For example, imagine you have a complex application that appears to experience latency when writing to its log files. It may help to write a dummy log file writer - a simple program that writes to a file at a particular rate, and practise DTracing that first. Once you become comfortable DTracing known and simple programs (you know what to look for and how to do it), you can then apply those skills to DTracing the complex application.
Also consider writing simple programs with faults, and see how easy it is to identify these known faults using DTrace. The DTrace Topics: Exercises page has sample programs for you to practise DTracing, and questions to answer.
- "But I already have a faulty program to DTrace - the application!"
I know, but your application may have complexities that make DTracing it very challenging. If you have spent a few days with DTrace and don't seem to be getting anywhere, try the simple program approach.
There is another reason to write these too - imagine you DTrace your application and find a bug in a library, the kernel, or a device driver; if you raise this with the appropriate vendor, they may blame other complexities in your application or environment. If instead you send them a 20 line C program that exhibits the same bug, not only does it eliminate other factors, but it lets them compile the program and see the bug for themselves. Immediately.
[edit] Known Counts
If you are trying to find probes of interest from the thousands that are available, it can help to create a known number of events and then frequency count the available probes to see what appears related.
For example, lets say I am interested in tracing ZFS directory creation from within the kernel. I'll use the fbt provider, but even narrowing this to just the ZFS module leaves many to pick from,
# dtrace -ln 'fbt:zfs::entry' | wc -l
1300
How do I know which to use from the 1299 available probes? (1300 included the header line).
Assuming that no one else is creating directories on this system, I ran the following one-liner to frequency count all ZFS probe events while creating 23 directories in another window (by running mkdir 23 times). 23 is used as it is a reasonably small prime number, and less likely to clash with other regular events. I study the output for events that occured 23 times, or a multiple of 23.
# dtrace -n 'fbt:zfs::entry { @[probefunc] = count(); }'
dtrace: description 'fbt:zfs::entry ' matched 1299 probes
^C
arc_read_done 1
dnode_buf_pageout 1
vdev_cache_allocate 1
vdev_cache_evict 1
vdev_cache_fill 1
vdev_cache_hit 1
vdev_dtl_contains 1
vdev_mirror_child_select 1
zap_cursor_init 1
zio_checksum_error 1
[...output truncated...]
space_map_vacate 20
vdev_space_update 20
mzap_create_impl 23
zap_create 23
zfs_mkdir 23
vdev_label_offset 24
vdev_label_write 24
[...output truncated...]
zfs_ioc_pool_configs 42
zfs_secpolicy_none 42
bplist_enqueue 46
bplist_enqueue_deferred 46
dbuf_create_bonus 47
dmu_object_alloc 47
[...output truncated...]
zfs_getattr 4737
dmu_object_size_from_db 4785
zfs_ace_access 5122
vdev_cache_offset_compare 5307
vdev_lookup_top 6134
zio_next_stage 8135
space_map_seg_compare 28385
The output was many pages long and has been truncated to fit here. From the 1299 available probes, only 3 occured 23 times - mzap_create_impl, zap_create and zfs_mkdir. zfs_mkdir sounds like the probe I was looking for.
[edit] Elapsed Times
Elapsed time is the time from one event to another. This may include time that a thread spent off-CPU due to I/O or scheduling, and is a useful metric for understanding response latency. Use the timestamp variable to measure elapsed times.
The following script, bash.d, measures the elapsed time for the read() syscall of any bash process running on the system,
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname == "bash"/
{
self->start = timestamp;
}
syscall::read:return
/self->start/
{
this->elapsed = timestamp - self->start;
printf("%d ms", this->elapsed / 1000000);
self->start = 0;
}
Running this script prints inter-keystroke times as I type in another window on the system,
# ./bash.d dtrace: script './bash.d' matched 2 probes CPU ID FUNCTION:NAME 0 74315 read:return 49 ms 0 74315 read:return 90 ms 0 74315 read:return 77 ms 0 74315 read:return 139 ms 0 74315 read:return 126 ms ^C
Great stuff.
[edit] On-CPU Times
On CPU time is the time a thread spent on-CPU from one event to another. This excludes time the thread was off-CPU, such as due to I/O or scheduling. This is a useful metric for understanding CPU load, and identifying what is consuming CPU. Use the vtimestamp variable for measuring on-CPU time.
The following script, cpu.d, measures on-CPU times for all system calls on the system, and prints their average in nanoseconds,
#!/usr/sbin/dtrace -s
syscall:::entry
{
self->vstart = vtimestamp;
}
syscall:::return
/self->vstart/
{
this->oncpu = vtimestamp - self->vstart;
@[probefunc] = avg(this->oncpu);
self->vstart = 0;
}
Running this script,
# ./cpu.d dtrace: script './cpu.d' matched 466 probes ^C getuid 113 getgid 117 gtime 117 p_online 132 getpid 158 lwp_sigmask 180 lwp_self 199 [...output truncated...] getdents64 18985 stat64 20867 chmod 23138 open64 33360 getcwd 42588 creat64 43347 so_socket 48093 unlink 69454 forksys 99463 exece 286992
System calls such as getpid() consume, on average, much less than a microsecond of CPU time. The syscall consuming the most CPU time was exece(), which may come as no suprise considering the code that needs to execute for this syscall.
File system related syscalls such as getdents64(), stat64(), open64() and creat64() are probably consuming this much CPU due to the process of seaching the file system caches (which can be confirmed with more DTracing).
