 DTrace Topics: Dos and Don'ts
This article covers DTrace Dos and Don'ts, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed; for background, see DTrace Topics: Intro.
DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. These Dos and Don'ts are recommendations for programming in DTrace.
 Dos and Don'ts
By studying the syntax, someone can learn a language. But it takes practice and experience to master one. Mastering a language goes beyond the formal syntax: it includes knowing recommended usage, how best to solve problems, and how to avoid common mistakes. Put simply, mastering a language includes knowing the dos and don'ts.
Many of the following DTrace dos and don'ts are opinions and recommendations. They are guides for programming, not rules. Many of them were learnt the hard way, and reading them here should save you time.
- Do study the field you are about to DTrace.
- Understanding existing tools will give you ideas for what to DTrace, and what doesn't need to be DTraced. If you were about to DTrace disk activity, studying the iostat man page will suggest many metrics of interest, which you could then fetch and analyse further using DTrace.
- Do practice DTrace.
- Becoming good at DTrace requires thinking in certain ways so that you know how to approach and solve problems.
- Don't assume that the output of your DTrace script is correct without careful testing.
- DTrace makes it easy to print numbers, numbers that seem to correspond to the workload. Don't jump the gun and assume that these numbers are correct without testing - perhaps by generating a known workload and checking if the output matches.
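As a rough sketch of testing against a known workload, the script below sums the bytes successfully written by processes named dd. The choice of dd as the workload generator, and the block size and count in the usage note, are assumptions made for illustration.

```dtrace
#!/usr/sbin/dtrace -s
/*
 * Sanity-check sketch: sum the bytes successfully written by dd.
 * "dd" is only an illustrative known workload, not a requirement.
 */
#pragma D option quiet

syscall::write:return
/execname == "dd" && (int)arg0 > 0/
{
    /* On return, arg0 holds the number of bytes written. */
    @bytes["dd write bytes"] = sum(arg0);
}
```

With the script running, a command such as dd if=/dev/zero of=/dev/null bs=1024 count=1000 in another terminal should produce a total close to 1000 x 1024 bytes when the script is stopped with Ctrl-C. dd also writes a short record summary to stderr, so the total may be slightly higher. A large mismatch suggests the script is not measuring what you think it is.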
- Do use proc rather than syscall for process creation/destruction events.
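A minimal sketch of the proc provider in use, assuming you only want a log of processes starting and exiting. The output layout is an arbitrary choice.

```dtrace
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Fires after a successful exec, so execname and the arguments are the new ones. */
proc:::exec-success
{
    printf("%Y exec %6d %s\n", walltimestamp, pid, curpsinfo->pr_psargs);
}

/* Fires as the current process exits. */
proc:::exit
{
    printf("%Y exit %6d %s\n", walltimestamp, pid, execname);
}
```

The proc probes describe the events themselves, so the script does not need to know which system calls create or destroy a process.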
- Do consider using sysinfo rather than syscall for read/write events.
- sysinfo:::readch and sysinfo:::writech provide successful bytes as arg0, which is easier than processing the syscalls (especially readv/writev), although they don't give file descriptors.
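The sysinfo approach can be sketched as follows, assuming a per-process byte total is what you want. The aggregation names are arbitrary.

```dtrace
#!/usr/sbin/dtrace -s

/* arg0 is the number of bytes successfully read. */
sysinfo:::readch
{
    @read_bytes[execname] = sum(arg0);
}

/* arg0 is the number of bytes successfully written. */
sysinfo:::writech
{
    @write_bytes[execname] = sum(arg0);
}
```

Both aggregations are printed when the script is stopped with Ctrl-C. No checks of return values or special handling of readv/writev are needed.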
- Do consider using profile to sample events.
- While DTrace has the power to trace every event (providing accurate event-based timestamps), the overhead from DTrace can become noticeable when these events begin to happen very frequently (well over 1000 per second). The profile provider samples at a fixed rate, capped at 4999 Hertz, so its overhead is both low and predictable.
- Don't use fbt probes if possible.
- The fbt provider is an unstable interface. Try the other stable providers first. If you must use fbt, be aware that probes and arguments can change between minor releases of Solaris.
- Don't match all probes from pid.
- Depending on the application, this will match millions of probes (and DTrace can run out of memory and abort). At least filter to pid:::entry, pid:::return to not trace every instruction, and try filtering further by matching on library name and function name.
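A sketch of a narrowed pid match: it instruments only the entry probes of libc functions whose names begin with str. That function pattern is an assumption chosen for illustration.

```dtrace
#!/usr/sbin/dtrace -s

/*
 * Provider: pid$target (the process given with -p or -c)
 * Module:   libc.so.1 only
 * Function: names starting with "str"
 * Name:     entry only, not every instruction
 */
pid$target:libc.so.1:str*:entry
{
    @calls[probefunc] = count();
}
```

This could be run as ./strcalls.d -p PID, where strcalls.d is a hypothetical file name and PID is the process of interest. Running dtrace -ln with the same probe description first shows how many probes will be enabled.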
- Do use profile:::profile-* probes to sample data per interval.
- Do use profile:::tick-* probes to print output per interval.
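The two profile probe types can be combined as in the sketch below, which samples which process is on-CPU and prints a summary each second. The 997 Hertz rate and the one-second interval are assumptions. An odd rate is a common way to avoid sampling in lockstep with other periodic activity.

```dtrace
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* profile-* fires on every CPU at the given rate: use it to sample. */
profile:::profile-997
{
    @oncpu[execname] = count();
}

/* tick-* fires on a single CPU per interval: use it to print. */
profile:::tick-1sec
{
    printf("\n%Y\n", walltimestamp);
    printa("  %-16s %@8d\n", @oncpu);
    trunc(@oncpu);
}
```

Printing from a profile-* probe instead would repeat the output once per CPU, which is why the reporting goes in the tick-* clause.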
- Don't use probe name shortcuts without checking potential matches, e.g. read:entry instead of syscall::read:entry.
- Using read:entry to match syscalls could match other probes by mistake. For example:
    dtrace -ln 'read:entry'
       ID   PROVIDER     MODULE    FUNCTION NAME
      744 lx-syscall                   read entry
    11722        fbt    genunix        read entry
    58249  pid117867  libc.so.1        read entry
    79901    syscall                   read entry
Here the fbt, lx-syscall and the pid providers all had a read:entry probe. If you meant syscall, match using the full name syscall::read:entry.
- Do use self-> variables to associate data to a thread.
- Do use this-> variables for fast temporary calculations.
- Do try to use aggregates instead of global counters.
- E.g., @number = count(); causes less overhead than number++; because aggregates are maintained per CPU and don't need global locks.
- Do clear self-> variables by setting them to zero after use.
- Do use `_pagesize instead of `pagesize.
- Don't use global variables if possible.
- Global variables cause overhead on multi-CPU servers, where self-> variables and aggregates don't (and are often more appropriate anyway).
- Don't use this-> variables for moving data between different clauses.
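The variable advice above can be seen together in one short sketch that measures read(2) latency. The choice of syscall and the aggregation name are assumptions for illustration.

```dtrace
#!/usr/sbin/dtrace -s

syscall::read:entry
{
    /* Thread-local: carries the start time from entry to return. */
    self->start = timestamp;
}

syscall::read:return
/self->start/
{
    /* Clause-local: a temporary used only within this clause. */
    this->elapsed = timestamp - self->start;

    /* Aggregation instead of a global variable. */
    @latency_ns[execname] = quantize(this->elapsed);

    /* Clear the thread-local variable so its storage is freed. */
    self->start = 0;
}
```

The /self->start/ predicate skips reads that were already in progress when tracing began, since those threads never recorded a start time.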
- Do check for warning messages during execution.
- DTrace will print warnings on STDERR about dropped events, buffer overflows, etc.
- Don't assume that output on multi-CPU servers is in correct time order.
- Due to the way DTrace gathers data from per-CPU buffers, the output can be shuffled. If this becomes a problem, print the timestamp variable with each line and sort the output afterwards.
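A minimal sketch of the timestamp approach: each output line begins with the timestamp variable so the lines can be sorted numerically afterwards. The traced event (successful exec) and the column layout are assumptions.

```dtrace
#!/usr/sbin/dtrace -s
#pragma D option quiet

proc:::exec-success
{
    /* timestamp is in nanoseconds; it goes first so the output sorts easily. */
    printf("%d %d %s\n", timestamp, pid, execname);
}
```

If the output is saved to a file with the dtrace -o option, it can then be put in time order with sort -n on that file.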
- Do allow other users to run DTrace by granting them DTrace privileges (see DTrace Topics: Intro).
- Don't give users dtrace_kernel if possible.
- The dtrace_kernel privilege allows a user to read kernel memory, which may include plaintext passwords and other confidential information. There are situations where this could be used for privilege escalation to root.