DTrace Topics Hints Tips
From Siwiki
Contents |
[edit] DTrace Topics: Hints and Tips
This article is about DTrace hints and tips, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.
DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.
Completion
Difficulty
Audience All DTrace users
[edit] Hints and Tips
This is a collection of various DTrace hints and tips. Apart from the syntax, the providers and the probes, there are various tricks for using DTrace in certain situations - know-how that is usually gained through experience. Hopefully this page will boost your DTrace-fu a few levels by documenting them.
[edit] Probes
[edit] Cool probes
The following is a short list of favourite probes:
| probe | description |
| syscall::: | trace every system call |
| proc:::exec-success | trace successful process creation |
| io:::start | trace physical I/O requests, while the responsible thread is still on-CPU |
| io:::done | trace physical I/O completion events |
| sysinfo:::readch | trace application reads with successful bytes as arg0 |
| sysinfo:::writech | trace application writes with successful bytes as arg0 |
| profile:::profile-1001 | sample all CPUs at 1001 Hertz |
| profile:::tick-1sec | fire on 1 CPU every second |
DTrace provides tens of thousands of probes.
[edit] syscall::open:entry
The probe syscall::open:entry matches the open() syscall but not the open64() syscall, which many applications use instead. If you meant to trace all open() activity, try syscall::open*:entry.
# dtrace -ln 'syscall::open*:entry' ID PROVIDER MODULE FUNCTION NAME 74318 syscall open entry 74706 syscall open64 entry
[edit] syscall:::entry
When a syscall probe fires, it doesn't mean that a syscall was successful. Beware!
syscall:::return provides the system call return value as arg0, and a value of -1 means that the system call failed (see the man page for each system call for details). For example,
Trace requested system calls:
# dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
Trace successful system calls:
# dtrace -n 'syscall:::return /(int)arg0 >= 0/ { @[probefunc] = count(); }'
Some providers do the work for you; The proc:::exec-success fires when processes are created, which can be easier to use than syscall::exec*:return and checking arg0
[edit] profile-1001, profile-997?
When using DTrace to sample activity, the temptation is to use the probe profile:::profile-1000, which means sample on all CPUs at 1000 Hertz, so that your sample count roughly represents milli-seconds. Yet you will often see other frequencies used, such as profile:::profile-997 and profile:::profile-1001.
The danger of profile:::profile-1000 is that of sampling sychronization. On a multi-CPU server, DTrace sampling may by initiated on one CPU, while on another CPU a thread of an equal frequency is either running or not running. If it were running, sampling may over-count its activity; if it wasn't running, sampling may under-count.
For profile:::profile-1000, sampling synchronization can happen if another thread ran at a frequency of 1000 Hertz, but would also happen to an extent with 100 Hertz, 25 Hertz, 10 Hertz, 1 Hertz, etc. These are frequencies that are likely to be choosen by programmers.
By choosing profile-1001 or profile-997, we are hoping to reduce the chance of sampling sychronization, while keeping a sample count that roughly represents milli-seconds.
The system clock routine is sometimes cited as a source of sampling collision, as it runs at either 100 Hertz or 1000 Hertz. I've never seen true synchronization occur, but I have seen genunix:clock in profile-100 sampled stacks (I need to check the interaction between cyclic creation and clock PIL before confirming that this is possibile).
[edit] Code
[edit] Use self-> and this->
- self
- When measuring thread events from one probe to another, it is usually best to use Thread Local variables, which are prefixed with "self->". These are private to each thread, so we don't need to worry about conflicts from other threads or processes, and they also cause less CPU overhead than global variables.
- this
- "For variables that are only used within the one clause, Clause Local variables can be used, which are prefixed with "this->". They cause the least CPU overhead of all the variables.
These variable types are demonstrated by the following example program, bash.d, which measures the elapsed time for the read() syscall of any bash process running on the system,
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname == "bash"/
{
self->start = timestamp;
}
syscall::read:return
/self->start/
{
this->elapsed = timestamp - self->start;
printf("%d ms", this->elapsed / 1000000);
self->start = 0;
}
Running this script prints inter-keystroke times as I type in another window on the system,
# ./bash.d dtrace: script './bash.d' matched 2 probes CPU ID FUNCTION:NAME 0 74315 read:return 49 ms 0 74315 read:return 90 ms 0 74315 read:return 77 ms ^C
To clearly understand the role of self-> and this->, study the bash.d script and consider the following questions,
- Why did the syscall::read:return probe use a predicate of /self->start/ (which is true if that variable is non-zero)?
- Why didn't the syscall::read:return probe also match the execname (like with syscall::read:entry)?
- Why was the start variable a self-> variable?
- Why was the elapsed variable a this-> variable?
- Can we be sure that a syscall::read:entry is followed by a syscall::read:return for that thread? What if the thread called multiple read() syscalls at the same time? Recursive read() syscalls?
Answers are,
- When executing the script, it is possible to have missed the syscall::read:entry event, but not the syscall::read:return event. The elapsed measurement will be taking timestamp and subtracting self->start - which is zero - resulting in an impossibly large elapsed time (time since boot).
- If self->start is populated, then we can deduce that we have the correct execname - since self->start is only set for the correct execname, and was set in the same thread.
- The Thread Local self-> variable is used to match the correct syscall::read:entry to the correct syscall::read:return, even if multiple threads and processes are calling read() at the same time.
- The Clause Local this-> variable is used as it was a throw-away variable only used in that clause.
- Threads can only call one syscall at a time - so when using Thread Local variables it is easy to match entries to returns.
[edit] Usage
[edit] Post-sort
If your DTrace program prints traced events as they occur, the output may be shuffled on multi-CPU systems.
This is because events details are placed in per-CPU buffers, and then combined by the DTrace consumer (/usr/sbin/dtrace) and printed out. The DTrace consumer reads and prints the buffers one by one, it doesn't combine them and sort them.
To demonstrate this,
# dtrace -n 'profile:::profile-3hz { trace(timestamp); }'
dtrace: description 'profile-3hz ' matched 1 probe
CPU ID FUNCTION:NAME
0 41241 :profile-3hz 1898015274778547
0 41241 :profile-3hz 1898015608118262
0 41241 :profile-3hz 1898015941430060
1 41241 :profile-3hz 1898015275499014
1 41241 :profile-3hz 1898015609173485
1 41241 :profile-3hz 1898015942505828
2 41241 :profile-3hz 1898015275351257
2 41241 :profile-3hz 1898015609180861
2 41241 :profile-3hz 1898015942512708
3 41241 :profile-3hz 1898015274803528
3 41241 :profile-3hz 1898015608120522
3 41241 :profile-3hz 1898015941449884
^C
If you read the timestamps carefully, you'll see that they aren't quite in chronological order. If you look at the CPU column while reading the timestamps, the way DTrace works should become clear.
One solution is to print a timestamp column and to to post sort the output using a tool such as /usr/bin/sort. The switchrate tunable can also be increased to diminish the problem, but not eliminate it.
[edit] getopts
If you want to write a DTrace script that supports command line options and arguments, like other CLI tools, you run into a small problem - DTrace doesn't have getopts() (yet anyway).
DTrace does support,
- numbers as arguments (which are $1, $2, .. in the script)
- strings as arguments (which are $$1, $$2, .. in the script)
- optional arguments when using #pragma D option defaultargs (which means undefined numbers are zero, undefined strings are NULL)
Command line options more advanced than these are not possible easily.
The current solution is to wrap your DTrace script in either shell or Perl. Done carefully and neatly this isn't as bad as it sounds, and can be a sensible union of two different tools. There are many examples of this in the DTraceToolkit, such as fddist.
The following trimmed example is from fddist,
#!/usr/bin/sh
#
[...trimmed...]
##############################
# --- Process Arguments ---
#
### Default variables
opt_read=0; opt_write=0
### Process options
while getopts hrw name
do
case $name in
r) opt_read=1 ;;
w) opt_write=1 ;;
h|?) echo >&2 USAGE: fddist [-r|-w]
exit 1
esac
done
shift `expr $OPTIND - 1`
### Option logic
if [ $opt_read -eq 0 -a $opt_write -eq 0 ]; then
opt_read=1; opt_write=1
fi
#################################
# --- Main Program, DTrace ---
#
/usr/sbin/dtrace -n '
#pragma D option quiet
inline int OPT_read = '$opt_read';
inline int OPT_write = '$opt_write';
inline int FDMAX = 255;
/* print header */
dtrace:::BEGIN
{
printf("Tracing ");
OPT_read && OPT_write ? printf("reads and writes") : 1;
OPT_read && ! OPT_write ? printf("reads") : 1;
! OPT_read && OPT_write ? printf("writes") : 1;
printf("... Hit Ctrl-C to end.\n");
}
/* sample reads */
syscall::*read*:entry
/OPT_read/
{
@Count[execname, pid] = lquantize(arg0, 0, FDMAX, 1);
}
[...trimmed...]
'
There are three general sections to the above script,
- Shell option processing
- Shell to DTrace option passing (highlighted in bold)
- DTrace script in quotes
It is possible to write a mixed DTrace and shell/Perl program with many more than three sections, although the code often becomes quite messy. If possible, stick to this three part approach: shell, shell->DTrace, DTrace.
[edit] Techniques
[edit] User-Level Function Flow
Your function either does either A, B or C, and you want to know which happened. DTrace can trace that the function began and completed, but you want to know what happened inside - the flow of the function.
Depending on the language of the application, some or all of the following techniques will be available for determining function flow.
- Tracing other functions that may be called by your target function
- Examination of function entry arguments
- Examination of function return value and return offset
- The pid provider's instruction tracing
- Examination of uregs to check local variables
- Adding USDT probes to the function code and recompiling
