DTrace Topics Intro
From Siwiki
[edit] DTrace Topics: Introduction
This section is an introduction to DTrace, and is part of the DTrace Topics collection.
Completion
Difficulty
Audience Everyone
[edit] What is DTrace
DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. DTrace is many things, in particular,
- A tool
- A programming language interpreter
- An instrumentation framework
DTrace provides observability across the entire software stack from one tool. This allows you to examine software execution like never before.
In this section we will provide examples soon and often, as they provide something real to visualise as we learn the background behind DTrace.
[edit] Example #1
Tracing new processes system-wide can be achieved with the following DTrace one-liner,
# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU ID FUNCTION:NAME
0 76044 exece:return man
0 76044 exece:return sh
0 76044 exece:return neqn
0 76044 exece:return tbl
0 76044 exece:return nroff
0 76044 exece:return col
0 76044 exece:return sh
0 76044 exece:return mv
0 76044 exece:return sh
0 76044 exece:return more
The above command printed a line of output as each new process was created, along with the name of the executable. Here we captured the inner working of the man command by tracing the system call layer.
Just this one-liner is very useful; previous techniques to examine process execution had certain drawbacks, and none behaved like a one-off troubleshooting tool. The syntax will be discussed later in this section.
Tracing system calls is the tip of a very large iceberg. DTrace can examine the entire software stack.
[edit] The Software Stack
Consider how the following layers could be examined before DTrace. Examples for what the layers mean have been listed:
| Layer | Layer Examples | Previous Analysis | DTrace Visibility |
| Dynamic Languages | Java, JavaScript, ... | debuggers | Yes, with providers |
| User Executable | /usr/bin/*, compiled C code | truss -ua.out | Yes |
| Libaries | /usr/lib/*, compiled C code | apptrace, sotruss, truss | Yes |
| Syscall Interface | man -s2, read(), write(), fork(), ... | truss | Yes |
| Kernel | VFS, DNLC, ZFS, TCP, IP, ..., sd, st, hme, ... | prex; tnf*, lockstat, mdb, adb | Yes |
| Hardware | disk data controller | kstat, cpustat, cputrack, busstat, guesswork | No. Indirectly, yes |
It was already possible to measure most of these layers, but it was at times difficult to use or even find these tools. The truss command is in common usage, however not many people found "truss -u" from the man page. Some of these tools were rarely found by customers at all, such as prex, tnfdump/tnfxtract and busstat. Even if one were to learn all these tools well, gathering their output into one place for cross-analysis adds further challenges.
DTrace is all seeing; from one tool we can analyse the entire software stack.
[edit] What DTrace is Like
DTrace has the combined capabilities of numerous previous tools and more:
| Tool | Capability |
| truss -ua.out | tracing user functions |
| apptrace | tracing library calls |
| truss | tracing system calls |
| prex; tnf* | tracing some kernel functions |
| lockstat | profiling the kernel |
| mdb -k | accessing kernel VM |
| mdb -p | accessing process VM |
DTrace also provides a programming language similar to C and awk.
[edit] Syscall Example: truss
The following is an example of using truss to examine the syscalls performed by the date command:
$ truss date
execve("/usr/bin/date", 0x08047C9C, 0x08047CA4) argc = 1
resolvepath("/usr/lib/ld.so.1", "/lib/ld.so.1", 1023) = 12
resolvepath("/usr/bin/date", "/usr/bin/date", 1023) = 13
xstat(2, "/usr/bin/date", 0x08047A58) = 0
open("/var/ld/ld.config", O_RDONLY) = 3
fxstat(2, 3, 0x08047988) = 0
mmap(0x00000000, 152, PROT_READ, MAP_SHARED, 3, 0) = 0xFEFB0000
close(3) = 0
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1
sysconfig(_CONFIG_PAGESIZE) = 4096
[...]
truss has been an invaluable tool for providing syscall tracing, which is often representive of process behaviour (but not always). However there were a few drawbacks with truss, in particular:
- truss only examines 1 process at a time
- Output is limited to the options provided by truss
- truss slows down the target
There are many options to truss that provide some excellent forms of output, and more can be added. However there was no way around single process tracing and slowing down the target.
[edit] Syscall Example: DTrace
The following example shows syscall tracing using DTrace:
# dtrace -n 'syscall:::entry { printf("%16s %x %x", execname, arg0, arg1); }'
dtrace: description 'syscall:::entry ' matched 233 probes
CPU ID FUNCTION:NAME
1 75943 read:entry Xorg f 8047130
1 76211 setitimer:entry Xorg 0 8047610
1 76143 writev:entry Xorg 22 80477f8
1 76255 pollsys:entry Xorg 8046da0 1a
1 75943 read:entry Xorg 22 85121b0
1 76035 ioctl:entry soffice.bin 6 5301
1 76035 ioctl:entry soffice.bin 6 5301
1 76255 pollsys:entry soffice.bin 8047530 2
[...]
Some advangates with DTrace:
- Every process, or specific processes, can be traced simultaneously.
- The output is completely customizable.
- Minimum performance cost.
[edit] What is DTrace for
- Troubleshooting software bugs
- Proving what the problem is, and isn't.
- Measuring the magnitude of the problem.
- Detailed observability
- Observing devices, such as disk or network activity.
- Observing applications, whether they are from Solaris, 3rd party, or in-house.
- Capturing profiling data for performance analysis
- If there is latency somewhere, DTrace can find it
[edit] What isn't DTrace
- DTrace isn't a replacement for kstat or SMNP
- kstat already provides inexpensive long term monitoring.
- DTrace isn't sentient, it needs to borrow your brain to do the thinking
- DTrace isn't “dTrace”
[edit] Who is DTrace for
[edit] Application Developers
- Fetch in-flight profiling data without restarting the apps, even on customer production servers.
- Detailed visibility of all the functions that they wrote, and the rest of the software stack.
- Add static probes as a stable debug interface.
[edit] Application Support
- Provides a comprehensive insight into application behavior.
- Analyze faults and root-cause performance issues.
- Prove where issues are, and measure their magnitude.
[edit] System Administrators
- Troubleshoot, analyze, investigate where never before.
- See more of your system - fills in many observability gaps.
[edit] Database Administrators
- Analyze throughput performance issues across all system components.
[edit] Security Administrators
- Customized short-term auditing
- Malware deciphering
[edit] Kernel Engineers
- Fetch kernel trace data from almost every function.
- Function arguments are auto-casted providing access to all struct members.
- Fetch nanosecond timestamps for function execution.
- Troubleshoot device drivers, including during boot.
- Add statically defined trace points for debugging.
[edit] How to use DTrace
DTrace can be used by either,
- Running prewritten one-liners and scripts
- DTrace one-liners are easy to use and often useful
- The DTraceToolkit contains over 100 scripts ready to run
- Writing your own one-liners and scripts
- Encouraged - the possibilities are endless
- It helps to know C
- It can help to know operating system fundamentals
[edit] DTrace Wins
- Finding unnecessary work
- Having deep visibility often finds work being performed that isn't needed. Eliminating these can produce the biggest DTrace wins – 2x, 20x, etc.
- Solving performance issues
- Being able to measure where the latencies are, and show what their costs are. These can produce typical performance wins – 5%, 10%, etc.
- Finding bugs
- Many bugs are found though static debug frameworks; DTrace is a dynamic framework that allows custom and comprehensive debug info to be fetched when needed.
- Proving performance issues
- Many valuable DTrace wins have no immediate percent improvement, they are about gathering evidence to prove the existence and magnitude of issues.
[edit] Example Scenario
It can be valuable to illustrate scenarios where DTrace can be of use in the real world.
The following examples are fictional and any resembelence to real people or companies is purely accidental!
[edit] Measuring Fault Magnitude
In this example, a customer has a performance issue on their system,
The customer asks their software vendors for help.
[edit] Without DTrace
The vendors suggest numerous different theories, everything from patch levels to firmware bugs.
No one can identify the issue, and the vendors appear at times to be blaming one another,
The customer is asked to perform numerous expirements to attempt to identify the problem. This results in either,
- finding the problem - a happy ending.
- finding a problem - this may be the problem, or a symptom of the problem, or another issue altogether.
- finding nothing conclusive.
For example, the customer uses several performance analysis tools and finds,
- the CPUs are usually 100% busy
- the network interfaces are pushing around 50,000 packets per second.
The customer suspects that the performance issue is caused by the CPUs being busy, and that the CPUs being busy is caused by the the CPU cost of processing so many packets. They snoop the network packets and find that most are repeated hostname lookups, and guess that these were caused by a bug in the application. They then ask the application vendor to fix the issue,
The application vendor isn't convinced that the customer has found the issue, and can't cost-justify the time to fix this without proof. The situation is a stalemate.
[edit] With DTrace
Performance issues can be complex due to the numerous interactions between layers of the software stack. Previous debugging tools could only observe portions of that stack - making it difficult or sometimes impossible to analyse software interaction. DTrace can examine the entire software stack, and can also measure the nanosecond time cost of software events. So rather than suggesting where latencies may be due to statistic counters, DTrace can prove where latencies are using nanosecond timestamps, no matter where they are.
For our example scenario, the customer uses DTrace to measure:
- Application latency caused by threads waiting on CPU run queues (saturation), and waiting for system calls to service (utilization).
- They can now prove that CPU load is indeed the problem.
- Time spent servicing network interrupts.
- They then can prove that the CPU load is caused by network packets.
- Application stack traces when network I/O occurs.
- They prove that the network packets causing the issue are caused by the application.
After this analysis, the customer can prove that they found the issue, and with DTrace they are able to measure its magnitude in terms of nanoseconds or ratio of transaction time wasted,
The application vendor is happy that the customer has measured the problem, as it helps them cost-justify the programmer's time required to fix the issue.
Continuing with this scenario... The application vendor can't simulate the problem - it only happens on the customer's site with a production load. So they then use DTrace on the customer's application (without restarting it) to quickly identify why they have caused so many hostname lookups, and how they are being called.
They find that their application calls the standard OS network libraries in the standard way, which were supposed to cache hostnames. A test program proves that there is a bug is in the OS library,
The OS vendor has also never seen this issue in testing, and uses DTrace on live production environments to see it for themselves.
The bug is fixed, and everyone is happy.
[edit] Answers to initial questions
- DTrace is not available for Solaris 9.
- You need to be root, or have the correct privileges, to run /usr/sbin/dtrace.
- There is a GUI called chime.
- DTrace is safe for production use, provided you don't deliberately try to cause harm.
- DTrace has low impact when in use, and zero impact when not.
[edit] Terminology
Learning terminology can be the easy part. For DTrace there aren't too many terms to learn, and stepping through them will cover many important DTrace topics.
# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU ID FUNCTION:NAME
0 76044 exece:return man
0 76044 exece:return sh
0 76044 exece:return neqn
[...]
The above output contained,
- consumer: "dtrace"
- The /usr/sbin/dtrace command is a consumer of the DTrace framework (via libdtrace(3LIB)).
- probe: "syscall::exece:return"
- This describes what points in software to instrument and trace.
- action: "{ trace(execname); }"
- The action is what happens when an instrumentation point is hit. This action prints output.
[edit] Consumer
libdtrace(3LIB) is the library that is the interface between userland tools such as /usr/sbin/dtrace and the kernel DTrace framework. The following consumers of libdtrace(3LIB) all live under /usr/sbin,
- dtrace
- command line and scripting interface
- lockstat
- kernel lock statistics
- plockstat
- user-level lock statistics
- intrstat
- run-time interrupt statistics
libdtrace is currently a private interface and not to be used directly (nor is there any great reason to); the supported interface is dtrace(1M). NOTE: You are still encouraged to use libkstat(3LIB) and proc(4) directly, rather than wrapping /usr/bin consumers.
[edit] Privileges
Only the root user and users who have been granted DTrace privileges can use DTrace. Solaris provides different DTrace privileges for different tasks, which can be assigned to developers and users as appropriate. These privileges are from the Solaris 10 "Least Privilege" feature. See the Guide for more info.
[edit] Probes
Data is generated from instrumentation points called “probes”. DTrace provides thousands of probes. Probe examples:
| Probe Name | Description |
| syscall::read:entry | A read() syscall began |
| proc:::exec-success | A process created successfully |
| io:::start | An I/O was issued (disk/vol/NFS) |
| io:::done | An I/O completed |
The component names will be discussed in the following sections.
[edit] Listing Probes
dtrace -l lists all currently available probes that you have privilege to see, with one probe per line:
# dtrace -l | more
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
4 sched FX fx_yield schedctl-yield
5 sched FX fx_change_priority change-pri
6 sched FX fx_yield change-pri
7 sched FX fx_wakeup change-pri
8 sched FX fx_preempt change-pri
9 sched FX fx_tick schedctl-nopreempt
10 sched FX fx_preempt schedctl-nopreempt
11 sched FX fx_preempt schedctl-preempt
12 fbt FX fx_init entry
[...]
# dtrace -l | wc -l
69880
Here the root user sees 69,879 available probes. The probe count changes – it is dynamic (DTrace).
[edit] Probe Names
Probe names are a four-tuple: provider:module:function:name. Eg, fbt:zfs:arc_read:entry.
- Provider
- A library of related probes.
- Module
- The module the function belongs to, either a kernel module or user segment.
- Function
- The function name that contains the probe.
- Name
- The name of the probe.
[edit] Tracing Probes
dtrace -n takes a probe name and enables tracing:
# dtrace -n syscall::exece:return dtrace: description 'syscall::exece:return' matched 1 probe CPU ID FUNCTION:NAME 0 76044 exece:return 0 76044 exece:return 0 76044 exece:return 0 76044 exece:return ^C
The default output contains,
- CPU: CPU id that event occured on (if this changes, the output may be shuffled)
- ID: DTrace probe id
- FUNCTION:NAME: Part of the probe name
[edit] Actions
When a probe fires, an action executes. The default action is to print the output seen privously using dtrace -n. Actions are written in the D programming language.
Actions can,
- print output
- save data to variables, and perform calculations
- walk kernel or process memory
With destructive actions allowed (using dtrace -w), actions can,
- raise signals on processes
- execute shell commands
- write to some areas of memory
[edit] Command Line Usage
The syntax for using actions at the command line is,
# dtrace -n 'probename { action }'
Multiple probes with actions can be used,
# dtrace -n 'probename0 { action0 } probename1 { action1 } probename2 { action2 }'
Although this starts to become clumsy at the command line, and is better written as a DTrace script.
[edit] trace() example
# dtrace -n 'syscall::exece:return { trace(execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU ID FUNCTION:NAME
0 76044 exece:return man
0 76044 exece:return sh
0 76044 exece:return neqn
0 76044 exece:return tbl
0 76044 exece:return nroff
0 76044 exece:return col
[...]
The trace() action accepts one argument and prints it when the probe fired.
[edit] printf() example
# dtrace -n 'syscall::exece:return { printf("%6d %s\n", pid, execname); }'
dtrace: description 'syscall::exece:return ' matched 1 probe
CPU ID FUNCTION:NAME
0 74415 exece:return 4301 sh
0 74415 exece:return 4304 neqn
0 74415 exece:return 4305 nroff
0 74415 exece:return 4306 sh
0 74415 exece:return 4308 sh
[...]
DTrace ships with a powerful <tt>printf(), to print formatted output.
[edit] Variables
[edit] Default Variables
Numerous predefined variables are provided for use, including,
| pid | Process ID |
| tid | Thread ID |
| timestamp | Nanosecond timestamp since boot |
| execname | Process name |
| arg0, ... | Function arguments and return value |
| errno | Last syscall failure error code |
| probefunc | Probe function name (3rd field) |
| probename | Probe name (4th field) |
| curpsinfo | Struct contating current process info, eg, curpsinfo->pr_psargs – process + args |
Pointers and structs! DTrace can walk memory using C syntax, and has kernel types predefined.
[edit] curthread
| curthread | Pointer to current kthread_t |
From here you can walk kernel memory and answer endless questions about OS internals.
Eg, the current process user_t is curthread->t_procp->p_user. Now, you might not ever need to use curthread, but it is good to know that you can. (curthread is like the down staircase in nethack, angband, moria, ... you can play for years and still not reach level 50).
[edit] Variable Types
DTrace supports the following variable types for use in one-liners and scripts,
- Integers
- Structs
- Pointers
- Strings
- Associative arrays
- Aggregates
Including types from /usr/include/sys, eg uint32_t.
[edit] Aggregations
A great feature of DTrace is to process data as it is captured, such as using aggregations.
For example, the following one-liner frequency counts syscalls using an aggregation,
# dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 233 probes
^C
close 1
fcntl 1
[...]
lwp_sigmask 32
setcontext 32
setitimer 60
p_online 64
portfs 132
writev 170
write 257
read 896
pollsys 959
ioctl 1253
@num is the aggregation variable, probefunc is the key, and count() is the aggregating function.
[edit] Aggregating Functions
These include,
| count() | count events, useful for frequency counts |
| sum(value) | sum the value |
| avg(value) | average the value |
| min(value) | find the value minimum |
| max(value) | find the value maximum |
| quantize(value) | generate power-2 distribution plots of the value |
[edit] Quantize
Very cool function, here we quantize write sizes:
# dtrace -n 'sysinfo:::writech { @dist[execname] = quantize(arg0); }'
dtrace: description 'sysinfo:::writech ' matched 4 probes
^C
[...]
ls
value ------------- Distribution ------------- count
4 | 0
8 | 2
16 | 0
32 |@@@@@@@@@@@@@@@@@@@ 118
64 |@@@@@@@@@@@@@@@@@@@@@ 127
128 | 0
[...]
Here we see that ls processes usually write between 32 and 127 bytes. Makes sense?
# ls -l /etc total 793 lrwxrwxrwx 1 root root 12 Mar 21 03:28 TIMEZONE -> default/init drwxr-xr-x 4 root sys 6 Apr 16 06:59 X11 drwxr-xr-x 2 adm adm 3 Mar 20 09:25 acct drwxr-xr-x 3 root root 3 Apr 16 23:11 ak lrwxrwxrwx 1 root root 12 Mar 21 03:28 aliases -> mail/aliases drwxr-xr-x 5 root sys 5 Feb 20 23:29 amd64 drwxr-xr-x 7 root bin 18 Mar 20 09:20 apache drwxr-xr-x 4 root bin 7 Feb 20 23:12 apache2 drwxr-xr-x 2 root sys 5 Feb 20 23:27 apoc -rw-r--r-- 1 root bin 1012 Mar 20 09:33 auto_home -rw-r--r-- 1 root bin 1066 Mar 20 09:33 auto_master lrwxrwxrwx 1 root root 16 Mar 21 03:28 autopush -> ../sbin/autopu [...]
ls writes one line at a time, each around 80 chars long. The distribution plot was spot on.
[edit] Predicates
DTrace predicates are used to filter probes, so that the action fires when a conditional is true.
probename /predicate/ { action }
Eg, syscalls for processes called “bash”,
# dtrace -n 'syscall:::entry /execname == "bash"/ { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 233 probes
^C
exece 2
forksys 2
lwp_self 2
schedctl 2
waitsys 2
getpid 4
gtime 4
setpgrp 12
ioctl 28
read 29
write 31
lwp_sigmask 42
sigaction 62
[edit] Scripting
If your DTrace one-liners get a bit too long, it may be easier to write them as scripts. A common form of DTrace script (or D script) is one that begins with the line "#!/usr/sbin/dtrace -s" and is an executable file.
The following script measures syscalls for processes called "bash",
bash-syscalls.d
#!/usr/sbin/dtrace -s
syscall:::entry
/execname == "bash"/
{
@num[probefunc] = count();
}
Make the file executable,
# chmod 755 bash-syscalls.d
And it can be run like a regular command,
# ./bash-syscalls.d dtrace: script './bash-syscalls.d' matched 233 probes ^C rexit 2 exece 3 forksys 4 lwp_self 4 schedctl 4 waitsys 4 getpid 8 gtime 9 read 13 stat64 18 write 19 setpgrp 24 ioctl 56 lwp_sigmask 84 sigaction 124
[edit] Usage Features
The following are some of the usage features that make DTrace invaluable.
[edit] Measuring Time
Access to high resolution timestamps is of particular use for performance analysis.
- timestamp - time since boot in nanoseconds
- vtimestamp - thread on-CPU timestamp
Measuring these for application and operating system function calls will answer,
- timestamp - where is the latency?
- vtimestamp - why are the CPUs busy?
[edit] Printing Stacks
Printing user and kernel stack traces explains both why and the how something happened.
For example, why is bash calling read()? Using ustack() action,
# dtrace -n 'syscall::read:entry /execname == "bash"/ { ustack(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 74314 read:entry
libc.so.1`_read+0x7
bash`rl_getc+0x22
bash`rl_read_key+0xad
bash`readline_internal_char+0x5f
bash`0x80b1171
bash`0x80b118c
bash`readline+0x3a
bash`0x806775b
bash`0x80676c4
bash`0x8067f40
bash`0x8068927
bash`0x8068492
bash`yyparse+0x155
bash`parse_command+0x59
bash`read_command+0xd8
bash`reader_loop+0x10f
bash`main+0x647
bash`_start+0x7a
[...]
[edit] Sampling
DTrace isn't just about tracing events, DTrace can also sample at customized rates.
Eg, sampling 5-level user stack traces from Xorg,
# dtrace -n 'profile-1001 /execname == "Xorg"/ { @[ustack(5)] = count(); }'
dtrace: description 'profile-1001 ' matched 1 probe
^C
Xorg`Xalloc+0x14
Xorg`AllocatePixmap+0x35
libfb.so`fbCreatePixmap+0xb8
nvidia_drv.so`_nv000398X+0x8f
1
[...]
libfb.so`fbSolid+0x2c6
libfb.so`fbFill+0xb8
libfb.so`fbPolyFillRect+0x1d5
nvidia_drv.so`0xfe09e87b
Xorg`miColorRects+0x124
41
nvidia_drv.so`_nv000592X+0x3d
0x1016c00
87
Looks like the nvidia driver was on-CPU 87 times as we sampled at 1001 Hertz.
[edit] See Also
- DTrace Topics: Guide - to learn more about how to use DTrace
- DTrace Topics - other topics as appropriate
- DTrace Home - opensolaris.org community page
- DTrace Topics: Intro - PDF presentation of this page






