DTrace Topics Features

From Siwiki

Jump to: navigation, search

Contents

[edit] DTrace Topics: Features

This page is a list of DTrace features, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section. DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.

Completion Image:trafficlight_green02.png
Difficulty Image:coffeemug01.png Image:coffeemug01.png
Audience Everyone

The aim of this page is to list features and to provide a short example or screenshot for each. (I've always understood things more quickly if I can see actual screenshots).

For further information and examples for each feature, refer to the other sections in DTrace Topics.

[edit] Solaris DTrace Features

The following list and briefly demonstrate each feature of Solaris DTrace. Some of these features require the latest Solaris versions, and/or additional providers to be downloaded and used.

[edit] Process

[edit] dynamic process tracing

DTrace can trace almost every compiled function entry and return by processes. This includes library segments and the process text segment.

Here we trace libc's strlen() function from PID 749297,

# dtrace -n 'pid$target:libc:strlen:entry' -p 749297
dtrace: description 'pid$target:libc:strlen:entry' matched 1 probe   
CPU     ID                    FUNCTION:NAME
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
  0  69517                     strlen:entry 
[...]

The code for strlen() is,

size_t
strlen(const char *s)
{
        const char *s0 = s + 1;      
 
        while (*s++ != '\0')
                ;
        return (s - s0);
}

Note that there is no lines of code for DTrace instrumentation. This function is instrumented dynamically when the probe is enabled.

[edit] static process tracing

Static trace points can be placed and compiled into C code, to allow customized process tracing. This technique is used for the creation of some additional providers, such as plockstat.

For example, the following lists the plockstat provider probes that trace mutex release,

# dtrace -ln 'plockstat$target:libc::mutex-release' -p 749297
   ID   PROVIDER            MODULE                          FUNCTION NAME
12030 plockstat749297         libc.so.1              _pthread_spin_unlock mutex-release
12031 plockstat749297         libc.so.1                     lmutex_unlock mutex-release
12032 plockstat749297         libc.so.1                    __mutex_unlock mutex-release
12033 plockstat749297         libc.so.1             mutex_unlock_internal mutex-release
53684 plockstat749297         libc.so.1               mutex_lock_internal mutex-release
53685 plockstat749297         libc.so.1              mutex_unlock_process mutex-release
53686 plockstat749297         libc.so.1                mutex_unlock_queue mutex-release

These are static probes that have been placed in the libc mutex code. For example,

int
mutex_unlock_internal(mutex_t *mp)
{
        ulwp_t *self = curthread;
        uberdata_t *udp = self->ul_uberdata;
        int mtype = mp->mutex_type;
        tdb_mutex_stats_t *msp;
        int error;
        lwpid_t lwpid;

        if ((mtype & LOCK_ERRORCHECK) && !mutex_is_held(mp))
                return (EPERM);

        if (self->ul_error_detection && !mutex_is_held(mp))
                lock_error(mp, "mutex_unlock", NULL, NULL);

        if ((mtype & LOCK_RECURSIVE) && mp->mutex_rcount != 0) {
                mp->mutex_rcount--;
                DTRACE_PROBE2(plockstat, mutex__release, mp, 1);
[...]

[edit] dynamic process instruction tracing

DTrace can trace CPU instruction execution for processes.

The following traces each CPU instruction of the strlen() function from PID 749297, on an amd64 server,

# dtrace -n 'pid$target:libc:strlen:' -p 749297
dtrace: description 'pid$target:libc:strlen:' matched 40 probes
CPU     ID                    FUNCTION:NAME
  1  69517                     strlen:entry 
  1  44681                         strlen:0 
  1  44682                         strlen:4 
  1  44683                         strlen:6 
  1  44684                         strlen:9 
  1  22825                        strlen:30 
  1  22826                        strlen:32 
  1  22827                        strlen:35 
  1  22828                        strlen:3b 
  1  22829                        strlen:3d 
  1  22830                        strlen:43 
  1  22831                        strlen:45 
  1  44380                        strlen:47 
  1  44381                        strlen:4a 
  1  44382                        strlen:4d 
  1  44383                        strlen:51 
  1  44384                        strlen:55 
  1  44680                    strlen:return 
[...]

The probe names represent the instruction offset in strlen(). Compare to the disassembly,

# mdb -p 749297
Loading modules: [ ld.so.1 libc.so.1 ]
> strlen::dis
libc.so.1`strlen:               movl   0x4(%esp),%edx
libc.so.1`strlen+4:             movl   %edx,%eax
libc.so.1`strlen+6:             andl   $0x3,%edx
libc.so.1`strlen+9:             je     +0x25    <libc.so.1`strlen+0x30>
libc.so.1`strlen+0xb:           cmpb   $0x0,(%eax)
libc.so.1`strlen+0xe:           je     +0x41    <libc.so.1`strlen+0x51>
libc.so.1`strlen+0x10:          incl   %eax
libc.so.1`strlen+0x11:          cmpl   $0x3,%edx
libc.so.1`strlen+0x14:          je     +0x1a    <libc.so.1`strlen+0x30>
libc.so.1`strlen+0x16:          cmpb   $0x0,(%eax)
libc.so.1`strlen+0x19:          je     +0x36    <libc.so.1`strlen+0x51>
libc.so.1`strlen+0x1b:          incl   %eax
libc.so.1`strlen+0x1c:          cmpl   $0x2,%edx
libc.so.1`strlen+0x1f:          je     +0xf     <libc.so.1`strlen+0x30>
libc.so.1`strlen+0x21:          cmpb   $0x0,(%eax)
libc.so.1`strlen+0x24:          je     +0x2b    <libc.so.1`strlen+0x51>
libc.so.1`strlen+0x26:          incl   %eax
libc.so.1`strlen+0x27:          nop    
libc.so.1`strlen+0x28:          nop    
libc.so.1`strlen+0x29:          nop    
libc.so.1`strlen+0x2a:          nop    
libc.so.1`strlen+0x2b:          nop    
libc.so.1`strlen+0x2c:          nop    
libc.so.1`strlen+0x2d:          nop    
libc.so.1`strlen+0x2e:          nop    
libc.so.1`strlen+0x2f:          nop    
libc.so.1`strlen+0x30:          movl   (%eax),%ecx
libc.so.1`strlen+0x32:          addl   $0x4,%eax
libc.so.1`strlen+0x35:          leal   0xfefefeff(%ecx),%edx
libc.so.1`strlen+0x3b:          notl   %ecx
libc.so.1`strlen+0x3d:          andl   $0x80808080,%ecx
libc.so.1`strlen+0x43:          andl   %edx,%ecx
libc.so.1`strlen+0x45:          je     -0x17    <libc.so.1`strlen+0x30>
libc.so.1`strlen+0x47:          bsfl   %ecx,%ecx
libc.so.1`strlen+0x4a:          shrl   $0x3,%ecx
libc.so.1`strlen+0x4d:          leal   -0x4(%eax,%ecx),%eax
libc.so.1`strlen+0x51:          subl   0x4(%esp),%eax
libc.so.1`strlen+0x55:          ret     
>

[edit] process stack tracing

DTrace can print process stack back traces whenever a probe fires, and even aggregate (frequency count) stack traces.

For example, here we print a stack trace whenever processes called "bash" call a read() system call,

# dtrace -n 'syscall::read:entry /execname == "bash"/ { ustack(); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  20069                       read:entry 
              libc.so.1`_read+0x7
              bash`rl_getc+0x22
              bash`rl_read_key+0xad
              bash`readline_internal_char+0x5f
              bash`0x80b1171
              bash`0x80b118c
              bash`readline+0x3a
              bash`0x806775b
              bash`0x80676c4
              bash`0x8067f40
              bash`0x8068927
              bash`0x8068492
              bash`yyparse+0x155
              bash`parse_command+0x59
              bash`read_command+0xd8
              bash`reader_loop+0x10f
              bash`main+0x647
              bash`_start+0x7a

[...]

[edit] process profiling

DTrace can sample process activity at a configurable rate, running a programmable action for each sample.

One use of profiling is to sample the user-level stack trace for when a process is on-CPU, for a rough idea of what is consuming CPU time. For example,

# dtrace -n 'profile:::profile-1001 /execname == "top"/ { @[ustack()] = count(); }'
dtrace: description 'profile:::profile-1001 ' matched 1 probe
^C
[...truncated...]

             libc.so.1`_close+0x7
             top`getptable+0x1b9
             top`get_process_info+0x14
             top`main+0x695
             top`_start+0x80
              57

             libc.so.1`_read+0x7
             top`getptable+0x101
             top`get_process_info+0x14
             top`main+0x695
             top`_start+0x80
             322

[edit] syscall tracing

DTrace can trace the entry and return of system calls through the syscall provider, and examine their entry arguments and return value.

The following lists system call probes,

# dtrace -ln 'syscall:::'
   ID   PROVIDER            MODULE                          FUNCTION NAME
20063    syscall                                               nosys entry
20064    syscall                                               nosys return
20065    syscall                                               rexit entry
20066    syscall                                               rexit return
20067    syscall                                             forkall entry
20068    syscall                                             forkall return
20069    syscall                                                read entry
20070    syscall                                                read return
20071    syscall                                               write entry
20072    syscall                                               write return
20073    syscall                                                open entry
20074    syscall                                                open return
[...400+ lines truncated...]

For example, the following prints a distribution plot for the write() syscall sizes for sshd processes,

# dtrace -n 'syscall::write:entry /execname == "sshd"/ { @["sshd bytes:"] = quantize(arg2); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C

  sshd bytes:                                       
           value  ------------- Distribution ------------- count    
               0 |                                         0        
               1 |@@@@@                                    39       
               2 |                                         1        
               4 |                                         3        
               8 |@                                        4        
              16 |@                                        7        
              32 |@@@@@@                                   46       
              64 |@@@@@@@@@@@@@@@@@@@@@@                   168      
             128 |@@                                       14       
             256 |@@                                       14       
             512 |@                                        10       
            1024 |                                         1        
            2048 |                                         0        

[edit] user lock tracing

User lock tracing (process lock tracing) can be performed using either the plockstat provider, or the /usr/sbin/plockstat command (which also uses DTrace).

For example, the following DTrace one-liner counts process lock events for PID 749297,

# dtrace -n 'plockstat$target::: { @[probename] = count(); }' -p 749297  
dtrace: description 'plockstat$target::: ' matched 44 probes
^C
 
  mutex-acquire                                                   491
  mutex-release                                                   491

The plockstat command can produce higher level reports for process lock activity.

[edit] System

[edit] system info tracing

System info tracing provides probe points for when regular system events occur, such as those measured by kstat. The provider is sysinfo.

For example, here we use system info tracing to measure process reads (from all read syscall variants), and print the data as distribution plots by process name,

# dtrace -n 'sysinfo:::readch { @dist[execname] = quantize(arg0); }'
dtrace: description 'sysinfo:::readch ' matched 4 probes
^C
[...truncated...]

   gnome-terminal                                    
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@                                       20       
               1 |@                                        10       
               2 |@                                        5        
               4 |                                         0        
               8 |                                         0        
              16 |                                         1        
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         301      
              64 |@@@@                                     35       
             128 |                                         3        
             256 |                                         0        

  Xorg                                              
           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@@@@@@@@@@@@                            340      
               1 |                                         0        
               2 |                                         0        
               4 |@                                        15       
               8 |@@                                       55       
              16 |@@@@@@@@@@@@                             321      
              32 |@@@@@@@                                  193      
              64 |                                         8        
             128 |                                         6        
             256 |@                                        24       
             512 |                                         11       
            1024 |@                                        30       
            2048 |@                                        25       
            4096 |@                                        25       
            8192 |                                         0   

[edit] virtual memory tracing

DTrace can trace various virtual memory events from the vminfo provider.

For example, this one-liner measures address space faults by process name - which is an indication that a process is consuming pages of memory,

# dtrace -n 'vminfo:::as_fault { @mem[execname] = sum(arg0); }' 
dtrace: description 'vminfo:::as_fault ' matched 1 probe
^C

  ls                                                               86
  dtrace                                                           91
  bash                                                            233
  java                                                           1649

[edit] scheduler tracing

DTrace can perform scheduler tracing with the sched provider.

For example, here to count all scheduler events while DTrace was tracing,

# dtrace -n 'sched::: { @[probename] = count(); }' 
dtrace: description 'sched::: ' matched 55 probes 
^C

  surrender                                                         5
  tick                                                             20
  remain-cpu                                                       23
  change-pri                                                       41
  preempt                                                          81
  wakeup                                                         1597
  sleep                                                          1602
  dequeue                                                        1679
  enqueue                                                        1679
  off-cpu                                                        2844
  on-cpu                                                         2844

[edit] process event tracing

DTrace can trace high level process events using the proc provider.

For example, the following uses the proc:::exec-success probe to trace when new processes are executed on the system. This one liner also prints out the process name and arguments (up to 80 chars),

# dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'
dtrace: description 'proc:::exec-success ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  53978         exec_common:exec-success   man dtrace                       
  0  53978         exec_common:exec-success   sh -c cd /usr/man; tbl /usr/man/man1m/dtrace.1m |neqn /usr/share/lib/pub/eqncha
  0  53978         exec_common:exec-success   tbl /usr/man/man1m/dtrace.1m     
  0  53978         exec_common:exec-success   neqn /usr/share/lib/pub/eqnchar -
  0  53978         exec_common:exec-success   nroff -u0 -Tlp -man -            
  0  53978         exec_common:exec-success   col -x                           
  0  53978         exec_common:exec-success   sh -c trap  1 15; /usr/bin/mv -f /tmp/mpzKqWey /usr/man/cat1m/dtrace.1m 2> /d
  0  53978         exec_common:exec-success   /usr/bin/mv -f /tmp/mpzKqWey /usr/man/cat1m/dtrace.1m
  0  53978         exec_common:exec-success   sh -c less -siM /tmp/mpzKqWey    
  0  53978         exec_common:exec-success   less -siM /tmp/mpzKqWey          
^C

[edit] I/O tracing

DTrace can trace disk I/O by using the io provider. It has probes to trace disk I/O requests and completions, and also client based NFS requests.

For example, the following one-liner traces I/O requests by process name, and prints a distribution plot of the I/O transaction size,

# dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'
dtrace: description 'io:::start ' matched 6 probes
^C

  sched                                             
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@@@                                      9        
            1024 |@@@@@@@@@                                23       
            2048 |@@@@@@@@@@                               25       
            4096 |@@@@@@@@@@@@                             32       
            8192 |@@@                                      7        
           16384 |@@@                                      7        
           32768 |                                         0        

  tar                                               
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |                                         3        
            2048 |@                                        5        
            4096 |@                                        9        
            8192 |@@                                       10       
           16384 |@@@@@@                                   35       
           32768 |@@@                                      16       
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@                160      
          131072 |@@                                       13       
          262144 |                                         0    

[edit] filesystem tracing

DTrace can perform filesystem tracing using either the fsinfo provider for VFS level tracing, or the fbt provider for raw filesystem module tracing.

For example, this one-liner traces filesystem events called by the tar command,

# dtrace -n 'fsinfo::: /execname == "tar"/ { @[probename] = count(); }'
dtrace: description 'fsinfo::: ' matched 44 probes
^C
 
  getsecattr                                                        1
  realvp                                                            3
  addmap                                                           10
  dispose                                                          12
  map                                                              13
  delmap                                                           14
  readlink                                                         27
  getpage                                                         115
  cmp                                                             816
  readdir                                                        1119
  seek                                                           1188
  close                                                         11588
  open                                                          11588
  getattr                                                       11996
  access                                                        12386
  read                                                          14224
  rwlock                                                        15343
  rwunlock                                                      15343
  lookup                                                        24472

Since fsinfo probes are associated with the syscall, we can associate them to the on-CPU process name. fbt tracing provider deeper insight into filesystem module operation, but associating the activity to the process name gets a little harder.

The following example lists some of the raw zfs probes available,

# dtrace -ln 'fbt:zfs::'
   ID   PROVIDER            MODULE                          FUNCTION NAME
 1455        fbt               zfs                          buf_hash entry
 1456        fbt               zfs                          buf_hash return
 1457        fbt               zfs                     buf_hash_find entry
 1458        fbt               zfs                     buf_hash_find return
 1459        fbt               zfs                   buf_hash_insert entry
 1460        fbt               zfs                   buf_hash_insert return
 1461        fbt               zfs                   buf_hash_remove entry
 1462        fbt               zfs                   buf_hash_remove return
 1463        fbt               zfs                          buf_fini entry
 1464        fbt               zfs                          buf_fini return
 1465        fbt               zfs                          hdr_cons entry
[...truncated...]

[edit] system event tracing

DTrace is able to trace system events using the sysevent provider.

For example, the following one-liner traced sysevents while an svcadm enable telnet command was issued,

# dtrace -n 'sysevent:::post { printf("%s %s %s", args[1]->se_class, args[1]->se_subclass, args[1]->se_publisher); }'
dtrace: description 'sysevent:::post ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  53983              evch_chpublish:post master state_change com.sun:usr:librestart:206826
  0  53983              evch_chpublish:post protocol restarter com.sun:usr:svc.startd:100004
  0  53983              evch_chpublish:post protocol restarter com.sun:usr:svc.startd:100004
  0  53983              evch_chpublish:post master state_change com.sun:usr:librestart:206826
  0  53983              evch_chpublish:post master state_change com.sun:usr:librestart:206826
  1  53983              evch_chpublish:post protocol restarter com.sun:usr:svc.startd:100004
  1  53983              evch_chpublish:post protocol restarter com.sun:usr:svc.startd:100004
  1  53983              evch_chpublish:post protocol restarter com.sun:usr:svc.startd:100004
^C

[edit] postmortem tracing

If a system panics and creates a crash dump while DTrace was running, the data from the DTrace buffer can be inspected from the crash dump using mdb.

For example, the following DTrace command was run shortly before a panic,

# dtrace -b 1m -x bufpolicy=ring -n 'proc:::exec-success { printf("%d %s", timestamp, curpsinfo->pr_psargs); }' &

A timestamp was printed as we are on a multi-CPU server, and would like to post sort the output.

After the system reboots, the DTrace buffer can be read from the crash dump using mdb -k,

# cd /var/crash/mars
# mdb -k *.0
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs random nfs ip hook neti sctp arp 
usba s1394 fctl nca lofs audiosup md cpc crypto fcp logindmux ptm ufs sppp ipc ]
> ::dtrace_state
            ADDR MINOR             PROC NAME                         FILE
fffffffecece1300     2 fffffffecb702520 dtrace           fffffffeceb7a270
>
> fffffffecece1300::dtrace ! sort -n +3
CPU     ID                    FUNCTION:NAME
  1    781         exec_common:exec-success 624213668834 ls
  1    781         exec_common:exec-success 625325705509 uptime
  0    781         exec_common:exec-success 625341515071 uptime
  0    781         exec_common:exec-success 629566284296 who -r
  0    781         exec_common:exec-success 669056986690 man -l uadmin
  1    781         exec_common:exec-success 672480981449 man uadmin.2
  1    781         exec_common:exec-success 672923551403 sh -c cd /usr/share/man; tbl /usr/share/man/man2/uadmin.2 |neqn /usr/share/lib/
  1    781         exec_common:exec-success 672925729979 tbl /usr/share/man/man2/uadmin.2
  0    781         exec_common:exec-success 672925980156 neqn /usr/share/lib/pub/eqnchar -
  0    781         exec_common:exec-success 672927753034 col -x
  1    781         exec_common:exec-success 672927856012 nroff -u0 -Tlp -man -
  0    781         exec_common:exec-success 672933591477 sh -c trap  1 15; /usr/bin/mv -f /tmp/mpnfaONb /usr/share/man/cat2/uadmin.2 2
  1    781         exec_common:exec-success 672935081720 /usr/bin/mv -f /tmp/mpnfaONb /usr/share/man/cat2/uadmin.2
  0    781         exec_common:exec-success 672936740752 sh -c more -s /tmp/mpnfaONb
  1    781         exec_common:exec-success 672938059459 more -s /tmp/mpnfaONb
  0    781         exec_common:exec-success 719796979050 uadmin 5 1
>

In this case, the panic was deliberately caused by the uadmin command, which is the last command recorded in the crash dump. (in fact, you can also see me running the man command beforehand, to remember which code was for causing a panic!).

[edit] Kernel

[edit] dynamic kernel tracing

The fbt provider allows tracing of almost every kernel function entry and return.

For example, the following traces when the ts_preempt() kernel function is entered,

# dtrace -n 'fbt:TS:ts_preempt:entry'
dtrace: description 'fbt:TS:ts_preempt:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
  0  18235                 ts_preempt:entry 
[...]

The code for ts_preempt() is below,

/*
 * Arrange for thread to be placed in appropriate location
 * on dispatcher queue.
 *
 * This is called with the current thread in TS_ONPROC and locked.
 */
static void
ts_preempt(kthread_t *t)
{
        tsproc_t        *tspp = (tsproc_t *)(t->t_cldata);
        klwp_t          *lwp = curthread->t_lwp;
        pri_t           oldpri = t->t_pri;

Note that there are no lines in the code for DTrace instrumentation - the instrumentation is added in memory when the probe is enabled (this is dynamic tracing).

[edit] static kernel tracing

DTrace allows static tracing points to be inserted into kernel code, which become available under the sdt provider for tracing.

For example, the sdt::priv-err probe traces when a privilege request is denied. Here the probe is listed, which shows that it fires from three different functions,

# dtrace -ln 'sdt:::priv-err'
   ID   PROVIDER            MODULE                          FUNCTION NAME
54164        sdt           genunix                  priv_policy_only priv-err
54165        sdt           genunix                priv_policy_choice priv-err
54166        sdt           genunix                   priv_policy_err priv-err

The following is the kernel code for the sdt::priv_policy_err:priv-err probe, with the static probe highlighted,

/*
 * Audit failure, log error message.
 */
static void
priv_policy_err(const cred_t *cr, int priv, boolean_t allzone, const char *msg)
{

#ifdef C2_AUDIT
        if (audit_active)
                audit_priv(priv, allzone ? ZONEPRIVS(cr) : NULL, 0);
#endif
        DTRACE_PROBE2(priv__err, int, priv, boolean_t, allzone);
[...]

[edit] kernel stack tracing

DTrace can print kernel stack back traces whenever a probe fires, and even aggregate (frequency count) stack traces.

For example, here we print a stack trace whenever a disk I/O is issued,

# dtrace -n 'io:::start { stack(); }'
dtrace: description 'io:::start ' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0  53946              bdev_strategy:start 
              genunix`ldi_strategy+0x54
              zfs`vdev_disk_io_start+0x214
              zfs`vdev_io_start+0x1d
              zfs`zio_vdev_io_start+0x205
              zfs`zio_next_stage_async+0x173
              zfs`zio_nowait+0x11
              zfs`vdev_mirror_io_start+0x19a
              zfs`zio_vdev_io_start+0x213
              zfs`zio_next_stage+0x16b
              zfs`zio_ready+0x10e
              zfs`zio_next_stage+0x16b
              zfs`zio_dva_allocate+0x140
              zfs`zio_next_stage+0x16b
              zfs`zio_checksum_generate+0x93
              zfs`zio_next_stage+0x16b
              zfs`zio_write_compress+0x2ef
              zfs`zio_next_stage+0x16b
              zfs`zio_wait_for_children+0x56
              zfs`zio_wait_children_ready+0x20
              zfs`zio_next_stage_async+0x173

[...]

[edit] kernel profiling

The kernel can be profiled, meaning a programmable event can be fired at a configurable rate. This is performed using the profile provider.

A useful application of this is to sample the kernel stack trace at roughly 1000 Hertz, to provide an insight

# dtrace -n 'profile:::profile-1001 /arg0/ { @[stack()] = count(); }'            
dtrace: description 'profile:::profile-1001 ' matched 1 probe
^C
[...truncated...]

              unix`ddi_get8+0xf
              ata`ata_id_common+0x1b8
              ata`ata_disk_id+0x4b
              ata`ata_drive_type+0xce
              ata`ata_probe+0x96
              genunix`devi_probe+0x45
              genunix`probe_node+0x62
              genunix`i_ndi_config_node+0x10d
              genunix`i_ddi_attachchild+0x67
              genunix`devi_attach_node+0xcc
              genunix`config_immediate_children+0xf2
              genunix`devi_config_common+0x9f
              genunix`ndi_devi_config+0x34
              devfs`dv_filldir+0x7c
              devfs`devfs_readdir+0xf9
              genunix`fop_readdir+0x3e
              genunix`getdents64+0xbd
              unix`sys_syscall32+0x1ff
              232

              unix`mach_cpu_idle+0x6
              unix`cpu_idle+0xc8
              unix`idle+0x10e
              unix`thread_start+0x8
             1657

[edit] high resolution profiling

DTrace makes use of Solaris cyclics for high resolution profiling. A safety limit has been set to limit this to 4999 Hertz.

For example, here we sample the current kernel function at 3456 Hertz,

# dtrace -n 'profile:::profile-3456 /arg0/ { @[stack(1)] = count(); }'   
dtrace: description 'profile:::profile-3456 ' matched 1 probe
^C
[...truncated...]

              unix`ddi_get8+0xf
              487

              unix`tsc_gethrtime+0x62
              871

              unix`mach_cpu_idle+0x6
             3838

[edit] kernel CTF

DTrace has the ability to walk structures in kernel memory using C operators, without pre-casting. This is made possible by Compact ANSI-C Type Format (CTF), which encodes type and structure information in the binary, for debugging.

For example, the argument to ts_preempt() is a pointer to kthread_t,

static void
ts_preempt(kthread_t *t)      
{
[...]

When tracing ts_preempt(), DTrace provides this argument as args[0], which is already casted as a pointer to kthread_t. The following demonstrates walking this structure,

# dtrace -n 'fbt::ts_preempt:entry { trace(args[0]->t_procp->p_user.u_comm); }'
dtrace: description 'fbt::ts_preempt:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  18235                 ts_preempt:entry   Xorg                             
  0  18235                 ts_preempt:entry   Xorg                             
  0  18235                 ts_preempt:entry   Xorg                             
  0  18235                 ts_preempt:entry   opera                            
  0  18235                 ts_preempt:entry   opera                            
  0  18235                 ts_preempt:entry   java_vm                          
  0  18235                 ts_preempt:entry   opera                            
  1  18235                 ts_preempt:entry   gnome-terminal                   
  1  18235                 ts_preempt:entry   gnome-terminal                   
  1  18235                 ts_preempt:entry   opera                            
  1  18235                 ts_preempt:entry   opera                            
  1  18235                 ts_preempt:entry   Xorg                   
[...]

[edit] kernel lock tracing

DTrace can perform kernel lock event tracing through the use of the lockstat provider or the /usr/sbin/lockstat command.

The following sums blocked time for kernel adaptive mutex locks, by kernel stack trace. The units are nanoseconds,

# dtrace -n 'lockstat:::adaptive-block { @time[stack()] = sum(arg1); }'    
dtrace: description 'lockstat:::adaptive-block ' matched 1 probe
^C
[...truncated...]

             iscsi`iscsi_handle_logout+0xc5
             iscsi`iscsi_conn_state_logged_in+0x7f
             iscsi`iscsi_conn_state_machine+0x93
             iscsi`iscsi_conn_offline+0xac
             iscsi`iscsi_sess_destroy+0x88
             iscsi`iscsi_ioctl_sendtgts_get+0x201
             iscsi`iscsid_do_sendtgts+0x79
             iscsi`iscsid_thread_sendtgts+0x88
             iscsi`iscsi_threads_entry+0x15
             genunix`taskq_thread+0x1dc
             unix`thread_start+0x8
          361309

             zfs`vdev_queue_io_done+0x3b
             zfs`vdev_disk_io_done+0x16
             zfs`vdev_io_done+0x1d
             zfs`zio_vdev_io_done+0x1b
             genunix`taskq_thread+0x1dc
             unix`thread_start+0x8
          899509

The lockstat command (which uses DTrace) has also been provided to produce high level reports easily.

[edit] anonymous tracing

Anonymous tracing is the ability to trace during boot, before processes have began executing.

For example, here we use dtrace -A to initiate a trace on nge events during boot, which is our network driver,

# dtrace -A -n 'fbt:nge::entry { trace(timestamp); }'
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
#
# init 6
[...time passes...]

After the system reboots, dtrace -a is used to retrieve the anonymous trace. Here it is post sorted as we are on a multi-CPU server,

# dtrace -a > /tmp/out
# sort -n +3 /tmp/out

CPU     ID                    FUNCTION:NAME
  1  66351                 nge_attach:entry       50340150685
  1  66321              nge_get_props:entry       50340920147
  1  66319        nge_check_desc_prop:entry       50340930003
  1  66501          nge_chip_cfg_init:entry       50340944350
  1  66523             nge_alloc_bufs:entry       50340969182
  1  66473          nge_alloc_dma_mem:entry       50340970449
  1  66473          nge_alloc_dma_mem:entry       50340998798
  1  66473          nge_alloc_dma_mem:entry       50341011886
[...]

[edit] Languages

[edit] Java tracing

The hotspot provider was integrated into JDK 6.0, which provides hundreds of probes for tracing the JVM. See DTrace Topics: Java for details.

For example, here we run the TestGreeting Java prorgam with extended DTrace probes enabled - so that we can trace the execution of methods,

# java -XX:+ExtendedDTraceProbes TestGreeting
Hello DTrace!
Hello DTrace!
Hello DTrace!
[...]

Now DTrace is able to trace Java method calls as they occur,

# dtrace -qn 'hotspot*:::method-entry { printf("-> %4s.%s\n", 
 stringof(copyin(arg1, arg2)), stringof(copyin(arg3, arg4))); }'  
-> Greeting.greet
-> java/io/PrintStream.println
-> java/io/PrintStream.print
-> java/io/PrintStream.write
-> java/io/PrintStream.ensureOpen
-> java/io/Writer.write
-> java/io/BufferedWriter.write
-> java/io/BufferedWriter.ensureOpen
-> java/io/BufferedWriter.min
-> java/lang/String.getChars
-> java/lang/System.arraycopy
-> java/io/BufferedWriter.flushBuffer
-> java/io/BufferedWriter.ensureOpen
-> java/io/OutputStreamWriter.write

[edit] Java stack tracing

The jstack() action can be used to trace user stack traces with Java frame translation.

In this example, whenever java processes call the ioctl() system call, DTrace aggregates their stack trace using jstack(),

# dtrace -n 'syscall::ioctl:entry /execname == "java"/ { @[jstack()] = count(); }'
dtrace: description 'syscall::ioctl:entry ' matched 1 probe
^C


              libc.so.1`ioctl+0x7
              libX11.so.4`_X11TransBytesReadable+0x14
              libX11.so.4`_XEventsQueued+0x130
              libX11.so.4`XEventsQueued+0x4f
              libmawt.so`Java_sun_awt_X11_XlibWrapper_XEventsQueued+0x1f
              sun/awt/X11/XlibWrapper.XEventsQueued(JI)I*
              sun/awt/X11/XToolkit.run(Z)V*
              0xb6e317b0
                6

              libc.so.1`ioctl+0x7
              libX11.so.4`_X11TransBytesReadable+0x14
              libX11.so.4`_XEventsQueued+0x130
              libX11.so.4`XEventsQueued+0x4f
              libmawt.so`Java_sun_awt_X11_XlibWrapper_XEventsQueued+0x1f
              sun/awt/X11/XlibWrapper.XEventsQueued(JI)I*
              sun/awt/X11/XToolkit.run(Z)V*
              0xb6e317b0
                6

Lines such as sun/awt/X11/XToolkit.run() are from Java.

[edit] JavaScript tracing

When running a browser with an instrumented JavaScript library, it is possible to trace JavaScript activity such as function calls, object creation and garbage collection. The libmosjs library has been instrumented with JavaScript probes for the Mozilla and Firefox browsers, and is currently part of the Mozilla DTrace project, and Mozilla RFE 370906. See DTrace_Topics: JavaScript for details.

The following example traces the flow of JavaScript functions as they are called, and prints their entry arguments and return values as hexidecimal values (strings can be printed instead when needed), and return line numbers,

# ./js_argflow.d
C TIME                  FILE                 -- FUNC()
1 2007 May 31 06:07:34  clock2.html           -> startTime()
1 2007 May 31 06:07:34  clock2.html             -> getHours()
1 2007 May 31 06:07:34  clock2.html             <- startTime:9 = 0x80000000
1 2007 May 31 06:07:34  clocklib.js             -> padLibZero(0x6)
1 2007 May 31 06:07:34  clocklib.js             <- padLibZero:3 = 0x955fd70
1 2007 May 31 06:07:34  clock2.html             -> getMinutes()
1 2007 May 31 06:07:34  clock2.html             <- startTime:10 = 0x80000000
1 2007 May 31 06:07:34  clocklib.js             -> padLibZero(0x7)
1 2007 May 31 06:07:34  clocklib.js             <- padLibZero:3 = 0x955fd50
1 2007 May 31 06:07:34  clock2.html             -> getSeconds()
1 2007 May 31 06:07:34  clock2.html             <- startTime:11 = 0x80000000
1 2007 May 31 06:07:34  clocklib.js             -> padLibZero(0x22)
1 2007 May 31 06:07:34  clocklib.js             <- padLibZero:4 = 0x22
1 2007 May 31 06:07:34  clock2.html             -> getElementById()
1 2007 May 31 06:07:34  clock2.html             <- startTime:12 = 0x80000000
1 2007 May 31 06:07:34  clock2.html             -> setTimeout()
1 2007 May 31 06:07:34  clock2.html             <- startTime:13 = 0x80000000
1 2007 May 31 06:07:34  clock2.html           <- startTime:13 = 0x80000000
^C

[edit] Ruby tracing

Ruby can be patched and compiled with a Ruby DTrace provider, which provides visibility of events such as function and line execution, and object manegement. You can use either Rich Lowe's Ruby Provider, or Joyent's Ruby Provider.

The following one-liner frequency counts method calls,

# dtrace -n 'ruby$target:::function-entry { @[copyinstr(arg1)] = count(); }'

[edit] Applications

[edit] iSCSI tracing

An iSCSI provider has been developed and will be integrated in a future versions of Solaris. For details, see DTrace Topics: iSCSI.

A simple and useful example of the iSCSI provider is tracing iSCSI operation by IP address,

# ./iscsiwho.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP                     iSCSI EVENT    COUNT
   192.168.1.109                 nop-receive        1
   192.168.1.109                    nop-send        1
   192.168.1.109               scsi-response      126
   192.168.1.109                scsi-command      208
   192.168.1.109                data-receive     1207
   192.168.1.109                data-request     1207
   192.168.1.109                   data-send     1290     

[edit] X11 tracing

An X Windowing System provider has been developed and released in Solaris 10 Update 4. See DTrace Topics: X for details.

In this simple example, X events are frequency counted,

# dtrace -n 'Xserver*::: { @[probename] = count(); }'
dtrace: description 'Xserver*::: ' matched 21 probes
^C 

  client-auth                                                       1
  client-connect                                                    1
  client-disconnect                                                 1
  send-event                                                        5
  resource-alloc                                                  290
  resource-free                                                   290
  request-done                                                   3137
  request-start                                                  3137

[edit] Trace Capabilities

[edit] scriptability

DTrace allows scripts to be written in its D language. Scripts can be executable and begin with the line: #!/usr/sbin/dtrace -s, although there are other ways to script using DTrace.

For example, the following simple script is made executable and run at the command line,

# cat syscall.d
#!/usr/sbin/dtrace -s

syscall:::entry
{
        @num[execname, probefunc] = count();
} 

dtrace:::END
{
        printa("   %-32s %-32s %@8d\n", @num);
}
#
# chmod 755 syscall.d
#
# ./syscall.d
dtrace: script './syscall.d' matched 236 probes
^C
CPU     ID                    FUNCTION:NAME
  0      2                             :END
   devfsadm                         gtime                                   1
   devfsadm                         lwp_park                                1
[...output truncated...]
   dtrace                           p_online                               64
   Xorg                             read                                   78
   operapluginwrapp                 ioctl                                 142
   dtrace                           ioctl                                 143
   operapluginwrapp                 pollsys                               143
   opera                            gtime                                 180
   opera                            ioctl                                 190
   opera                            pollsys                               192

[edit] aggregations

DTrace has a powerful data type called an aggregation, which is identified by the symbol "@" in the D language. Aggregations can be used in different ways depending on the aggregating action that is used, such as "count()".

For example, here we use a count() aggregation to frequency count system calls,

# dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 235 probes
^C

  fstat                                                             1
  mmap                                                              1
  schedctl                                                          1
  sysconfig                                                         3
  lwp_park                                                          4
  sigaction                                                         4
  brk                                                               6
  setcontext                                                        9
  writev                                                            9
  stat64                                                           12
  setitimer                                                        17
  lwp_sigmask                                                      21
  write                                                            31
  portfs                                                           55
  read                                                             62
  p_online                                                         64
  gtime                                                            90
  pollsys                                                         200
  ioctl                                                           227

Now a quantize() aggregation is used to print a distribution plot of syscall read size,

# dtrace -n 'sysinfo:::readch { @["read size (bytes):"] = quantize(arg0); }'
dtrace: description 'sysinfo:::readch ' matched 4 probes
^C

  read size (bytes):
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@                                    158
               1 |@@                                       61
               2 |                                         3
               4 |                                         1
               8 |                                         4
              16 |@@@@@                                    141
              32 |@@@@@                                    163
              64 |@@@@@@@@@@@@@@@@@@@@@@                   658
             128 |                                         0
             256 |                                         9
             512 |@                                        16
            1024 |                                         0
            2048 |                                         5
            4096 |                                         5
            8192 |                                         0

[edit] predicates

Predicaties are filter expressions, and can be used to focus tracing on the intended target.

Here we use a predicate to frequency count system calls from processes with the name "sshd",

# dtrace -n 'syscall:::entry /execname == "sshd"/ { @[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 235 probes
^C

  read                                                             30
  write                                                            30
  pollsys                                                          52
  lwp_sigmask                                                     104

[edit] thread local storage

Thread local variables can be used to associate values with the current thread. They are prefixed with "self->".

This script uses thread local variables to time read() syscalls from processes called "bash",

#!/usr/sbin/dtrace -s

syscall::read:entry
/execname == "bash"/
{
        self->start = timestamp;
}

syscall::read:return
/self->start/
{
        printf("%d ms", (timestamp - self->start) / 1000000);
        self->start = 0;
}

Now running this script while I type in another window,

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  20070                      read:return 167 ms
  0  20070                      read:return 196 ms
  0  20070                      read:return 164 ms
  0  20070                      read:return 220 ms
  0  20070                      read:return 285 ms
  0  20070                      read:return 16 ms
  0  20070                      read:return 1451 ms
  0  20070                      read:return 138 ms
  0  20070                      read:return 279 ms
^C

The times above are the keystroke latencies as I type in the bash shell.

[edit] clause local storage

Clause local variables are valid within their action clause, and are used as temporary variables that have little performance overhead. They are prefixed with the text "this->".

The following example script (taken from the previous feature) uses a clause local variable to improve readability slightly,

#!/usr/sbin/dtrace -s

syscall::read:entry
/execname == "bash"/
{
        self->start = timestamp;
}

syscall::read:return
/self->start/
{
        this->elapsed = (timestamp - self->start) / 1000000;
        printf("%d ms", this->elapsed);
        self->start = 0;
}

The output is the same as before.

[edit] speculative tracing

(speculate())

[edit] is-enabled

DTrace supports "is-enabled" macros - which allow code to only be run when DTrace has tracing enabled. This is useful for reducing the not-enabled overhead when the probe additions require calling CPU expensive functions to fetch their argument data.

For example, the DTrace JavaScript provider uses is-enabled macros to only call functions if the probes are enabled. The following is from js/src/jsinterp.c,

#ifdef INCLUDE_MOZILLA_DTRACE
                /* DTrace function entry, inlines */
                if (TRACE_MOZILLA_JS_FUNCTION_ENTRY_ENABLED())
                    jsdtrace_function_entry(fp, fun);
                if (TRACE_MOZILLA_JS_FUNCTION_INFO_ENABLED())
                    jsdtrace_function_info(cx, fp, newifp->frame.down, fun);
                if (TRACE_MOZILLA_JS_FUNCTION_ARGS_ENABLED())
                    jsdtrace_function_args(cx, fp, fun);
#endif

[edit] switch buffers

The switch buffer is the default buffer policy, which maintains two buffers and switches between them when the consumer performs a read. By default, the consumer reads from the switch buffer at 1 Hertz; this can be increased with the switchrate tunable.

For example, here we change switchrate to be 10 Hertz, so that the output appears to scroll smoothly (something that will be lost in this static screenshot - sorry!),

# dtrace -x switchrate=10 -n 'profile:::profile-10 { trace(timestamp); }'
dtrace: description 'profile:::profile-10 ' matched 1 probe
dtrace: buffer size lowered to 2m
CPU     ID                    FUNCTION:NAME
  0  38453                      :profile-10    34104872632841
  0  38453                      :profile-10    34104972631213
  0  38453                      :profile-10    34105072631940
  0  38453                      :profile-10    34105172632310
  0  38453                      :profile-10    34105272632251
  0  38453                      :profile-10    34105372633003
  0  38453                      :profile-10    34105472632577
  0  38453                      :profile-10    34105572632676
  0  38453                      :profile-10    34105672631851
  0  38453                      :profile-10    34105772631102
[...]

[edit] ring buffers

DTrace supports "ring" buffers, which overwrite old data so that the most recent output is held in the buffer.

Here the buffer size is set to 4 Kbytes, the buffer policy is set to "ring", and we probe some fast events that will fill the buffer rapidly,

# dtrace -b 4k -x bufpolicy=ring -n 'profile:::profile-100 { printf("%Y %s", walltimestamp, execname); }'
dtrace: description 'profile:::profile-100 ' matched 1 probe
^C
CPU     ID                    FUNCTION:NAME
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash
  0  38455                     :profile-100 2007 Jun  3 18:05:17 date
  0  38455                     :profile-100 2007 Jun  3 18:05:17 date
  0  38455                     :profile-100 2007 Jun  3 18:05:17 sched
  0  38455                     :profile-100 2007 Jun  3 18:05:17 date
  0  38455                     :profile-100 2007 Jun  3 18:05:17 date
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash
  0  38455                     :profile-100 2007 Jun  3 18:05:17 sched
  0  38455                     :profile-100 2007 Jun  3 18:05:17 sched
  0  38455                     :profile-100 2007 Jun  3 18:05:17 gnome-terminal
  0  38455                     :profile-100 2007 Jun  3 18:05:17 sched
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash
  0  38455                     :profile-100 2007 Jun  3 18:05:17 bash

When Ctrl-C is hit, the output shows that only the last several events were in the buffer - despite this one-liner running for several seconds.

[edit] fill buffers

DTrace supports "fill" buffers - which terminate DTrace when they fill with output.

Here we set the buffer size to 4 Kbytes, we set the buffer policy to "fill", and probe syscall events,

# dtrace -b 4k -x bufpolicy=fill -n 'syscall:::entry { trace(execname); }'
dtrace: description 'syscall:::entry ' matched 225 probes
CPU     ID                    FUNCTION:NAME
  0    103                      ioctl:entry   dtrace
  0    103                      ioctl:entry   dtrace
  0    239                  sysconfig:entry   dtrace
  0    239                  sysconfig:entry   dtrace
  0    173                  sigaction:entry   dtrace
  0    173                  sigaction:entry   dtrace
  0    103                      ioctl:entry   dtrace
  0    197                       mmap:entry   dtrace
  0    361                   schedctl:entry   dtrace
  0    131                   lwp_park:entry   dtrace
  0     13                      write:entry   gnome-terminal
  0    103                      ioctl:entry   gnome-terminal
  0    317                    pollsys:entry   gnome-terminal
  0     13                      write:entry   gnome-terminal
  0    103                      ioctl:entry   gnome-terminal

[edit] translators

DTrace supports translators - which can take a pointer to a struct and represent choosen members in a stable way.

For example, the DTrace io provider uses a translator to provide a stable collection of device info, drawn from a struct buf pointer, which would otherwise be complex to fetch. This is from /usr/lib/dtrace/io.d,

typedef struct devinfo {
        int dev_major;                  /* major number */
        int dev_minor;                  /* minor number */
        int dev_instance;               /* instance number */
        string dev_name;                /* name of device */
        string dev_statname;            /* name of device + instance/minor */
        string dev_pathname;            /* pathname of device */
} devinfo_t;

#pragma D binding "1.0" translator
translator devinfo_t < struct buf *B > {
        dev_major = B->b_dip != NULL ? getmajor(B->b_edev) :
            getmajor(B->b_file->v_vfsp->vfs_dev);
        dev_minor = B->b_dip != NULL ? getminor(B->b_edev) :
            getminor(B->b_file->v_vfsp->vfs_dev);
        dev_instance = B->b_dip == NULL ?
            getminor(B->b_file->v_vfsp->vfs_dev) :
            ((struct dev_info *)B->b_dip)->devi_instance;
        dev_name = B->b_dip == NULL ? "nfs" :
            stringof(`devnamesp[getmajor(B->b_edev)].dn_name);
        dev_statname = strjoin(B->b_dip == NULL ? "nfs" :
            stringof(`devnamesp[getmajor(B->b_edev)].dn_name),
            lltostr(B->b_dip == NULL ? getminor(B->b_file->v_vfsp->vfs_dev) :
            ((struct dev_info *)B->b_dip)->devi_instance == 0 &&
            ((struct dev_info *)B->b_dip)->devi_parent != NULL &&
            ((struct dev_info *)B->b_dip)->devi_parent->devi_node_name ==
            "pseudo" ? getminor(B->b_edev) :
            ((struct dev_info *)B->b_dip)->devi_instance));
        dev_pathname = B->b_dip == NULL ? "<nfs>" :
            ddi_pathname(B->b_dip, getminor(B->b_edev));
};

For an example of using this translator, the following one-liner counts I/O requests by device name,

# dtrace -n 'io:::start { @[args[1]->dev_statname] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C

  cmdk0                                                            40

[edit] cacheable predicates

DTrace features cacheable predicates, which reduce the CPU overhead of DTrace predicate execution.

In the following example D script extract, the /self->ok/ predicate would cache,

[...]
syscall::read:entry
/self->ok/
{
        @["syscall read()"] = count();
}

syscall::write:entry
/self->ok/
{
        @["syscall write()"] = count();
}
[...]

[edit] stability framework

DTrace supports a stability framework, which includes carefully choosen semantics to describe the stability level of probes and arguments.

Here we print the stability info for the io:genunix:bdev_strategy:start probe,

# dtrace -v -ln 'io:genunix:bdev_strategy:start'
   ID   PROVIDER            MODULE                          FUNCTION NAME
53946         io           genunix                     bdev_strategy start 

        Probe Description Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: Unknown

        Argument Attributes
                Identifier Names: Evolving
                Data Semantics:   Evolving
                Dependency Class: ISA

        Argument Types
                args[0]: bufinfo_t *
                args[1]: devinfo_t *
                args[2]: fileinfo_t *

[edit] Safety

[edit] DTrace test suite

One of the greatest features of DTrace is the comprehensive test suite (SUNWdtrt), which enhances the stability and safety of DTrace as it is developed. Any new feature is run through these comprehensive tests, on a variety of platforms. The test suite has been made public so that customers can also run it themselves.

For example, running the DTrace test suite (which at times will cause serious CPU load on the system - to deliberately put DTrace through its paces),

# cd /opt/SUNWdtrt
# ./bin/dtest
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_FUNC.bad.d ... [307920]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_MDIM.bad.d ... [307922]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_NULL.bad.d ... [307924]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_REDEF.redef.d ... [307926]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_SCALAR.avgtoofew.d ... [307928]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_SCALAR.maxnoarg.d ... [307930]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_SCALAR.mintoofew.d ... [307932]
testing /opt/SUNWdtrt/tst/common/aggs/err.D_AGG_SCALAR.quantizetoofew.d ... [307934]
[...]

There are over one thousand tests in the test suite. For a list, look in this directory,

# find /opt/SUNWdtrt/tst
/opt/SUNWdtrt/tst
/opt/SUNWdtrt/tst/common
/opt/SUNWdtrt/tst/common/tracemem
/opt/SUNWdtrt/tst/common/tracemem/tst.rootvp.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_TRACEMEM_SIZE.negsize.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_PROTO_LEN.toofew.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_PROTO_LEN.toomany.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_TRACEMEM_ADDR.badaddr.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_PROTO_ARG.badsize.d
/opt/SUNWdtrt/tst/common/tracemem/err.D_TRACEMEM_SIZE.zerosize.d
/opt/SUNWdtrt/tst/common/union
/opt/SUNWdtrt/tst/common/union/err.D_DECL_INCOMPLETE.recursive.d
/opt/SUNWdtrt/tst/common/union/err.D_DECL_INCOMPLETE.simple.d
/opt/SUNWdtrt/tst/common/union/err.D_DECL_COMBO.UnionWithoutColon1.d
/opt/SUNWdtrt/tst/common/union/err.D_ADDROF_VAR.UnionPointer.d
/opt/SUNWdtrt/tst/common/union/err.D_DECL_INCOMPLETE.order.d
/opt/SUNWdtrt/tst/common/union/tst.UnionAssoc.d
[...over 1000 lines truncated...]

[edit] Useability

[edit] privileges

DTrace provides three different privilegs to the Solaris Least Privilege feature, so that different levels of DTrace access can be assigned to non-root users. The privileges are called dtrace_user, dtrace_proc and dtrace_kernel.

For example, here the brendan user has the dtrace_user privilege, and has a limited set of probes available when compared to root,

brendan$ ppriv $$
849:    bash
flags = <none>
        E: basic,dtrace_user
        I: basic,dtrace_user
        P: basic,dtrace_user
        L: all
brendan$ dtrace -l | wc -l
     471

root# dtrace -l | wc -l
   38456

[edit] JNI Binding

There is a JNI binding so that DTrace can be easily used from a Java program. See the website for details.

The following is an example Java code snippet that uses the JNI binding,

import org.opensolaris.os.dtrace.*;
[...]
    Consumer consumer = new LocalConsumer();
    consumer.addConsumerListener(new ConsumerAdapter() {
        public void dataReceived(DataEvent e) {
            System.out.println(e.getProbeData());
        }
    });
 
    try {
        consumer.open();
        consumer.compile(file, macroArgs);
        consumer.enable();
        consumer.go();
[...]

[edit] Tools

[edit] DTraceToolkit

The DTraceToolkit is a collection of over 100 documented DTrace scripts, ready for use.

The following is iosnoop (just one useful script for many), which traces disk activity live,

root@laptop:/opt/DTT> ./iosnoop
  UID   PID D    BLOCK   SIZE       COMM PATHNAME
    0  3221 R   380304   4096     dtrace /usr/lib/libdtrace.so.1
  100  3222 R   399064   8192         ls /usr/lib/libc/libc_hwcap1.so.1
  100  3222 R  5390542   1024         ls /export/home/brendan/Audio/Review
  100  3222 R   398304   4096         ls /usr/lib/libc/libc_hwcap1.so.1
  100  3223 R    54528   8192       bash /sbin/uname
  100  3223 R    54544   2048       bash /sbin/uname
  100  3224 R    36128   8192       bash /usr/sbin/df
  100  3224 R    36144   8192       bash /usr/sbin/df
  100  3224 R    36160   4096       bash /usr/sbin/df
  100  3224 R    36168   4096       bash /usr/sbin/df
  100  3224 R   343272   4096         df /lib/libcmd.so.1
  100  3224 R   398928   4096         df /usr/lib/libc/libc_hwcap1.so.1
  100  3224 R   398920   4096         df /usr/lib/libc/libc_hwcap1.so.1
  100  3224 R   716512   4096         df /dev/
  100  3225 R  3843568   5120         ls /etc
[...]

Many of the scripts are written to behave like regular CLI tools with options and arguments. For example, runinng -h for usage help,

# ./iosnoop -h
USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
               [-m mount_point] [-n name] [-p PID]
       iosnoop          # default output
                -a      # print all data (mostly)
                -A      # dump all data, space delimited
                -D      # print time delta, us (elapsed)
                -e      # print device name
                -g      # print command arguments
                -i      # print device instance
                -N      # print major and minor numbers
                -o      # print disk delta time, us
                -s      # print start time, us
                -t      # print completion time, us
                -v      # print completion time, string
                -d device       # instance name to snoop
                -f filename     # snoop this file only
                -m mount_point  # this FS only
                -n name         # this process name only
                -p PID          # this PID only
   eg,
        iosnoop -v      # human readable timestamps
        iosnoop -N      # print major and minor numbers
        iosnoop -m /    # snoop events on filesystem / only

[edit] chime GUI

Chime is a GUI for DTrace written in Java, and allows custom monitoring GUIs to be constructed quickly.

The following are some screenshots (from the OpenSolaris.org website),

Image:dtrace_chime_01.gif

Image:dtrace_chime_02.gif

Solaris Internals
Personal tools
The Books
The Ads