DTrace Topics Developer

From Siwiki

Jump to: navigation, search

Contents

[edit] DTrace Topics: Developer

This article is about using DTrace for Application Developers, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed knowledge, which can be studied from the DTrace Topics: Intro section.

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.

Completion Image:trafficlight_yellow02.png
Difficulty Image:coffeemug01.png Image:coffeemug01.png
Audience Software developers

[edit] Application Developers

DTrace can be used by application developers for both troubleshooting and performance analysis.

[edit] DTrace Capabilities

DTrace has numerous capabilities, such as those described in the Intro. Of most interest to application developers include the following:

  • DTrace can be executed on production customer environments without restarting the application.
    Some problems only manifest in production, and can't be replicated in development or testing.
    Customers run software on unusual or large configurations that would be expensive to replicate in testing.
  • DTrace can examine the entire software stack from one tool.
    The bugs you are looking for may actually be in user-libraries, the kernel or in device drivers; DTrace allows you to examine all software on the system, not just your own.
  • DTrace can fetch nanosecond timestamps from almost every function entry and return.
    Prove where latencies are by measuring them, no matter where they lurk. This helps justify the investment of programming time to fix issues.
  • DTrace provides a framework for static tracing.
    This allows custom debugging probes to be placed in code, which exist in production code but are only enabled when needed.

[edit] DTrace Strategies

Strategies for using DTrace depends on the language of the application, but usually includes the following:

  • sample - Use the profile provider to sample application and kernel stack traces.
While this is simple and prone to sampling errors, it can get fast results.
  • trace - Use a language provider to trace your own application code function entries and returns.
Watch the functions you wrote as they execute, fetching nanosecond timestamps to determine where latencies are.
  • arguments - Examine function entry arguments and return values.
  • static probes - Insert static probes into application code for tracing application specific events.
Create standard probes with names and arguments that are application meaningful. transaction-start, transaction-end, etc.
  • system analysis - Use the other DTrace providers to examine system behaviour.
Other debuggers and analysers only show you some layers of the software stack. DTrace sees all.
  • simplify - Start by writing mini-apps and practice DTracing those.
Your application may be hideously complex, so start with simple programs ("Hello World!" onwards) and DTrace those first, to become comfortable with DTrace. See DTrace_Topics_Exercises.

The following sections provide suggestions and examples for analysis.

[edit] Providers

The following DTrace providers may be of use to application developers, depending on the language of the application:

Selected DTrace Providers
Provider Application Language Description
profile C, C++, Java The profile provider can sample stack traces at a configurable rate (for example, 1000hz). This is a simple but often effective way to determine if there are hotspots in application code. Uses include:
  • C (user-level): the ustack() action retrieves the user-level stack trace.
  • C (kernel): the stack() action retrieves the kernel-level stack trace.
  • C++: the ustack() action can be used, with the DTrace output piped through c++filt to improve readability.
  • Java: the jstack() action can be used to fetch a stack trace with Java translations.
pid C, C++, * The pid provider can trace most function entries and returns of user-level C and C++ software, and individual CPU instructions within functions.

With some effort, the pid provider can also be used to examine the operation of all other languages (shell, Perl, Ruby, Python, Cobol, etc), by either tracing the operation of the interpreter or the execution flow of compiled code. Some of this work isn't for the feint-hearted, but there are certainly things to try.

hotspot Java The hotspot provider was introduced into JDK 6.0, and can examine class events, method execution and other JVM events. See the Java section.
mozilla JavaScript The trace_mozilla provider allows tracing of JavaScript function entries and returns. It is in active development, and requires compliation of mozilla to get working. See the OpenSolaris site

There are many more providers in DTrace for examining system and kernel events, such as fbt which provides tracing for almost every kernel function entry and return.

[edit] Profiling

The term profiling in DTrace means sampling, and can be achieved with the profile provider. Profiling is especially useful for performance analysis.

profile Probes
Probe Name Description
profile:::profile-* to sample across all CPUs.
profile:::tick-* to execute on one CPU.

The easiest action to use with the profile:::profile- probes, is to count the frequency of sampled stack traces. Sampling can usually identify which code is making the CPUs busy, as that code will be on-CPU when the sample is taken.

However many types of application latency won't be found by sampling on-CPU stack traces. If an application is slow because it is waiting on slow I/O (disk or network), then sampling on-CPU stack traces won't find the latency - as the application isn't on-CPU. For these types of latency we need to trace events, which can be achieved using language providers or on-CPU and off-CPU events from the sched provider.

So while profiling won't identify all latencies, it is easy to do and it is worth a try.

[edit] C

The following one-liner demonstrates profiling on the "xmms" multimedia player, which is written in C,

# dtrace -n 'profile-1001 /pid == $target/ { @num[ustack()] = count(); }' -p `pgrep -n xmms`
dtrace: description 'profile-1001 ' matched 1 probe
^C

[...output truncated...]

             libc.so.1`ioctl+0x7
             libmpg123.so`0xfe4d61f7
             xmms`input_get_time+0x1c
             xmms`idle_func+0x44
             libglib-1.2.so.0.0.10`g_timeout_dispatch+0x1e
             libglib-1.2.so.0.0.10`g_main_dispatch+0xf1
             libglib-1.2.so.0.0.10`g_main_iterate+0x565
             libglib-1.2.so.0.0.10`g_main_run+0x33
             libgtk-1.2.so.0.9.1`gtk_main+0xc6
             xmms`main+0x10cb
             xmms`_start+0x80
              33

             libgdk-1.2.so.0.9.1`gdk_rgb_convert_0888+0x78
             libgdk-1.2.so.0.9.1`gdk_rgb_convert_indexed_generic+0x5d
              libgdk-1.2.so.0.9.1`gdk_draw_rgb_image_core+0x19c
             libgdk-1.2.so.0.9.1`gdk_draw_indexed_image+0x8e
             libbscope.so`0xfe381d99
             xmms`vis_send_data+0xaf
             xmms`input_update_vis+0x99
             xmms`idle_func+0x76
             libglib-1.2.so.0.0.10`g_timeout_dispatch+0x1e
             libglib-1.2.so.0.0.10`g_main_dispatch+0xf1
             libglib-1.2.so.0.0.10`g_main_iterate+0x565
             libglib-1.2.so.0.0.10`g_main_run+0x33
             libgtk-1.2.so.0.9.1`gtk_main+0xc6
             xmms`main+0x10cb
             xmms`_start+0x80
              45

This ino-liner sampled the stack trace for the most recent xmms process, at 1001 Hertz. Many screenfulls of output were produced, with the most commonly sampled stack traces at the end. This shows that for 45 samples, gdk routines were on the CPU, which were called by xmms's vis_send_data(); for 33 samples, libc's ioctl was on the CPU, called by xmms's input_get_time().

The ustack() action collects the user stack, which is translated into symbols when the dtrace consumer prints the output. If your process has ended by then, you will only see hexidecimal addresses.

The forms of ustack() are,

void ustack(int nframes, int strsize)
void ustack(int nframes)
void ustack(void)

Stack traces can get long, beyond the default sized string buffers in DTrace. If you think you have had truncation, try ustack(64, 2048), which allows each stack to reach 64 lines of output and 2 Kbytes of text. These values can be tuned globally via ustackframes and strsize. In some scenarios, DTrace will print warnings if it had to truncate.

[edit] C++

Examining C++ stack traces can be performed in the same way as with C, however the output can be improved if piped through the c++filt program.

C++ stack frames can become very long, meaning you are likely to want to increase the ustackframes and strsize globals, or set these as arguments when calling ustack().

[edit] Java

Capturing Java stack traces can be performed in a similar way to C, but using the jstack() action instead of ustack(). Tunables for jstack() are jstackframes and jstackstrsize, if your stack traces become long and exceed the default buffers.

See DTrace Topics: Java for demonstrations of jstack().

[edit] Application Tracing

Depending on the application language, DTrace usually has the capability to watch as application functions are entered and returned, and may also be able to examine function arguments and return values.

[edit] Probe Overhead

If you trace frequent events (many thousand a second), the overheads from DTrace can start to become noticable. There are a few different ways to estimate overhead discussed in DTrace Topics: Overhead; the following is one that may be easiest for application developers:

[edit] Test Workload

Create a test workload that can be applied with consistant results. Run that workload and time how long it took,

  • Normally
  • With DTrace tracing
  • With your standard analysers/profilers collecting data

Ideally your workload would run for about a minute, and would apply a variety of events to simulate real world activity (if that's too hard, dummy events are still better than not performing this test at all). A few calculations based on the recorded times should show the overheads of DTrace, and the overheads of your usual analyser tools (for comparison).

Consider effects such as caching - you may need to flush caches or ignore the first run, and make multiple runs and average the results.

A simple example is as follows.

[edit] System Tracing

Solaris Internals
Personal tools
The Books