DTrace Topics Limitations

From Siwiki

DTrace Topics: Limitations

This article covers DTrace limitations and is part of the DTrace Topics collection. It assumes a good understanding of DTrace, which you can get from the DTrace Topics: Intro section.

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems.

Completion: yellow traffic light
Difficulty: 4 coffee mugs
Audience: All DTrace users

Limitations

There are some things that DTrace simply can't do, some that it can't currently do, and some that it can do but isn't good at. The following sections document them for DTrace on recent builds of Solaris.

Can't do

Kernel instruction tracing

DTrace can't trace individual CPU instructions inside kernel functions (it can do this for user-level functions, through the pid provider). From DTrace's point of view, a kernel function enters and then returns; we don't see the instruction path directly.

This hasn't proven to be a real problem. There are numerous clever ways to figure out what the instruction path actually was; they include:

  • Use arg0 of the function's return probe to see the offset within the function of the return (this may indicate which branch was taken).
  • Trace the functions called from within this function.
    This approach will almost always identify the code path. It can't be used if the function genuinely calls nothing.
  • Trace variables at the function return that would identify the code path.
    You usually need to cache pointers as they appear in other functions, or even use the CPU register variables to read variables that aren't referenced by any function.
  • Use the profile provider to sample the program counter, e.g., at 4983 Hertz.
    It may help to create a load which calls your function frequently, so that it is often on the CPU to be sampled.
  • If all else fails, consider whether an SDT probe should be placed to identify this code path.
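
As a rough illustration of the first approach, the sketch below counts how often each return site of one kernel function is reached, using the fbt provider. The function ufs_read is only an assumed example target; substitute the function you are investigating.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Count how often each return site of one kernel function is hit.
 * ufs_read is an example target only, not taken from this article.
 * For fbt return probes, arg0 is the offset within the function of
 * the returning instruction.
 */
fbt::ufs_read:return
{
        @sites[probefunc, arg0] = count();
}

dtrace:::END
{
        /* Print each return site as function+offset with its count. */
        printa("%s+0x%x  %@d\n", @sites);
}
```

The second approach, tracing the functions called from within the function, might look like the following sketch. It uses the same assumed ufs_read target and a thread-local flag named follow (a name chosen here for illustration). Note that it enables every fbt probe, so it can be expensive on a busy system; narrowing the middle clause to one module reduces the cost.

```d
#!/usr/sbin/dtrace -s

#pragma D option flowindent

/* Start following this thread when the example function is entered. */
fbt::ufs_read:entry
{
        self->follow = 1;
}

/* While following, the empty clause prints every kernel entry and return. */
fbt:::
/self->follow/
{
}

/* Stop following once the example function returns. */
fbt::ufs_read:return
/self->follow/
{
        self->follow = 0;
}
```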

Can't do yet

PIC observability

Being able to configure and read the CPU performance instrumentation counters (PICs) would be quite valuable for fetching low-level hardware metrics. DTrace can't do this yet, although there is a project at Sun that aims to add it.

Isn't Suited For

Security Auditing

DTrace was designed as a troubleshooting and analysis tool, not a security tool. The visibility DTrace provides can solve many security monitoring problems, but there is a catch: DTrace can drop events if the system becomes too busy, and can even abort tracing entirely.

As a safety measure, if DTrace detects that it is degrading system performance it will either drop events or abort execution. This can happen if you attempt to analyze too many events (for example, every function entry in the kernel). Great for troubleshooting, bad for security. If DTrace were used as a security auditing tool, then the bad guys would already know one way to defeat the auditing system. There is no supported way to turn off this safety measure (no, really, you don't want to turn it off).

The good news is that you can detect if something bad happened to DTrace. If DTrace drops events, it prints messages to STDERR; if DTrace aborts, then the process is gone. If you were determined to use DTrace as a security auditing tool, then you could write a framework to take meaningful action when these events occurred. But before you do that, are you sure that the supported Solaris Auditing software (aka BSM auditing) can't solve your auditing needs instead?

Fast Monitoring

Although DTrace has low overhead when in use, it isn't zero overhead. If you are tracing events that happen fewer than 1,000 times a second, then that overhead should be negligible. If you start tracing events that occur much more frequently (e.g., 100,000 times a second), then you will start to notice the DTrace overhead, which could reach 50% of the CPU capacity.
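
Before tracing an event in detail, it may help to check how often it fires. The sketch below prints a per-second count for one probe; syscall::read:entry is only a placeholder for the event you care about, and the counting itself adds a small overhead of its own.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Placeholder probe: replace with the event you intend to trace. */
syscall::read:entry
{
        @rate = count();
}

/* Once a second, print the count and reset it to zero. */
profile:::tick-1sec
{
        printa("%@d events/sec\n", @rate);
        clear(@rate);
}
```

If the reported rate is well above a few thousand per second, more detailed tracing of that event is likely to carry a noticeable cost.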

An example may be an application server which processes thousands of transactions per second. DTrace can trace each transaction and provide a detailed latency breakdown across the software stack - great! Not so great if those latency metrics were affected by the DTrace overhead. Solutions:

  • Pick fewer events to trace.
  • Trace the application server when it isn't so busy, or in development with a smaller test load.
    While some issues may only appear under heavy load, you may get enough information under smaller load to estimate what the bottlenecks would be.
  • DTrace for short intervals and assume that all time measurements are scaled by the same factor due to overheads.
    Even if they aren't, you may still be able to spot serious hotspots.
  • Place time metrics in the application code (if you weren't doing this already).
    DTrace could be used to read those metrics.
  • Use the profile provider and sample at a fixed rate of 1000 Hz or so, rather than trace at a rate that scales with load.
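
The last option can be sketched as follows, assuming you want user-level stacks from a single process. The 997 Hz rate is an assumed choice close to the 1000 Hz mentioned above (a slightly odd rate helps avoid sampling in lockstep with periodic activity), and the 10-second run time and the file name sample.d are likewise arbitrary.

```d
#!/usr/sbin/dtrace -s

/*
 * Sample the user-level stack of one process at a fixed rate.
 * Run as: dtrace -s sample.d -p PID
 * For profile probes, arg1 is the user-level program counter and is
 * non-zero only when the CPU was running user code at sample time.
 */
profile-997
/pid == $target && arg1 != 0/
{
        @stacks[ustack()] = count();
}

/* Stop after ten seconds; the aggregation is printed on exit. */
tick-10sec
{
        exit(0);
}
```

Because the sample rate is fixed, the cost of this script stays roughly constant as the transaction rate grows, unlike a probe that fires once per transaction.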