DTrace Topics: Style

This article is about DTrace programming style, and is part of the DTrace Topics collection. A general understanding of DTrace is assumed; it can be gained from the DTrace Topics: Intro section.

DTrace is a dynamic troubleshooting and analysis tool first introduced in the Solaris 10 and OpenSolaris operating systems. DTrace processes a programming language called 'D'.

The following are D programming language style suggestions written by Brendan Gregg, creator of the DTraceToolkit and co-author of Solaris Performance and Tools.

Completion: yellow (traffic light)
Difficulty: 4 coffee mugs
Audience: DTrace users who are publishing scripts

Intro

Many languages have well defined styles to follow. They help the programmer write in a neat, consistent way and, in the long run, make fewer mistakes. They also help other programmers rapidly understand what has been written. These styles are documented in style guides, or "programming best practices".

Styles are born of experience - finding out, often the hard way, what does and doesn't work. Since early 2004 I've written over a thousand DTrace scripts, around a hundred of which are in the DTraceToolkit. I've learnt a lot about what does and doesn't work, and have adapted my style numerous times. My last few style changes were costly, as they meant updating and retesting dozens of scripts from the DTraceToolkit. Hopefully I can save you similar pain by documenting what I have learnt.

Some of this style is derived from cstyle, the tool used to check the style of the Solaris and OpenSolaris source to ensure it matches "Bill Joy Normal Form". I often run cstyle itself on DTrace scripts, as most of the warnings are still appropriate (thanks to the similarity of D to C programming).

This style will help if you wish to publish your scripts on the Internet, and is crucial for scripts in the DTraceToolkit. It is similar to, but not the same as, the style used in the DTrace Guide (which was written by the authors of DTrace itself). For that reason this is known as the "DTraceToolkit Style", not the official "DTrace Style" as used in the DTrace Guide.

When using this guide, you may find that you don't agree with every point made. That is fine - these are suggestions. I don't agree with all the details of other programming style guides either; however, I usually follow them, as obeying a standard can provide greater value than using my own peculiarities.

Generic Coding

The following are the same as C programming in the style enforced by cstyle.

Line width of 80 chars

Each line must not exceed 80 characters in width (with a tabstop of 8). Every line of the Solaris kernel source is <= 80 chars, so your small DTrace script should have no problem. A soft limit of 79 characters is encouraged for a few reasons, including avoiding a problem on some terminals which auto-join copy-n-pasted text that hits 80 chars.

The following is BAD,

printf("%-20s %7s %7s %7s %7s %7s %7s %7s\n", "Time", "scall/s", "sread/s", "swrit/s", "fork/s", "exec/s", "open/s", "stat/s"); 

The following is GOOD,

printf("%-20s %7s %7s %7s %7s %7s %7s %7s\n", "Time", "scall/s", "sread/s",
    "swrit/s", "fork/s", "exec/s", "open/s", "stat/s");

The following is also GOOD,

printf("%-20s %7s %7s %7s %7s %7s %7s %7s\n", "Time",
    "scall/s", "sread/s", "swrit/s", "fork/s", "exec/s", "open/s", "stat/s");

It may or may not make logical sense to break the line earlier than the last opportunity before the 80 char limit. Either is acceptable, so long as 80 characters isn't exceeded.

Line continuation of 4 chars

If a line exceeds the line width (80 chars), the remaining data can be placed on the following line with an indentation of 4 characters.

The following is BAD,

printf("%-20s %7s %7s %7s %7s\n",
       "Time", "scall/s", "sread/s", "swrit/s", "fork/s");   

The following is GOOD,

printf("%-20s %7s %7s %7s %7s\n",
    "Time", "scall/s", "sread/s", "swrit/s", "fork/s");

Term separator

Terms separated by a comma must have a space after the comma.

The following is BAD,

printf("%6s %-16s %1s %s\n","PID","CMD","D","BYTES");   

The following is GOOD,

printf("%6s %-16s %1s %s\n", "PID", "CMD", "D", "BYTES");   

Comments

Comments are either a line comment or a block comment, of a very particular style (cstyle).

The following is BAD,

/******************
 * Process io start
 ******************/
io:::start
{
        /* fetch
           details */
        this->size = args[0]->b_bcount;    /*** b_bcount is bytes ***/

The following is GOOD,

/*
 * Process io start
 */
io:::start
{
        /* fetch details */
        this->size = args[0]->b_bcount;    /* b_bcount is bytes */

The cstyle tool is very strict with comment formatting. For example, the first line in the above GOOD example must not have a trailing space.

cstyle

The remaining rules for coding can be learned by using the cstyle tool.

The following is BAD,

# cstyle cputypes.d
cputypes.d: 42: missing space around assignment operator
cputypes.d: 47: comma or semicolon followed by non-blank
cputypes.d: 48: missing space around assignment operator
cputypes.d: 55: continuation line not indented by 4 spaces   
cputypes.d: 59: line > 80 characters
cputypes.d: 62: improper first line of block comment
cputypes.d: 62: missing blank after open comment
cputypes.d: 66: indent by spaces instead of tabs
cputypes.d: 68: last line in file is blank

The following is GOOD,

# cstyle cputypes.d
cputypes.d: 42: missing space around assignment operator   

We allow the warning for line 42 as it is a DTrace directive that cstyle does not understand,

# sed '42!d' cputypes.d
#pragma D option bufsize=64k   

DTrace Specific

Fully qualified probe names

When specifying probes, you must use all four fields (if available): provider:module:function:name. The shortcuts that DTrace allows are suitable when hacking at the command line; for scripting, however, it is both clearer and safer to specify the full name.

The following is BAD,

fork1:entry   

The following is GOOD,

syscall::fork1:entry   

In fact, the BAD example above is especially bad as it matches the fork1 probe in both the fbt and the syscall providers - producing duplicated results. If you write such a shortcut that only matches the desired probes now, in a future version of Solaris more probes may be added such that it becomes incorrect. To be safe, always fully qualify.

For consistency, fully qualify the BEGIN probe as well,

dtrace:::BEGIN

This is the greatest deviation from the DTrace Guide style, which often uses just "BEGIN" to specify this probe.

BEGIN with a printf

When you run scripts that use the quiet pragma, the BEGIN statement must print something to let the user know when DTrace has begun tracing. This may be a header, or a message to say that tracing has begun.

The following is BAD,

# ./awkward_silence.d   

The following is GOOD,

# ./bitesize.d   
Tracing... Hit Ctrl-C to end.   

The following is also GOOD,

# ./dnlcsnoop.d   
PID CMD           TIME HIT PATH   

And the following is FINE,

# ./readbytes.d
dtrace: script './readbytes.d' matched 4 probes   

which is the default behaviour of DTrace, and does indeed note when tracing has begun.
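As a minimal sketch of the GOOD behaviour above, a quiet script can announce itself from a fully qualified BEGIN clause. The traced probe and the aggregation name @reads are illustrative assumptions, not taken from a particular DTraceToolkit script.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Let the user know that tracing has begun. */
dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Count read(2) calls by process name; printed when Ctrl-C is hit. */
syscall::read:entry
{
        @reads[execname] = count();
}
```

Without the BEGIN clause, the quiet pragma would suppress the default "matched N probes" line and the user would be left with the awkward silence shown earlier.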

Sampling/Tracing...

Scripts that collect data and then print a report when Ctrl-C is hit must print a BEGIN message. That BEGIN message should convey the behaviour of your script.

The following is BAD,

# ./mystery.d   
Somehow gathering data... Exit the usual way.

If your script traces events (eg, io:::start), then the following is GOOD,

# ./bitesize.d   
Tracing... Hit Ctrl-C to end.   

If your script samples data (eg, profile:::profile-1000hz), then the following is GOOD,

# ./pridist.d
Sampling... Hit Ctrl-C to end.   

Whenever the user sees "Sampling", it informs them that the script may be subject to sampling errors and that the rate may need to be customised.
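A sampling script might be sketched as follows. It assumes the profile-1000hz probe mentioned above; the aggregation name @samples is hypothetical.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* "Sampling" warns the user that results are subject to sampling error. */
dtrace:::BEGIN
{
        printf("Sampling... Hit Ctrl-C to end.\n");
}

/* Sample at 1000 Hertz: record which process is on-CPU. */
profile:::profile-1000hz
{
        @samples[execname] = count();
}
```

A script that traced every event instead (for example, io:::start) would print "Tracing..." in the same clause.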

Units

Scripts that output numbers are encouraged to provide units in the output if space permits. The following points explain usage.

  • Preferred usage is of the form "Kbytes/sec", however this length may be more suited to documentation.
  • A shorter version of "Kbytes/sec" is "KB/s", which may be more suited for command outputs.
  • Kilobits can be written as "Kb", or better "Kbits" to avoid confusion.
  • For column headers, more caps are allowed for the longer forms: eg, "KBYTES/s" and "KBITS/s".
  • 1 Kbyte = 1024 bytes; and 1 Kbit = 1000 bits.
    No IEC binary prefixes yet (KiB/kibibyte), but this may change in the future. For now it shouldn't be a problem - DTrace scripts are short enough that people can read them to see what was used.
  • For rate data, it is best to present the output in per second units.
  • If per interval units are used, write "Kbytes/int" or "KB/i".

The following is BAD,

# ./measure.d   
Tracing... Hit Ctrl-C to end.   
^C
Average: 152

The following is GOOD,

# ./measure.d   
Tracing... Hit Ctrl-C to end.   
^C
Average: 152 Kbytes/sec
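One way to attach units to rate output is sketched below. It assumes that args[0]->b_bcount in the io provider is the I/O size in bytes and that a one second interval suits the workload; the aggregation name @bytes is illustrative. It prints a per second total rather than the average shown above.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Sum the bytes of every disk I/O request. */
io:::start
{
        @bytes = sum(args[0]->b_bcount);
}

/* Once per second: convert to Kbytes (1 Kbyte = 1024 bytes) and print. */
profile:::tick-1sec
{
        normalize(@bytes, 1024);
        printa("Disk I/O: %@d Kbytes/sec\n", @bytes);
        clear(@bytes);
}

/* Drop the final partial interval so nothing unlabelled is printed. */
dtrace:::END
{
        trunc(@bytes);
}
```

Because the interval is exactly one second, the per interval sum can be labelled "Kbytes/sec" directly. A different interval would need either a different divisor or a "Kbytes/int" label.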

Truncating

If your script truncates output, you should report this as part of the output.

The following is BAD,

# ./agg.d 
Tracing... Hit Ctrl-C to end.
^C
Top syscalls,
   writev                288
   write                 406
   pollsys              1278
   read                 1349
   ioctl                1529

The following is GOOD,

# ./agg.d 
Tracing... Hit Ctrl-C to end.
^C
Top 5 syscalls,
   writev                288
   write                 406
   pollsys              1278
   read                 1349
   ioctl                1529

The description is now "Top 5".

An exception to this may be prstat or top style scripts, which refresh the screen. Truncation from this style of tool is expected. It should, however, still be clearly documented.
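A report like the GOOD example could be produced by a script along these lines. This is a sketch, with the aggregation name @num chosen for illustration; the point is that the number passed to trunc() and the number in the heading are kept in step.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Count system calls by name. */
syscall:::entry
{
        @num[probefunc] = count();
}

/* Keep the 5 highest counts, and say so in the heading. */
dtrace:::END
{
        trunc(@num, 5);
        printf("Top 5 syscalls,\n");
        printa("   %-16s %@8d\n", @num);
}
```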

Output width of 80 chars

Your script output must, under normal circumstances, fit within an 80 character width.

The following is BAD,

# ./syscalls.d 
Tracing... Hit Ctrl-C to end.
^C
Top 5 syscalls,

  EXEC                                                SYSCALL                                                       COUNT
  dtrace                                              ioctl                                                           147
  xmms                                                pollsys                                                         165
  xmms                                                ioctl                                                           262
  Xorg                                                pollsys                                                         285
  Xorg                                                read                                                            332

The following is GOOD,

# ./syscalls.d 
Tracing... Hit Ctrl-C to end.
^C
Top 5 Syscalls,

 EXEC             SYSCALL             COUNT
 dtrace           ioctl                 147
 xmms             pollsys               165
 xmms             ioctl                 262
 Xorg             pollsys               285
 Xorg             read                  332

The following is FINE,

# ./open.d
Tracing... Hit Ctrl-C to end.
^C
 Top 5 Pathnames Opened,

     COUNT PATHNAME
         1 /var/sadm/pkg/SUNWstaroffice-gnome-integration/save/pspool/SUNWstaroffice-gnome-integration/install
         1 /var/sadm/pkg/SUNWstaroffice-gnome-integration/save/pspool/SUNWstaroffice-gnome-integration/pkginfo
         1 /var/sadm/pkg/SUNWstaroffice-gnome-integration/save/pspool/SUNWstaroffice-gnome-integration/pkgmap
         2 /etc/resolv.conf
         3 /etc/svc/volatile/repository_door

Truncating pathnames may be a bigger crime than exceeding 80 chars, and ls -l and find have set a precedent for this anyway. In this case, the style is to place the pathname field (the most varying field) as the rightmost field.
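Column widths are easiest to control by giving the header printf() and the printa() matching format widths. The sketch below reproduces the EXEC/SYSCALL/COUNT layout of the GOOD example; the widths (16, 16 and 8) and the aggregation name @num are assumptions chosen to stay well under 80 characters.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Count system calls by process name and syscall name. */
syscall:::entry
{
        @num[execname, probefunc] = count();
}

/* Header and data share the same widths: 1+16+1+16+1+8 = 43 chars. */
dtrace:::END
{
        trunc(@num, 5);
        printf("Top 5 Syscalls,\n\n");
        printf(" %-16s %-16s %8s\n", "EXEC", "SYSCALL", "COUNT");
        printa(" %-16s %-16s %@8d\n", @num);
}
```

For a report with a pathname column, the same approach applies with the pathname printed last using an unpadded %s, so that long values overflow to the right instead of pushing other columns out of alignment.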

Variable types

  • Temporary calculations within a clause should use this-> variables.
  • Global variables should be avoided if possible (they can hurt performance).
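As an illustration of the first point, a temporary value can be held in a clause-local (this->) variable. The variable name this->kbytes and aggregation name @size are hypothetical.

```d
io:::start
{
        /* clause-local: only needed for this calculation */
        this->kbytes = args[0]->b_bcount / 1024;
        @size[execname] = quantize(this->kbytes);
}
```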

Memory cleanup

Variables must be set to zero after final use, especially global hashes and thread local variables (self->). Otherwise, memory is leaked and you may encounter dynamic variable drops.

The following is BAD,

syscall::read:return
/self->start/
{
        @latency = quantize(timestamp - self->start);
}

The following is GOOD,

syscall::read:return
/self->start/
{
        @latency = quantize(timestamp - self->start);
        self->start = 0;
}

This assumes self->start isn't needed after that clause.
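The same rule applies to global hashes (associative arrays). The sketch below times disk I/O by keying a start time on device and block number, then zeroes the entry once it has been used. The names start_time and @latency are illustrative, and the choice of key is an assumption that suits simple cases only.

```d
/* Remember when each I/O request was issued. */
io:::start
{
        start_time[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

/* On completion: record the latency, then free the hash entry. */
io:::done
/start_time[args[0]->b_edev, args[0]->b_blkno]/
{
        this->dev = args[0]->b_edev;
        this->blk = args[0]->b_blkno;
        @latency = quantize(timestamp - start_time[this->dev, this->blk]);
        start_time[this->dev, this->blk] = 0;
}
```

Assigning zero releases the dynamic variable storage for that key, which is what prevents the drops described above.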

Variable names

Creating scripts

Research first

Grok the topic

Write test cases

Platforms

Measure script impact

Documentation

Man pages

Example files

Use ./

Script wrapping

Don't

Options

Language
